Installing Apptainer on EC2 and Running Trinity

Because AWS lets you use resources on demand, you can get a Trinity container running quickly!
2023.05.30

Hello! This is inomaso (@inomasosan) from the Consulting Department.

In this post, I tested whether a Trinity container can be run on Apptainer installed on an EC2 instance.

What is Apptainer?

For an overview of Apptainer and how to install it on EC2, please see the following blog post.

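What is Trinity?

Trinity is a de novo assembler for RNA-Seq data.
It stitches together the reads obtained from a next-generation sequencer (NGS) to reconstruct the original transcript sequences, without requiring a reference genome.

If you want to learn more about genome assembly, the following article is recommended.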

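Trying it out

Following the GitHub page below, I installed Apptainer on a single EC2 instance and launched the Trinity container.

This walkthrough assumes that the EC2 instance has already been built and that Apptainer is installed on it.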

Test environment

The environment I built for this test is as follows.

| Item | Value |
| --- | --- |
| OS | Ubuntu Server 22.04 LTS |
| AMI | ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20230516 |
| Instance type | c6a.large |
| Storage | 30 GiB |

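Downloading the sample files for assembly

When you connect through SSM Session Manager, the default working directory is under /var/snap/amazon-ssm-agent/xxxx.
For that reason, switch to the user's home directory before downloading the sample files.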

$ sudo su - ssm-user

For the sample files, I used the Docker test data published on GitHub.

$ wget https://github.com/trinityrnaseq/trinityrnaseq/raw/master/Docker/test_data/reads.left.fq.gz
$ wget https://github.com/trinityrnaseq/trinityrnaseq/raw/master/Docker/test_data/reads.right.fq.gz

Just to be safe, confirm that the files were downloaded.

$ ls -la
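
`ls -la` only shows that the files exist and how large they are. As a slightly stronger check, the sketch below tests the gzip integrity of a file and counts its reads, assuming the usual four-lines-per-record FASTQ layout. The function name `count_fastq_reads` is made up for this illustration and is not part of Trinity or Apptainer.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Trinity or Apptainer):
# print the number of reads in a gzipped FASTQ file, or fail if the file
# is not valid gzip or its line count is not a positive multiple of 4.
count_fastq_reads() {
  local file="$1"
  local lines
  # gzip -t exits non-zero for a truncated or non-gzip file
  # (for example an HTML page saved under a .gz name).
  if ! gzip -t "$file" 2>/dev/null; then
    echo "not a valid gzip file: $file" >&2
    return 1
  fi
  lines=$(gzip -dc "$file" | wc -l)
  lines=$((lines))   # strip any padding that wc adds
  if [ "$lines" -eq 0 ] || [ $((lines % 4)) -ne 0 ]; then
    echo "unexpected line count ($lines) in $file" >&2
    return 1
  fi
  echo $((lines / 4))
}

# Usage (in the directory that holds the downloaded files):
#   count_fastq_reads reads.left.fq.gz
#   count_fastq_reads reads.right.fq.gz
```

For paired-end data, the two files would normally report the same number of reads.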

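Running the Trinity container on Apptainer

I ran the container using the Singularity command on GitHub as a reference.
Apptainer is compatible with Docker (OCI) images and automatically converts the OCI format to SIF at startup, so this test uses the image on Docker Hub.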

$ apptainer run docker://trinityrnaseq/trinityrnaseq:latest Trinity \
          --seqType fq \
          --left `pwd`/reads.left.fq.gz \
          --right `pwd`/reads.right.fq.gz \
          --NO_SEQTK \
          --max_memory 1G --CPU 4 \
          --output `pwd`/trinity_out_dir
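
The command above hard-codes `--CPU 4` and `--max_memory 1G`, while a c6a.large provides 2 vCPUs and 4 GiB of memory. One option is to derive both values from the host. The following is a minimal sketch only: the function name `trinity_resource_flags` and the "about half of total memory" rule are arbitrary choices for illustration, not a Trinity recommendation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "--CPU N --max_memory MG" sized for this host.
trinity_resource_flags() {
  local cpus mem_kib mem_gib
  # Number of available CPUs; fall back to 1 if nproc is missing.
  cpus=$(nproc 2>/dev/null || echo 1)
  # MemTotal is reported in KiB on Linux; assume 2 GiB if it cannot be read.
  mem_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
  mem_kib=${mem_kib:-2097152}
  # Use roughly half of total memory (integer GiB), but never less than 1G.
  mem_gib=$(( mem_kib / 1024 / 1024 / 2 ))
  [ "$mem_gib" -lt 1 ] && mem_gib=1
  echo "--CPU ${cpus} --max_memory ${mem_gib}G"
}

trinity_resource_flags
```

The output can then replace the fixed flags, for example by appending `$(trinity_resource_flags)` (unquoted, so that it splits into separate arguments) to the Trinity command line.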

By default, Apptainer mounts the host's /home/$USER, /tmp, and $PWD into the container at run time. Unlike Docker, you can use files on the host without explicitly mounting a volume.
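
Files outside those default locations are not visible inside the container unless they are bound explicitly with `--bind`. The sketch below decides whether a given host path would need an explicit bind, assuming the defaults are `$HOME`, `/tmp`, and `$PWD`. Administrators can change the defaults in the Apptainer configuration, so treat that list as an assumption. `needs_bind` and `/data/reads` are names invented for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the path lies outside the
# directories assumed to be mounted by default ($HOME, /tmp, $PWD).
needs_bind() {
  local path="$1"
  local dir
  for dir in "$HOME" /tmp "$PWD"; do
    dir=${dir%/}                 # drop a trailing slash, if any
    [ -n "$dir" ] || continue    # skip empty entries
    case "$path" in
      "$dir"|"$dir"/*) return 1 ;;   # already visible in the container
    esac
  done
  return 0
}

# Example: add --bind only when it is required.
# /data/reads is a made-up directory for illustration.
reads_dir=/data/reads
bind_opt=""
if needs_bind "$reads_dir"; then
  bind_opt="--bind $reads_dir"
fi
echo "extra apptainer options: ${bind_opt:-none}"
```

Passing a single path to `--bind` mounts it at the same location inside the container.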

Once Trinity finishes, trinity_out_dir.Trinity.fasta is written to the current directory.

$ ls -la
total 2660
drwxr-x--- 5 ssm-user ssm-user    4096 May 30 06:08 .
drwxr-xr-x 4 root     root        4096 May 30 05:54 ..
drwx------ 3 ssm-user ssm-user    4096 May 30 05:54 .apptainer
-rw-r--r-- 1 ssm-user ssm-user     220 Jan  6  2022 .bash_logout
-rw-r--r-- 1 ssm-user ssm-user    3771 Jan  6  2022 .bashrc
drwx------ 3 ssm-user ssm-user    4096 May 30 05:55 .local
-rw-r--r-- 1 ssm-user ssm-user     807 Jan  6  2022 .profile
-rw-rw-r-- 1 ssm-user ssm-user     215 May 30 05:54 .wget-hsts
-rw-rw-r-- 1 ssm-user ssm-user 1251148 May 30 05:54 reads.left.fq.gz
-rw-rw-r-- 1 ssm-user ssm-user 1272939 May 30 05:54 reads.right.fq.gz
drwxrwxr-x 8 ssm-user ssm-user    4096 May 30 06:08 trinity_out_dir
-rw-rw-r-- 1 ssm-user ssm-user  151717 May 30 06:08 trinity_out_dir.Trinity.fasta
-rw-rw-r-- 1 ssm-user ssm-user    2950 May 30 06:08 trinity_out_dir.Trinity.fasta.gene_trans_map
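
As a quick sanity check on the result, you can count the assembled sequences. In FASTA, each record starts with a `>` header line. The `gene_trans_map` file is assumed here to be tab-separated, with the gene ID in the first column and the transcript ID in the second. The function names are invented for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for summarizing Trinity output files.

# Number of sequences = number of FASTA header lines.
count_transcripts() {
  grep -c '^>' "$1"
}

# Number of distinct gene IDs (column 1 of the gene_trans_map file).
count_genes() {
  cut -f1 "$1" | sort -u | wc -l | tr -d ' '
}

# Usage:
#   count_transcripts trinity_out_dir.Trinity.fasta
#   count_genes trinity_out_dir.Trinity.fasta.gene_trans_map
```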

For anyone curious, the full execution log is shown below.

Trinity execution log
     ______  ____   ____  ____   ____  ______  __ __
    |      ||    \ |    ||    \ |    ||      ||  |  |
    |      ||  D  ) |  | |  _  | |  | |      ||  |  |
    |_|  |_||    /  |  | |  |  | |  | |_|  |_||  ~  |
      |  |  |    \  |  | |  |  | |  |   |  |  |___, |
      |  |  |  .  \ |  | |  |  | |  |   |  |  |     |
      |__|  |__|\_||____||__|__||____|  |__|  |____/

    Trinity-v2.15.1



Left read files: $VAR1 = [
          '/home/ssm-user/reads.left.fq.gz'
        ];
Right read files: $VAR1 = [
          '/home/ssm-user/reads.right.fq.gz'
        ];
Trinity version: Trinity-v2.15.1
-currently using the latest production release of Trinity.

Tuesday, May 30, 2023: 06:07:21 CMD: java -Xmx64m -XX:ParallelGCThreads=2  -jar /usr/local/bin/util/support_scripts/ExitTester.jar 0
Tuesday, May 30, 2023: 06:07:22 CMD: java -Xmx4g -XX:ParallelGCThreads=2  -jar /usr/local/bin/util/support_scripts/ExitTester.jar 1
Tuesday, May 30, 2023: 06:07:22 CMD: mkdir -p /home/ssm-user/trinity_out_dir
Tuesday, May 30, 2023: 06:07:22 CMD: mkdir -p /home/ssm-user/trinity_out_dir/chrysalis


----------------------------------------------------------------------------------
-------------- Trinity Phase 1: Clustering of RNA-Seq Reads  ---------------------
----------------------------------------------------------------------------------

---------------------------------------------------------------
------------ In silico Read Normalization ---------------------
-- (Removing Excess Reads Beyond 200 Coverage --
---------------------------------------------------------------

# running normalization on reads: $VAR1 = [
          [
            '/home/ssm-user/reads.left.fq.gz'
          ],
          [
            '/home/ssm-user/reads.right.fq.gz'
          ]
        ];


Tuesday, May 30, 2023: 06:07:22 CMD: /usr/local/bin/util/insilico_read_normalization.pl --seqType fq --JM 1G  --max_cov 200 --min_cov 1 --CPU 4 --output /home/ssm-user/trinity_out_dir/insilico_read_normalization --max_CV 10000  --NO_SEQTK  --left /home/ssm-user/reads.left.fq.gz --right /home/ssm-user/reads.right.fq.gz --pairs_together  --PARALLEL_STATS
-prepping seqs
Converting input files. (both directions in parallel)CMD: /usr/local/bin/util/..//util/support_scripts//fastQ_to_fastA.pl -I <(gunzip -c /home/ssm-user/reads.left.fq.gz)  >> left.fa 2> left.readcount
CMD: /usr/local/bin/util/..//util/support_scripts//fastQ_to_fastA.pl -I <(gunzip -c /home/ssm-user/reads.right.fq.gz)  >> right.fa 2> right.readcount
CMD finished (1 seconds)
CMD finished (1 seconds)
CMD: touch left.fa.ok
CMD finished (0 seconds)
CMD: touch right.fa.ok
CMD finished (0 seconds)
Done converting input files.CMD: cat left.fa right.fa > both.fa
CMD finished (0 seconds)
CMD: touch both.fa.ok
CMD finished (0 seconds)
-kmer counting.
-------------------------------------------
----------- Jellyfish  --------------------
-- (building a k-mer catalog from reads) --
-------------------------------------------

CMD: jellyfish count -t 4 -m 25 -s 100000000  --canonical  both.fa
CMD finished (1 seconds)
CMD: jellyfish histo -t 4 -o jellyfish.K25.min2.kmers.fa.histo mer_counts.jf
CMD finished (0 seconds)
CMD: jellyfish dump -L 2 mer_counts.jf > jellyfish.K25.min2.kmers.fa
CMD finished (0 seconds)
CMD: touch jellyfish.K25.min2.kmers.fa.success
CMD finished (0 seconds)
-generating stats files
CMD: /usr/local/bin/util/..//Inchworm/bin/fastaToKmerCoverageStats --reads left.fa --kmers jellyfish.K25.min2.kmers.fa --kmer_size 25  --num_threads 2  --DS  > left.fa.K25.stats
CMD: /usr/local/bin/util/..//Inchworm/bin/fastaToKmerCoverageStats --reads right.fa --kmers jellyfish.K25.min2.kmers.fa --kmer_size 25  --num_threads 2  --DS> right.fa.K25.stats
-reading Kmer occurrences...
-reading Kmer occurrences...

 done parsing 100964 Kmers, 100964 added, taking 0 seconds.

 done parsing 100964 Kmers, 100964 added, taking 0 seconds.
STATS_GENERATION_TIME: 1 seconds.
CMD finished (1 seconds)
STATS_GENERATION_TIME: 1 seconds.
CMD finished (1 seconds)
CMD: touch left.fa.K25.stats.ok
CMD finished (0 seconds)
CMD: touch right.fa.K25.stats.ok
CMD finished (0 seconds)
-sorting each stats file by read name.
CMD: head -n1 left.fa.K25.stats > left.fa.K25.stats.sort && tail -n +2 left.fa.K25.stats | /usr/bin/sort --parallel=4 -k1,1 -T . -S 1G >> left.fa.K25.stats.sort
CMD: head -n1 right.fa.K25.stats > right.fa.K25.stats.sort && tail -n +2 right.fa.K25.stats | /usr/bin/sort --parallel=4 -k1,1 -T . -S 1G >> right.fa.K25.stats.sort
CMD finished (0 seconds)
CMD finished (0 seconds)
CMD: touch left.fa.K25.stats.sort.ok
CMD finished (0 seconds)
CMD: touch right.fa.K25.stats.sort.ok
CMD finished (0 seconds)
-defining normalized reads
CMD: /usr/local/bin/util/..//util/support_scripts//nbkc_merge_left_right_stats.pl --left left.fa.K25.stats.sort --right right.fa.K25.stats.sort --sorted > pairs.K25.stats
-opening left.fa.K25.stats.sort
-opening right.fa.K25.stats.sort
-done opening files.
CMD finished (0 seconds)
CMD: touch pairs.K25.stats.ok
CMD finished (0 seconds)
CMD: /usr/local/bin/util/..//util/support_scripts//nbkc_normalize.pl --stats_file pairs.K25.stats --max_cov 200  --min_cov 1 --max_CV 10000 > pairs.K25.stats.C200.maxCV10000.accs
30472 / 30575 = 99.66% reads selected during normalization.
0 / 30575 = 0.00% reads discarded as likely aberrant based on coverage profiles.
0 / 30575 = 0.00% reads discarded as below minimum coverage threshold=1
CMD finished (0 seconds)
CMD: touch pairs.K25.stats.C200.maxCV10000.accs.ok
CMD finished (0 seconds)
-search and capture.
-preparing to extract selected reads from: /home/ssm-user/reads.right.fq.gz ... done prepping, now search and capture.
-capturing normalized reads from: /home/ssm-user/reads.right.fq.gz
-preparing to extract selected reads from: /home/ssm-user/reads.left.fq.gz ... done prepping, now search and capture.
-capturing normalized reads from: /home/ssm-user/reads.left.fq.gz
CMD: touch /home/ssm-user/trinity_out_dir/insilico_read_normalization/reads.left.fq.gz.normalized_K25_maxC200_minC1_maxCV10000.fq.ok
CMD finished (0 seconds)
CMD: touch /home/ssm-user/trinity_out_dir/insilico_read_normalization/reads.right.fq.gz.normalized_K25_maxC200_minC1_maxCV10000.fq.ok
CMD finished (0 seconds)
CMD: ln -sf /home/ssm-user/trinity_out_dir/insilico_read_normalization/reads.left.fq.gz.normalized_K25_maxC200_minC1_maxCV10000.fq left.norm.fq
CMD finished (0 seconds)
CMD: ln -sf /home/ssm-user/trinity_out_dir/insilico_read_normalization/reads.right.fq.gz.normalized_K25_maxC200_minC1_maxCV10000.fq right.norm.fq
CMD finished (0 seconds)
-removing tmp dir /home/ssm-user/trinity_out_dir/insilico_read_normalization/tmp_normalized_reads


Normalization complete. See outputs:
        /home/ssm-user/trinity_out_dir/insilico_read_normalization/reads.left.fq.gz.normalized_K25_maxC200_minC1_maxCV10000.fq
        /home/ssm-user/trinity_out_dir/insilico_read_normalization/reads.right.fq.gz.normalized_K25_maxC200_minC1_maxCV10000.fq
Tuesday, May 30, 2023: 06:07:26 CMD: touch /home/ssm-user/trinity_out_dir/insilico_read_normalization/normalization.ok
Converting input files. (in parallel)Tuesday, May 30, 2023: 06:07:26    CMD: /usr/local/bin/util/support_scripts/fastQ_to_fastA.pl -I /home/ssm-user/trinity_out_dir/insilico_read_normalization/left.norm.fq  >> left.fa 2> /home/ssm-user/trinity_out_dir/insilico_read_normalization/left.norm.fq.readcount
Tuesday, May 30, 2023: 06:07:26 CMD: /usr/local/bin/util/support_scripts/fastQ_to_fastA.pl -I /home/ssm-user/trinity_out_dir/insilico_read_normalization/right.norm.fq  >> right.fa 2> /home/ssm-user/trinity_out_dir/insilico_read_normalization/right.norm.fq.readcount
Tuesday, May 30, 2023: 06:07:27 CMD: touch right.fa.ok
Tuesday, May 30, 2023: 06:07:27 CMD: touch left.fa.ok
Tuesday, May 30, 2023: 06:07:27 CMD: touch left.fa.ok right.fa.ok
Tuesday, May 30, 2023: 06:07:27 CMD: cat left.fa right.fa > /home/ssm-user/trinity_out_dir/both.fa
Tuesday, May 30, 2023: 06:07:27 CMD: touch /home/ssm-user/trinity_out_dir/both.fa.ok
-------------------------------------------
----------- Jellyfish  --------------------
-- (building a k-mer (25) catalog from reads) --
-------------------------------------------

* [Tue May 30 06:07:27 2023] Running CMD: jellyfish count -t 4 -m 25 -s 100000000 -o mer_counts.25.asm.jf --canonical  /home/ssm-user/trinity_out_dir/both.fa
* [Tue May 30 06:07:28 2023] Running CMD: jellyfish dump -L 1 mer_counts.25.asm.jf > jellyfish.kmers.25.asm.fa
* [Tue May 30 06:07:28 2023] Running CMD: jellyfish histo -t 4 -o jellyfish.kmers.25.asm.fa.histo mer_counts.25.asm.jf
----------------------------------------------
--------------- Inchworm (K=25, asm) ---------------------
-- (Linear contig construction from k-mers) --
----------------------------------------------

* [Tue May 30 06:07:28 2023] Running CMD: /usr/local/bin/Inchworm/bin//inchworm --kmers jellyfish.kmers.25.asm.fa --run_inchworm -K 25 --monitor 1   --DS  --num_threads 4  --PARALLEL_IWORM   --min_any_entropy 1.0   -L 25  --no_prune_error_kmers  > /home/ssm-user/trinity_out_dir/inchworm.DS.fa.tmp
Kmer length set to: 25
Min assembly length set to: 25
Monitor turned on, set to: 1
double stranded mode set
min entropy set to: 1
setting number of threads to: 4
-setting parallel iworm mode.
-reading Kmer occurrences...
 [0M] Kmers parsed.
 done parsing 517949 Kmers, 517949 added, taking 0 seconds.

TIMING KMER_DB_BUILDING 0 s.
Pruning kmers (min_kmer_count=1 min_any_entropy=1 min_ratio_non_error=0.005)
Pruned 4114 kmers from catalog.
        Pruning time: 1 seconds = 0.0166667 minutes.

TIMING PRUNING 1 s.
-populating the kmer seed candidate list.
Kcounter hash size: 517949
Processed 513835 non-zero abundance kmers in kcounter.
-Not sorting list of kmers, given parallel mode in effect.
-beginning inchworm contig assembly.
Total kcounter hash size: 517949 vs. sorted list size: 513835
num threads set to: 4
Done opening file. tmp.iworm.fa.pid_3435.thread_0
Done opening file. tmp.iworm.fa.pid_3435.thread_1
Done opening file. tmp.iworm.fa.pid_3435.thread_2
Done opening file. tmp.iworm.fa.pid_3435.thread_3

        Iworm contig assembly time: 0 seconds = 0 minutes.

TIMING CONTIG_BUILDING 0 s.

TIMING PROG_RUNTIME 1 s.
* [Tue May 30 06:07:29 2023] Running CMD: mv /home/ssm-user/trinity_out_dir/inchworm.DS.fa.tmp /home/ssm-user/trinity_out_dir/inchworm.DS.fa
Tuesday, May 30, 2023: 06:07:29 CMD: touch /home/ssm-user/trinity_out_dir/inchworm.DS.fa.finished
--------------------------------------------------------
-------------------- Chrysalis -------------------------
-- (Contig Clustering & de Bruijn Graph Construction) --
--------------------------------------------------------

inchworm_target: /home/ssm-user/trinity_out_dir/both.fa
bowtie_reads_fa: /home/ssm-user/trinity_out_dir/both.fa
chrysalis_reads_fa: /home/ssm-user/trinity_out_dir/both.fa
* [Tue May 30 06:07:29 2023] Running CMD: /usr/local/bin/util/support_scripts/filter_iworm_by_min_length_or_cov.pl /home/ssm-user/trinity_out_dir/inchworm.DS.fa 100 10 > /home/ssm-user/trinity_out_dir/chrysalis/inchworm.DS.fa.min100
* [Tue May 30 06:07:29 2023] Running CMD: /usr/local/bin/bowtie2-build --threads 4 -o 3 /home/ssm-user/trinity_out_dir/chrysalis/inchworm.DS.fa.min100 /home/ssm-user/trinity_out_dir/chrysalis/inchworm.DS.fa.min100 1>/dev/null
* [Tue May 30 06:07:30 2023] Running CMD: bash -c " set -o pipefail;/usr/local/bin/bowtie2 --local -k 2 --no-unal --threads 4 -f --score-min G,20,8 -x /home/ssm-user/trinity_out_dir/chrysalis/inchworm.DS.fa.min100 /home/ssm-user/trinity_out_dir/both.fa  | samtools view -@ 4 -F4 -Sb - | samtools sort -m 134217728 -@ 4 -no /home/ssm-user/trinity_out_dir/chrysalis/iworm.bowtie.nameSorted.bam"
* [Tue May 30 06:07:32 2023] Running CMD: /usr/local/bin/util/support_scripts/scaffold_iworm_contigs.pl /home/ssm-user/trinity_out_dir/chrysalis/iworm.bowtie.nameSorted.bam /home/ssm-user/trinity_out_dir/chrysalis/inchworm.DS.fa.min100 > /home/ssm-user/trinity_out_dir/chrysalis/iworm_scaffolds.txt
* [Tue May 30 06:07:32 2023] Running CMD: /usr/local/bin/Chrysalis/bin/GraphFromFasta -i /home/ssm-user/trinity_out_dir/chrysalis/inchworm.DS.fa.min100 -r /home/ssm-user/trinity_out_dir/both.fa -min_contig_length 200 -min_glue 2 -glue_factor 0.05 -min_iso_ratio 0.05 -t 4 -k 24 -kk 48  -scaffolding /home/ssm-user/trinity_out_dir/chrysalis/iworm_scaffolds.txt  > /home/ssm-user/trinity_out_dir/chrysalis/iworm_cluster_welds_graph.txt
* [Tue May 30 06:07:34 2023] Running CMD: /usr/bin/sort --parallel=4 -T . -S 1G  -k9,9gr /home/ssm-user/trinity_out_dir/chrysalis/iworm_cluster_welds_graph.txt > /home/ssm-user/trinity_out_dir/chrysalis/iworm_cluster_welds_graph.txt.sorted
* [Tue May 30 06:07:34 2023] Running CMD: /usr/local/bin/util/support_scripts/annotate_chrysalis_welds_with_iworm_names.pl /home/ssm-user/trinity_out_dir/chrysalis/inchworm.DS.fa.min100 /home/ssm-user/trinity_out_dir/chrysalis/iworm_cluster_welds_graph.txt.sorted > /home/ssm-user/trinity_out_dir/chrysalis/iworm_cluster_welds_graph.txt.sorted.wIwormNames
* [Tue May 30 06:07:34 2023] Running CMD: /usr/local/bin/Chrysalis/bin/BubbleUpClustering -i /home/ssm-user/trinity_out_dir/chrysalis/inchworm.DS.fa.min100  -weld_graph /home/ssm-user/trinity_out_dir/chrysalis/iworm_cluster_welds_graph.txt.sorted -min_contig_length 200 -max_cluster_size 25  > /home/ssm-user/trinity_out_dir/chrysalis/GraphFromIwormFasta.out
* [Tue May 30 06:07:34 2023] Running CMD: /usr/local/bin/Chrysalis/bin/CreateIwormFastaBundle -i /home/ssm-user/trinity_out_dir/chrysalis/GraphFromIwormFasta.out -o /home/ssm-user/trinity_out_dir/chrysalis/bundled_iworm_contigs.fasta -min 200
* [Tue May 30 06:07:34 2023] Running CMD: /usr/local/bin/Chrysalis/bin/ReadsToTranscripts -i /home/ssm-user/trinity_out_dir/both.fa -f /home/ssm-user/trinity_out_dir/chrysalis/bundled_iworm_contigs.fasta -o /home/ssm-user/trinity_out_dir/chrysalis/readsToComponents.out -t 4 -max_mem_reads 50000000  -p 10
* [Tue May 30 06:07:37 2023] Running CMD: /usr/bin/sort --parallel=4 -T . -S 1G -k 1,1n -k3,3nr -k2,2 /home/ssm-user/trinity_out_dir/chrysalis/readsToComponents.out > /home/ssm-user/trinity_out_dir/chrysalis/readsToComponents.out.sort
Tuesday, May 30, 2023: 06:07:37 CMD: mkdir -p /home/ssm-user/trinity_out_dir/read_partitions/Fb_0/CBin_0
Tuesday, May 30, 2023: 06:07:37 CMD: touch /home/ssm-user/trinity_out_dir/partitioned_reads.files.list.ok
Tuesday, May 30, 2023: 06:07:37 CMD: /usr/local/bin/util/support_scripts/write_partitioned_trinity_cmds.pl --reads_list_file /home/ssm-user/trinity_out_dir/partitioned_reads.files.list --CPU 1 --max_memory 1G  --run_as_paired  --seqType fa --trinity_complete --full_cleanup  --NO_SEQTK  --no_salmon  > recursive_trinity.cmds
Tuesday, May 30, 2023: 06:07:37 CMD: touch recursive_trinity.cmds.ok
Tuesday, May 30, 2023: 06:07:37 CMD: touch recursive_trinity.cmds.ok


--------------------------------------------------------------------------------
------------ Trinity Phase 2: Assembling Clusters of Reads ---------------------
------- (involving the Inchworm, Chrysalis, Butterfly trifecta ) ---------------
--------------------------------------------------------------------------------

Tuesday, May 30, 2023: 06:07:37 CMD: /usr/local/bin/trinity-plugins/BIN/ParaFly -c recursive_trinity.cmds -CPU 4 -v -shuffle
Number of Commands: 38
succeeded(38)   100% completed.

All commands completed successfully. :-)



** Harvesting all assembled transcripts into a single multi-fasta file...

Tuesday, May 30, 2023: 06:08:29 CMD: find /home/ssm-user/trinity_out_dir/read_partitions/ -name '*inity.fasta'  | /usr/local/bin/util/support_scripts/partitioned_trinity_aggregator.pl --token_prefix TRINITY_DN --output_prefix /home/ssm-user/trinity_out_dir/Trinity.tmp
* [Tue May 30 06:08:29 2023] Running CMD: /usr/local/bin/util/support_scripts/salmon_runner.pl Trinity.tmp.fasta /home/ssm-user/trinity_out_dir/both.fa 4
* [Tue May 30 06:08:31 2023] Running CMD: /usr/local/bin/util/support_scripts/filter_transcripts_require_min_cov.pl Trinity.tmp.fasta /home/ssm-user/trinity_out_dir/both.fa salmon_outdir/quant.sf 2 > /home/ssm-user/trinity_out_dir.Trinity.fasta


#############################################################################
Finished.  Final Trinity assemblies are written to /home/ssm-user/trinity_out_dir.Trinity.fasta
#############################################################################


Tuesday, May 30, 2023: 06:08:31 CMD: /usr/local/bin/util/support_scripts/get_Trinity_gene_to_trans_map.pl /home/ssm-user/trinity_out_dir.Trinity.fasta > /home/ssm-user/trinity_out_dir.Trinity.fasta.gene_trans_map

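In this environment, running Trinity on the sample files took the following amounts of time.

| Step | Time |
| --- | --- |
| Container build | 4 min |
| Creating the SIF container image | 14 min |
| Running Trinity | 2 min |

The SIF container image is cached, so subsequent runs take less time.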
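
Instead of relying on the cache, the converted image can also be saved as an explicit SIF file with `apptainer pull` and reused with `apptainer exec`. The following is a sketch only: the file name `trinityrnaseq.sif`, the helper functions, and the `DRY_RUN` switch are assumptions made for this example. With `DRY_RUN=1` the commands are printed rather than executed.

```shell
#!/usr/bin/env bash
# Sketch: convert the Docker Hub image to a local SIF file once, then reuse it.
# SIF_FILE and DRY_RUN are names chosen for this example.
SIF_FILE=${SIF_FILE:-$HOME/trinityrnaseq.sif}
IMAGE_URI=docker://trinityrnaseq/trinityrnaseq:latest

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Pull (and convert) the image only if the SIF file does not exist yet.
ensure_sif() {
  [ -f "$SIF_FILE" ] || run apptainer pull "$SIF_FILE" "$IMAGE_URI"
}

# Execute Trinity from the local SIF file with the given arguments.
trinity_from_sif() {
  ensure_sif && run apptainer exec "$SIF_FILE" Trinity "$@"
}

# Usage:
#   trinity_from_sif --seqType fq --left "$PWD/reads.left.fq.gz" ...
```

Keeping the SIF file at a known path also makes it easy to copy the same image to another instance.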

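Troubleshooting

During this test, I had to deal with several errors when running Trinity, so I am recording them here for reference.

Insufficient free storage space

If you see the error no space left on device, the instance has run out of free storage space. The Ubuntu AMI used here defaults to 8 GiB of storage, but the Trinity image on Docker Hub is 3.79 GB, so you need to provision storage with some headroom.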

FATAL:   Unable to handle docker://trinityrnaseq/trinityrnaseq:latest uri: while building SIF from layers: conveyor failed to get: writing blob: write /tmp/bundle-temp-1656152970/oci-put-blob1557947261: no space left on device
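
The path in the error message shows that the failed write was under `/tmp`. A pre-flight check of free space can catch this before a long pull. The sketch below rests on several assumptions: the 10 GiB threshold is an arbitrary margin, `free_gib` is a made-up helper, and `/mnt/scratch` is a hypothetical larger volume. `APPTAINER_TMPDIR` and `APPTAINER_CACHEDIR` are the environment variables Apptainer reads for its temporary and cache directories.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the free space (whole GiB) of the filesystem
# that holds the given directory.
free_gib() {
  # -P prints one POSIX-format line per filesystem, -k reports 1024-byte
  # blocks; column 4 is the available space.
  df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

required_gib=10                       # arbitrary margin for this example
tmp_dir=${APPTAINER_TMPDIR:-/tmp}

free=$(free_gib "$tmp_dir")
free=${free:-0}                       # treat an unreadable directory as 0 GiB

if [ "$free" -lt "$required_gib" ]; then
  echo "Only ${free} GiB free in ${tmp_dir} (want ${required_gib} GiB)." >&2
  echo "Consider enlarging the volume, or pointing Apptainer at a larger one, e.g.:" >&2
  echo "  export APPTAINER_TMPDIR=/mnt/scratch/tmp" >&2
  echo "  export APPTAINER_CACHEDIR=/mnt/scratch/cache" >&2
fi
```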

Incorrect sample files

If you see the error no reads made it to the normalization process, there may be a problem with the downloaded sample files. In this test, the error occurred because the GitHub URL I passed to wget was wrong.

Error, no reads made it to the normalization process...   at /usr/local/bin/util/..//util/support_scripts//nbkc_normalize.pl line 119.
Error, cmd: /usr/local/bin/util/..//util/support_scripts//nbkc_normalize.pl --stats_file pairs.K25.stats --max_cov 200  --min_cov 1 --max_CV 10000 > pairs.K25.stats.C200.maxCV10000.accs died with ret 65280 at /usr/local/bin/util/insilico_read_normalization.pl line 807.
Error, cmd: /usr/local/bin/util/insilico_read_normalization.pl --seqType fq --JM 1G  --max_cov 200 --min_cov 1 --CPU 4 --output /home/ssm-user/tmp/trinity_out_dir/insilico_read_normalization --max_CV 10000  --NO_SEQTK  --left /home/ssm-user/tmp/reads.left.fq.gz --right /home/ssm-user/tmp/reads.right.fq.gz --pairs_together  --PARALLEL_STATS   died with ret 512 at /usr/local/bin/Trinity line 2919.
        main::process_cmd("/usr/local/bin/util/insilico_read_normalization.pl --seqType "...) called at /usr/local/bin/Trinity line 3472
        main::normalize("/home/ssm-user/tmp/trinity_out_dir/insilico_read_normalization", 200, ARRAY(0x557ecfaa43b0), ARRAY(0x557ecfaa43e0)) called at /usr/local/bin/Trinity line 3412
        main::run_normalization(200, ARRAY(0x557ecfaa43b0), ARRAY(0x557ecfaa43e0)) called at /usr/local/bin/Trinity line 1450

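Summary

In this post, I ran Trinity on a single EC2 instance with Apptainer installed.
Troubleshooting took time because there is not much information about this on the internet, but I feel that the know-how is building up little by little.

Next time, I plan to write about testing this on AWS ParallelCluster!

I hope this article is helpful to someone. See you next time!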