ESPRIT :: Installation :: Cluster

Obtain the source code

Steps to install ESPRIT on *nix:

$ unzip ESPRIT_distribution.zip
$ cd ESPRIT_distribution

Read esprit_user_guide.pdf and README.txt. Then open the Makefile and choose your platform by uncommenting the appropriate lines and commenting out the others:

$ cd source
$ vim Makefile

To build the package:

$ make esprit_cc

It is always better to clean first:

$ make clean
$ make esprit_cc
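The later steps call five programs from the source directory: preproc, kmerdist_par, splitdist, needledist and hcluster. Below is a minimal sketch for confirming after the build that they exist and are executable. It assumes make esprit_cc leaves them in the source directory, as the paths used in the later steps suggest; check_esprit_build is a hypothetical helper name.

```shell
# POSIX sh sketch. check_esprit_build is a hypothetical helper.
# Assumption: the build places these programs in the source directory,
# which is where the later steps call them from.
check_esprit_build() {
  src=${1:-.}
  status=0
  for prog in preproc kmerdist_par splitdist needledist hcluster; do
    if [ ! -x "$src/$prog" ]; then
      echo "missing or not executable: $src/$prog" >&2
      status=1
    fi
  done
  # returns 0 only if every program was found
  return $status
}
```

For example: check_esprit_build /path/to/ESPRIT_distribution/source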

Precaution:

  • Make sure that each record in the FASTA file has its header on one line and its sequence on one line.
  • If a sequence spans multiple lines, convert the file so that each sequence is on a single line.
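One way to do that conversion is with awk. This is a minimal sketch, assuming header lines start with '>' and that spaces, tabs and Windows carriage returns inside sequence lines can safely be dropped; linearize_fasta is a hypothetical helper name.

```shell
# POSIX sh + awk sketch. linearize_fasta is a hypothetical helper.
# Joins each record's sequence lines into one line after its header.
linearize_fasta() {
  awk '
    /^>/ {
      sub(/\r$/, "")              # drop a trailing CR from the header
      if (seq != "") print seq    # flush the previous record sequence
      print                       # print the header unchanged
      seq = ""
      next
    }
    {
      gsub(/[ \t\r]/, "")         # assumption: whitespace in sequences is noise
      seq = seq $0
    }
    END { if (seq != "") print seq }
  ' "$1"
}
```

For example: linearize_fasta raw.fas > sequence.fas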

The steps below are written as pseudocode using shell scripts and a generic cluster job manager.

Copy the sequence file into the working directory. If you have more than one sequence file, group them into a single file first.

$ cp /path/to/sequence.fas .
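Assuming that grouping means concatenating the files into one FASTA file, the sketch below does that and checks that the number of headers in the result equals the sum over the inputs. An input without a trailing newline would glue its last sequence line to the next file's header, and the count check catches that. group_fasta and the file names are hypothetical.

```shell
# POSIX sh sketch. group_fasta is a hypothetical helper.
# Usage: group_fasta OUTPUT INPUT...
group_fasta() {
  out=$1; shift
  expected=0
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "cannot read $f" >&2
      return 1
    fi
    # grep -c prints 0 but exits 1 when nothing matches; ignore that status
    n=$(grep -c '^>' "$f" || true)
    expected=$((expected + n))
  done
  cat "$@" > "$out"
  actual=$(grep -c '^>' "$out" || true)
  if [ "$actual" -ne "$expected" ]; then
    echo "header count mismatch: $actual in $out, $expected in inputs" >&2
    return 1
  fi
  echo "$actual sequences written to $out"
}
```

For example: group_fasta sequence.fas sample1.fas sample2.fas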

To run preproc:

$ /path/to/ESPRIT_distribution/source/preproc -f sequence.fas
160794 Seqs Match Primer
160794 Seqs Valid Len
31072 Seqs After Process
1.63 secs in Purging Strings.

The flag -f prevents preproc from trimming.

Files created: sequence_Clean.fas sequence_Clean.frq

To check the counts:

$ awk -F' ' '{ s+=$2 } END { print s }' sequence_Clean.frq
160794
$ grep -c '>' sequence.fas
160794

  • Make sure that these two numbers are the same: the counts in sequence_Clean.frq should add up to the number of sequences in sequence.fas.
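The two commands above can be wrapped into a single check that fails loudly. This sketch assumes, as the awk command does, that the second whitespace-separated column of the .frq file is a count. It counts only lines that start with '>', which is slightly stricter than grep -c '>'; check_preproc_counts is a hypothetical helper name.

```shell
# POSIX sh sketch. check_preproc_counts is a hypothetical helper.
# Usage: check_preproc_counts INPUT.fas CLEAN.frq
check_preproc_counts() {
  fasta=$1
  frq=$2
  # assumption: column 2 of the .frq file holds the count for each entry
  total=$(awk '{ s += $2 } END { print s + 0 }' "$frq")
  headers=$(grep -c '^>' "$fasta" || true)
  echo "frq total: $total, input headers: $headers"
  # exit status 0 only when the two numbers agree
  [ "$total" -eq "$headers" ]
}
```

For example: check_preproc_counts sequence.fas sequence_Clean.frq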

To run kmerdist_par, create a script that writes and submits one job for each pair (i, j) with i <= j:

$ cat submit_kmer_jobs.sh
for i in $(seq 1 10)
do
  for j in $(seq $i 10)
  do
    job="/path/to/ESPRIT_distribution/source/kmerdist_par sequence_Clean.fas 10 $i $j"
    # name each job file after its indices so that job files do not overwrite each other
    echo "$job" > kmer_job_${i}_${j}.clusterJob
    clusterJobSubmission < kmer_job_${i}_${j}.clusterJob
  done
done

  • clusterJobSubmission is your cluster's job submission command.
  • The extension .clusterJob can be replaced with the one your job manager requires.
  • The variable job can include other details if required.

To submit the script itself as a job:

$ cat jobs.clusterJob
## ..
sh submit_kmer_jobs.sh
$ clusterJobSubmission < jobs.clusterJob

Output: sequence_Clean_[i]_[j].dist. Make sure that all index pairs are present: 1_[1-10], 2_[2-10], 3_[3-10], 4_[4-10], 5_[5-10], 6_[6-10], 7_[7-10], 8_[8-10], 9_[9-10] and 10_10 (55 files in total).

Once all jobs have finished, merge all the .dist files:

$ cat sequence_Clean*.dist >> kmer.dist
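Checking the 55 kmerdist_par outputs by eye is error-prone, so a small loop can list the missing ones; run it before the merge. This sketch assumes the outputs are named <prefix>_<i>_<j>.dist for 1 <= i <= j <= n (for example sequence_Clean_10_10.dist); missing_kmer_parts is a hypothetical helper name.

```shell
# POSIX sh sketch. missing_kmer_parts is a hypothetical helper.
# Assumed naming: PREFIX_i_j.dist for every 1 <= i <= j <= N.
# Prints each missing file; returns 0 only if none is missing.
missing_kmer_parts() {
  prefix=$1
  n=$2
  missing=0
  i=1
  while [ "$i" -le "$n" ]; do
    j=$i
    while [ "$j" -le "$n" ]; do
      if [ ! -e "${prefix}_${i}_${j}.dist" ]; then
        echo "${prefix}_${i}_${j}.dist"
        missing=$((missing + 1))
      fi
      j=$((j + 1))
    done
    i=$((i + 1))
  done
  [ "$missing" -eq 0 ]
}
```

For example, missing_kmer_parts sequence_Clean 10 prints nothing and returns 0 when all 55 files are present.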

Split kmer.dist into 100 files:

$ /path/to/ESPRIT_distribution/source/splitdist -s 100 kmer.dist
Counting Total Records....
71249223 Records Found, Splitting...

Output: kmer.dist_[0-99]
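Each of the 100 parts feeds a separate needledist job, so it is worth confirming that none is missing. A minimal sketch; missing_split_parts is a hypothetical helper, and it only checks that the files exist, not what they contain.

```shell
# POSIX sh sketch. missing_split_parts is a hypothetical helper.
# Checks for BASE_0 ... BASE_(N-1); prints each missing file and
# returns 0 only if all parts exist.
missing_split_parts() {
  base=$1
  n=$2
  missing=0
  k=0
  while [ "$k" -lt "$n" ]; do
    if [ ! -e "${base}_${k}" ]; then
      echo "${base}_${k}"
      missing=$((missing + 1))
    fi
    k=$((k + 1))
  done
  [ "$missing" -eq 0 ]
}
```

For example, missing_split_parts kmer.dist 100 here, and the same call with needle.dist once the needledist jobs have finished.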

Submit parallel jobs for needledist:

$ cat submit_needle_job.sh
for i in $(seq 0 99)
do
  job="/path/to/ESPRIT_distribution/source/needledist sequence_Clean.fas kmer.dist_$i needle.dist_$i"
  # name each job file after its index so that job files do not overwrite each other
  echo "$job" > needle_job_$i.clusterJob
  clusterJobSubmission < needle_job_$i.clusterJob
done

Output: needle.dist_[0-99]

Once all jobs have finished, group all the needle.dist files:

$ cat needle.dist_* >> sequence.ndist
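The command cat needle.dist_* >> sequence.ndist merges whatever parts happen to exist and appends to any existing sequence.ndist, so re-running it duplicates records. Below is a sketch of a stricter merge that refuses to run if a part is missing and overwrites the output. merge_parts is a hypothetical helper; it concatenates in numeric order (the glob gives lexical order, e.g. needle.dist_10 before needle.dist_2), on the assumption that hcluster does not depend on input order.

```shell
# POSIX sh sketch. merge_parts is a hypothetical helper.
# Usage: merge_parts BASE N OUTPUT
# Merges BASE_0 ... BASE_(N-1) into OUTPUT in numeric order.
merge_parts() {
  base=$1
  n=$2
  out=$3
  # first pass: refuse to merge a partial result
  k=0
  while [ "$k" -lt "$n" ]; do
    if [ ! -e "${base}_${k}" ]; then
      echo "missing ${base}_${k}; not merging" >&2
      return 1
    fi
    k=$((k + 1))
  done
  # second pass: truncate OUTPUT, then append every part
  : > "$out"
  k=0
  while [ "$k" -lt "$n" ]; do
    cat "${base}_${k}" >> "$out"
    k=$((k + 1))
  done
}
```

For example: merge_parts needle.dist 100 sequence.ndist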

To run hcluster:

$ /path/to/ESPRIT_distribution/source/hcluster -t 15000 sequence.ndist sequence_Clean.frq

  • The flag -t sets the size of the linked table (default 10000); here it is increased to 15000.

Output: sequence.ndist_sort, sequence.OTU, sequence.Outliers, sequence.Rarefaction, sequence.Cluster, sequence.Cluster_List, sequence.ACE, sequence.CHAO1
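A last sketch checks that hcluster wrote every file listed above, taking the base name (sequence here) as its argument. check_hcluster_outputs is a hypothetical helper and does not inspect file contents.

```shell
# POSIX sh sketch. check_hcluster_outputs is a hypothetical helper.
# Checks that BASE.<ext> exists for each output listed for hcluster.
check_hcluster_outputs() {
  base=$1
  status=0
  for ext in ndist_sort OTU Outliers Rarefaction Cluster Cluster_List ACE CHAO1; do
    if [ ! -e "$base.$ext" ]; then
      echo "missing: $base.$ext" >&2
      status=1
    fi
  done
  return $status
}
```

For example: check_hcluster_outputs sequence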