Published Histories | jirka | Example history #2 NEW

Galaxy History 'Example history #2 NEW'

Annotation: Comparative analysis of repeats between two genome variants of rye (Secale cereale), differing in the presence (4B) or absence (0B) of supernumerary B chromosomes.

Dataset | Annotation
1: rye 0B - EBI SRA: ERR058829 File: ERR058829.fastq.gz
595.9 MB
format: fastq, database: ?
Info: Secale cereale 0B, WGS, 454 reads
@ERR058829.1 GEDC4XP01AIEJF/2
TTTGCGTTATCTAATAAGAATAACTAGGAGTACTTAGCAAGAATAACCTCAATAATTGCAAGGCACACAGGGNGTAGNGNN
+
666GIIIIIIIIIIIIFIFB?==?I=??IIIIIIIIIIIGGFBEBBCCEEEIIIIIIIIIIIIIIIIFIE:8!142;!:!!
@ERR058829.2 GEDC4XP01BDF51/2
ACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACAC
0B genotype, sequencing run #1
2: rye 0B - EBI SRA: ERR058830 File: ERR058830.fastq.gz
571.3 MB
format: fastq, database: ?
Info: Secale cereale 0B, WGS, 454 reads
@ERR058830.1 GEDC4XP02FWOIJ/2
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGGTTCTGGATTAGGCCATGCATGCATGCTTGTAAACTACTTGCATAGGTATTACGTAGACATCAAATGGGCTAGGCCTGATTAATTGTAGTGATTGAATAATTATGATTTCCTTATTGTGTCATAGGAAGATATTTCTTTCGTTGATAATTATATGATGGCGCTTGGTGTATTTTTATCCACGCATCTGCACCGATATTTCATGCTCATATTCACTCCATGACAT
AAAATTTTGGCGATATTGGCGCCGAACATGATAATTTGATTTTGTTAATATTATAATTCTACCATCGGTAACGGTGCGCGTACATGAGCCTGAGACTGCCAAGGCACACAGGGGGATAGGGNN
+
6666666666666666666666666666666??44IIIIIIIIIIIIIIIIIIIIIIIIIIIIHHHIIIIIIIIIIIIIIIIIIIIIIIIIIIIHHHIHHHIIIIIIIIIICCCCIIIIIIIIIIIIIIIIIIIIIIIHHHIIIIIIIIIIIIIIIIIIIIIIIHHHIHHHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIE33333?7IIIIIIIIIIIIIIIIIIIHHHIIIIIIIIIIIIIIIIIIIIIIIII
CCCCCCCCCCDHIIIIIIE?<??CEEBBCEEIEE@@@EB4444E88EEEEEEIIIIIIIDDCIIIHHHIIIHHHHHIIIIIIIIIIID???HIIIIIIHH;;;;IEEEFE52555FA>774!!
0B genotype, sequencing run #2
3: rye 4B - EBI SRA: ERP001061 File: ERR058831.fastq.gz
576.1 MB
format: fastq, database: ?
Info: Secale cereale 4B, WGS, 454 reads
@ERR058831.1 GEIV4VB02G9TAW/2
ACACACACACACACACACACACACACACACACACACACACACACACACACACACACACGACGACACACGACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACANANANANACANACACACACACACACACACACACACACACACANANACACACACACACACACACACACACACACACACACA
CANACACACACACACACACACANACACACACACACACACANANACANANACACACACACANACAC
+
EFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIHHFFFFDDBBA?<<88886933433366852242552222222222222222222222222222222222---22222222222323232222222222222222222222222222222222222---2222222----!2!2!2!232!2.--22222222236333232222222----!2!532222222---23232323233333333333333
33!9333333333332323232!532222222222222--!2!2--!1!53111111111!5311
4B genotype, sequencing run #1
4: rye 4B - EBI SRA: ERP001061 File: ERR058832.fastq.gz
626.5 MB
format: fastq, database: ?
Info: Secale cereale 4B, WGS, 454 reads
@ERR058832.1 GER0OZY01D4XA2/2
TTTGTGCTTGTTCTCACTTGCTTTGAGAGTCTCGATCATTCGGACTCTAGGTTGATTTTGTTNTTTCTTTTTGTTATCACCTTACTTTCTTCGCGAGTGTCGTACCAACCACCTTATTNTNTCANANAGTGGGNGGGCGACGACGCANACACANNANNCGCCAGCACACGAGGGAGNGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
666FFFFFFFFIIIIIIIIIIHHHIIIIIIIIIIIIIHDBBBHHIIIHHFFDDBB:764/,,!,114,,,,,4--4144-----,,,,5132322<//-/49<:21--,,866---,,!,!-37!-!-33322!,,,:132222--.!2:22-!!3!!3:1-..7:82232,..3-!..!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
@ERR058832.2 GER0OZY01AI6RN/2
CACAGCAGCACACACAGCACACACACACNCACACANACACACACGACGACACGACGACACACAACAACGAACGAACACGACGACGACGTACGAACGAACGAACGACACGACGGACGACGACGACACGACGAACGAACGACACGACGACGAACGAACGACGAACGACGACGACGACGAGACANANACANANACACANACACACACACACACAACAACACACGAACTAACGTAACGTAAACTAACTAACCAACGACAC
4B genotype, sequencing run #2
5: rye 4B - EBI SRA: ERP001061 File: ERR058833.fastq.gz
593.3 MB
format: fastq, database: ?
Info: Secale cereale 4B, WGS, 454 reads
@ERR058833.1 GER0OZY02GU3GU/2
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTGTTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTGTGTGTGTGTGTGTGTGTGTTGTTGTTGTTGTTGTTGTGTGTGTGTGTGTGTCGTCGTCGTCGTGTGTGTTGTTGTGTGTGTG
TGTGTGTGTGTGTGTGTNGTTGTTGTTN
+
DFFFFFFFFFFFHFFFFFFFFFFFDDDDD???AADDDAAADDDDBBAAAA=888333333322222366633333==4448<?=<<<<<<<<<<966666221121,-,,21121121121121121,-,,-,----,--222222222-------------------------,----2222222---2----25686685121,-,,-,-,2-----228855333333992222822--,121,-,239999=
?????==???=884400!?99988323!
4B genotype, sequencing run #3
6: sample repeat database for rye
9 sequences
format: fasta, database: ?
>AF245032_Bilby#LTR/gypsy/chromovirus/CRM
AGCTCGTCAATATCAAGAAGAACAAGATGCCGAGGCTAACCGCCAAGCACAACAACGTCGTGACGCTCAA
GCTATGGAGGCGCAACGTGCTTTACAAGATGCACACCGCAATGTGGCCGCCCGCAATCGACACACTCGTC
AAGAAGAGGATTTTCTTCGAGATGAGATGCAAGAACGGCGCCATCAAGCAAGCATGCAAGAAGAAGCTCC
TCCTCACGCGCGGCAAGAGCAACAACCCCAACAAGAACATCTTCCTCAACAAGAGCAACAAGATATGGCA
AACCGCCAAAATCGAGATCATCCTCCCAATAGACAAGGAAACAACGACGAGCATCGCTATGGCAAGCTCA
A small sample database of selected satellite repeats and the centromeric retrotransposon Bilby from rye.
7: Concatenate datasets on data 1 and data 2
1.1 GB
format: fastq, database: ?
@ERR058829.1 GEDC4XP01AIEJF/2
TTTGCGTTATCTAATAAGAATAACTAGGAGTACTTAGCAAGAATAACCTCAATAATTGCAAGGCACACAGGGNGTAGNGNN
+
666GIIIIIIIIIIIIFIFB?==?I=??IIIIIIIIIIIGGFBEBBCCEEEIIIIIIIIIIIIIIIIFIE:8!142;!:!!
@ERR058829.2 GEDC4XP01BDF51/2
ACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACAC
Merging 0B reads from sequencing runs #1 and #2.
8: Concatenate datasets on data 3, data 4, and data 5
1.8 GB
format: fastq, database: ?
@ERR058831.1 GEIV4VB02G9TAW/2
ACACACACACACACACACACACACACACACACACACACACACACACACACACACACACGACGACACACGACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACANANANANACANACACACACACACACACACACACACACACACANANACACACACACACACACACACACACACACACACACA
CANACACACACACACACACACANACACACACACACACACANANACANANACACACACACANACAC
+
EFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIHHFFFFDDBBA?<<88886933433366852242552222222222222222222222222222222222---22222222222323232222222222222222222222222222222222222---2222222----!2!2!2!232!2.--22222222236333232222222----!2!532222222---23232323233333333333333
33!9333333333332323232!532222222222222--!2!2--!1!53111111111!5311
Merging 4B reads from sequencing runs #1, #2 and #3.
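Concatenation appends the records of each input after those of the previous one, so the reads themselves and their order are unchanged. Below is a minimal Python sketch of the idea. It assumes uncompressed FASTQ text held in memory. `concatenate_fastq` is a hypothetical helper, not the Galaxy Concatenate tool:

```python
def concatenate_fastq(*texts: str) -> str:
    """Join FASTQ texts in order (hypothetical helper, not the Galaxy tool).

    A newline is added to any non-empty input that lacks a trailing one,
    so the last record of one file cannot fuse with the first of the next.
    """
    parts = []
    for text in texts:
        if text and not text.endswith("\n"):
            text += "\n"
        parts.append(text)
    return "".join(parts)


run1 = "@r1\nACGT\n+\nIIII\n"
run2 = "@r2\nTTTT\n+\nIIII"  # deliberately missing the trailing newline
merged = concatenate_fastq(run1, run2)
print(merged.count("\n") // 4)  # 2
```

The inputs in this history are gzip-compressed files of several hundred megabytes each. A practical version would therefore stream them line by line instead of holding whole files in memory.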
9: FASTQ Groomer on data 7
1.1 GB
format: fastqsanger, database: ?
Info: Groomed 1136271 sanger reads into sanger reads.
Based upon quality and sequence, the input data is valid for: sanger
Input ASCII range: '!'(33) - 'I'(73)
Input decimal range: 0 - 40
@ERR058829.1 GEDC4XP01AIEJF/2
TTTGCGTTATCTAATAAGAATAACTAGGAGTACTTAGCAAGAATAACCTCAATAATTGCAAGGCACACAGGGNGTAGNGNN
+
666GIIIIIIIIIIIIFIFB?==?I=??IIIIIIIIIIIGGFBEBBCCEEEIIIIIIIIIIIIIIIIFIE:8!142;!:!!
@ERR058829.2 GEDC4XP01BDF51/2
ACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACAC
Grooming 0B reads and setting the quality score type to "Sanger" (take care to set this option correctly for your input data). This step must be performed on any FASTQ data before it can be used with other tools.
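The Groomer output above reports an input ASCII range of '!'(33) to 'I'(73) and a decimal range of 0 to 40. In the Sanger (Phred+33) encoding, each score is the character's ASCII code minus 33. Below is a minimal Python sketch of that decoding, assuming Sanger-encoded input. `decode_sanger` and `quality_range` are hypothetical helpers, not part of FASTQ Groomer:

```python
from typing import List, Tuple


def decode_sanger(qual: str) -> List[int]:
    """Convert a Sanger (Phred+33) quality string to integer Phred scores.

    Hypothetical helper: score = ASCII code - 33.
    """
    return [ord(ch) - 33 for ch in qual]


def quality_range(quals: List[str]) -> Tuple[int, int]:
    """Return the (min, max) decoded score over several quality strings."""
    scores = [s for q in quals for s in decode_sanger(q)]
    return min(scores), max(scores)


print(decode_sanger("!5?I"))            # [0, 20, 30, 40]
print(quality_range(["666GII", "!!"]))  # (0, 40)
```

If scores decoded this way came out implausibly high (well above about 41), the data more likely use a different offset such as Phred+64. That is why the quality score type has to be set correctly in the Groomer.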
10: FASTQ Groomer on data 8
1.8 GB
format: fastqsanger, database: ?
Info: Groomed 1736007 sanger reads into sanger reads.
Based upon quality and sequence, the input data is valid for: sanger
Input ASCII range: '!'(33) - 'I'(73)
Input decimal range: 0 - 40
@ERR058831.1 GEIV4VB02G9TAW/2
ACACACACACACACACACACACACACACACACACACACACACACACACACACACACACGACGACACACGACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACANANANANACANACACACACACACACACACACACACACACACANANACACACACACACACACACACACACACACACACACA
CANACACACACACACACACACANACACACACACACACACANANACANANACACACACACANACAC
+
EFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIHHFFFFDDBBA?<<88886933433366852242552222222222222222222222222222222222---22222222222323232222222222222222222222222222222222222---2222222----!2!2!2!232!2.--22222222236333232222222----!2!532222222---23232323233333333333333
33!9333333333332323232!532222222222222--!2!2--!1!53111111111!5311
Grooming 4B reads.
Steps 9-22 are always run with the same parameters for the 0B and 4B data.
11: Trim sequences on data 9
475.1 MB
format: fastqsanger, database: ?
Info: Trimming: base 1 to 200
Input: 1136271 reads.
Output: 1136271 reads.
@ERR058829.1 GEDC4XP01AIEJF/2
TTTGCGTTATCTAATAAGAATAACTAGGAGTACTTAGCAAGAATAACCTCAATAATTGCAAGGCACACAGGGNGTAGNGNN
+
666GIIIIIIIIIIIIFIFB?==?I=??IIIIIIIIIIIGGFBEBBCCEEEIIIIIIIIIIIIIIIIFIE:8!142;!:!!
@ERR058829.2 GEDC4XP01BDF51/2
ACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACAC
[0B reads] Reads are trimmed to the same length (200 nt) to make the different sets comparable. In this step, reads longer than 200 nt are trimmed, while shorter reads are retained; they are removed later (step 19).
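Trimming to bases 1 to 200 amounts to slicing both the sequence and its quality string. A read shorter than 200 nt simply comes back unchanged. Below is a minimal Python sketch under that reading of the tool's behaviour. `trim_record` is a hypothetical helper, not the Galaxy Trim sequences tool:

```python
from typing import Tuple


def trim_record(seq: str, qual: str, first: int = 1, last: int = 200) -> Tuple[str, str]:
    """Keep bases first..last (1-based, inclusive) of a read and its qualities.

    Hypothetical helper. Python slicing stops at the end of the string, so
    reads shorter than `last` are returned as they are.
    """
    start, end = first - 1, last
    return seq[start:end], qual[start:end]


seq, qual = trim_record("ACGT" * 60, "I" * 240)
print(len(seq), len(qual))               # 200 200
print(trim_record("TTTGCG", "666GII"))   # ('TTTGCG', '666GII')
```

Applying the same slice to the quality string keeps the sequence and quality lines of each record the same length, as FASTQ requires.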
12: Trim sequences on data 10
725.9 MB
format: fastqsanger, database: ?
Info: Trimming: base 1 to 200
Input: 1736007 reads.
Output: 1736007 reads.
@ERR058831.1 GEIV4VB02G9TAW/2
ACACACACACACACACACACACACACACACACACACACACACACACACACACACACACGACGACACACGACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACANANANANACANACACACACACACA
+
EFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIHHFFFFDDBBA?<<88886933433366852242552222222222222222222222222222222222---22222222222323232222222222222222222222222222222222222---2222222----!2!2!2!232!2.--222222222
@ERR058831.2 GEIV4VB02GA15G/2
TTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTT
[4B reads] ...
13: Filter by quality on data 11
471.5 MB
format: fastqsanger, database: ?
Info: Quality cut-off: 10
Minimum percentage: 95
Input: 1136271 reads.
Output: 1127191 reads.
discarded 9080 (0%) low-quality reads.
@ERR058829.1 GEDC4XP01AIEJF/2
TTTGCGTTATCTAATAAGAATAACTAGGAGTACTTAGCAAGAATAACCTCAATAATTGCAAGGCACACAGGGNGTAGNGNN
+
666GIIIIIIIIIIIIFIFB?==?I=??IIIIIIIIIIIGGFBEBBCCEEEIIIIIIIIIIIIIIIIFIE:8!142;!:!!
@ERR058829.2 GEDC4XP01BDF51/2
ACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACAC
[0B reads] Keeping only reads in which at least 95% of bases have a quality score of at least 10.
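Here, "quality at least 10" means a Phred score Q ≥ 10. That corresponds to an estimated error probability of at most 10^(-10/10) = 1 in 10 per base. Below is a minimal Python sketch of the keep/discard test, assuming Sanger (Phred+33) qualities. `passes_quality` is a hypothetical helper, not the Galaxy Filter by quality tool:

```python
def passes_quality(qual: str, cutoff: int = 10, min_percent: float = 95.0) -> bool:
    """Return True if at least `min_percent` % of bases score >= `cutoff`.

    Hypothetical helper; assumes Sanger (Phred+33) quality characters.
    """
    if not qual:
        return False  # an empty read has no bases that could pass
    good = sum(1 for ch in qual if ord(ch) - 33 >= cutoff)
    return 100.0 * good / len(qual) >= min_percent


print(passes_quality("I" * 19 + "!"))   # True  (19/20 = 95 %)
print(passes_quality("I" * 18 + "!!"))  # False (18/20 = 90 %)
```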
14: Filter by quality on data 12
718.0 MB
format: fastqsanger, database: ?
Info: Quality cut-off: 10
Minimum percentage: 95
Input: 1736007 reads.
Output: 1716248 reads.
discarded 19759 (1%) low-quality reads.
@ERR058831.1 GEIV4VB02G9TAW/2
ACACACACACACACACACACACACACACACACACACACACACACACACACACACACACGACGACACACGACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACANANANANACANACACACACACACA
+
EFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIHHFFFFDDBBA?<<88886933433366852242552222222222222222222222222222222222---22222222222323232222222222222222222222222222222222222---2222222----!2!2!2!232!2.--222222222
@ERR058831.2 GEIV4VB02GA15G/2
TTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTT
[4B reads] ...
15: renamed Filter by quality on data 11
473.7 MB
format: fastqsanger, database: ?
@0BERR058829.1 GEDC4XP01AIEJF/2
TTTGCGTTATCTAATAAGAATAACTAGGAGTACTTAGCAAGAATAACCTCAATAATTGCAAGGCACACAGGGNGTAGNGNN
+
666GIIIIIIIIIIIIFIFB?==?I=??IIIIIIIIIIIGGFBEBBCCEEEIIIIIIIIIIIIIIIIFIE:8!142;!:!!
@0BERR058829.2 GEDC4XP01BDF51/2
ACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACAC
[0B reads] Adding the "0B" prefix to read names. Note that the output datatype should be changed from "fastq" to "fastqsanger" once the job has finished.
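The prefix is inserted into the header line of each four-line FASTQ record. Below is a minimal Python sketch, assuming one record per four lines (no wrapped sequences). `add_prefix` is a hypothetical helper, not the tool used in this history:

```python
def add_prefix(fastq_text: str, prefix: str) -> str:
    """Insert `prefix` after the '@' of every record header.

    Hypothetical helper. Headers are located by position (every 4th line),
    not by a leading '@', because a quality line may also start with '@'.
    """
    lines = fastq_text.splitlines()
    if len(lines) % 4:
        raise ValueError("expected four lines per FASTQ record")
    for i in range(0, len(lines), 4):
        if not lines[i].startswith("@"):
            raise ValueError(f"line {i + 1} is not a FASTQ header")
        lines[i] = "@" + prefix + lines[i][1:]
    return "\n".join(lines) + "\n" if lines else ""


record = "@ERR058829.1 GEDC4XP01AIEJF/2\nTTTGCG\n+\n666GII\n"
print(add_prefix(record, "0B").splitlines()[0])  # @0BERR058829.1 GEDC4XP01AIEJF/2
```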
16: renamed Filter by quality on data 12
721.3 MB
format: fastqsanger, database: ?
@4BERR058831.1 GEIV4VB02G9TAW/2
ACACACACACACACACACACACACACACACACACACACACACACACACACACACACACGACGACACACGACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACANANANANACANACACACACACACA
+
EFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIHHFFFFDDBBA?<<88886933433366852242552222222222222222222222222222222222---22222222222323232222222222222222222222222222222222222---2222222----!2!2!2!232!2.--222222222
@4BERR058831.2 GEIV4VB02GA15G/2
TTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTT
[4B reads] Adding "4B" prefix.
17: FASTQ to FASTA on data 15
1,072,263 sequences
format: fasta, database: ?
Info: Input: 1127191 reads.
Output: 1072263 reads.
discarded 54928 (4%) low-quality reads.
>0BERR058829.2 GEDC4XP01BDF51/2
ACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACAC
>0BERR058829.6 GEDC4XP01DFKEI/2
GGTTATCAATTATGGTTACACTCCAATATCTGTATGCATTGCAGGGTTTATCGCACCCACCACGGAGTTTCCTTACGTCCCACTGGGAGTGTTTCTAGTCCACTACGATTTGTTCACGACCTTTCGGTACGTGTCCCTTCGCTTGTTGGGGAGGTAGGCGGGGGCCCCCGGTCCCGTCGACTAGGTCCGACCCCCAACCC
>0BERR058829.9 GEDC4XP01CJCJO/2
CACACACAACACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAA
[0B reads] Converting to FASTA format, keeping the current read names and discarding reads that contain "N"s.
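Conversion keeps the header (with '>' in place of '@') and the sequence, and drops the '+' and quality lines. In this step it also discards any read containing an N. That is why 0BERR058829.1, whose sequence contains Ns, no longer appears in dataset 17. Below is a minimal Python sketch, assuming four-line records. `fastq_to_fasta` is a hypothetical helper, not the Galaxy FASTQ to FASTA tool:

```python
def fastq_to_fasta(fastq_text: str, discard_n: bool = True) -> str:
    """Convert four-line FASTQ records to FASTA (hypothetical helper).

    Keeps the read name, drops the '+' and quality lines, and optionally
    skips reads whose sequence contains an ambiguous base 'N'.
    """
    lines = fastq_text.splitlines()
    out = []
    for i in range(0, len(lines) - 3, 4):
        header, seq = lines[i], lines[i + 1]
        if discard_n and "N" in seq.upper():
            continue
        out.append(">" + header[1:])
        out.append(seq)
    return "\n".join(out) + "\n" if out else ""


fq = (
    "@0BERR058829.1 GEDC4XP01AIEJF/2\nGGNGTAGNGNN\n+\nIIIIIIII!!!\n"
    "@0BERR058829.2 GEDC4XP01BDF51/2\nACACAC\n+\nIIIIII\n"
)
print(fastq_to_fasta(fq), end="")
# >0BERR058829.2 GEDC4XP01BDF51/2
# ACACAC
```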
18: FASTQ to FASTA on data 16
1,335,455 sequences
format: fasta, database: ?
Info: Input: 1716248 reads.
Output: 1335455 reads.
discarded 380793 (22%) low-quality reads.
>4BERR058831.2 GEIV4VB02GA15G/2
TTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTT
>4BERR058831.3 GEIV4VB02IC6LC/2
CATACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACA
>4BERR058831.4 GEIV4VB02F8YJ6/2
ACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACGACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACGACACTACACACACACACACACACACACACTACACACACACACACAC
[4B reads] ...
19: Filter sequences by length on data 17
1,071,992 sequences
format: fasta, database: ?
>0BERR058829.2 GEDC4XP01BDF51/2
ACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACAC
>0BERR058829.6 GEDC4XP01DFKEI/2
GGTTATCAATTATGGTTACACTCCAATATCTGTATGCATTGCAGGGTTTATCGCACCCACCACGGAGTTTCCTTACGTCCCACTGGGAGTGTTTCTAGTCCACTACGATTTGTTCACGACCTTTCGGTACGTGTCCCTTCGCTTGTTGGGGAGGTAGGCGGGGGCCCCCGGTCCCGTCGACTAGGTCCGACCCCCAACCC
>0BERR058829.9 GEDC4XP01CJCJO/2
CACACACAACACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAA
[0B reads] Reads shorter than 200 nt are removed, so all remaining reads are exactly 200 nt long.
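Step 11 already cut every read to at most 200 nt. Keeping only reads of length ≥ 200 therefore leaves reads of exactly 200 nt. Below is a minimal Python sketch, assuming FASTA input whose sequences may be wrapped over several lines. `parse_fasta` and `filter_by_length` are hypothetical helpers, not the Galaxy Filter sequences by length tool:

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs (hypothetical helper).

    Sequence lines belonging to one record are joined, so wrapped
    sequences are handled.
    """
    records: List[Record] = []
    header, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line.strip():
            chunks.append(line.strip())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_by_length(records: List[Record], min_len: int = 200) -> List[Record]:
    """Keep only records whose sequence is at least `min_len` bases long."""
    return [(h, s) for h, s in records if len(s) >= min_len]


fasta = ">long\n" + "AC" * 60 + "\n" + "AC" * 40 + "\n>short\n" + "ACGT" * 10 + "\n"
kept = filter_by_length(parse_fasta(fasta))
print([name for name, _ in kept])  # ['long']
```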
20: Filter sequences by length on data 18
1,335,287 sequences
format: fasta, database: ?
>4BERR058831.2 GEIV4VB02GA15G/2
TTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTTGTT
>4BERR058831.3 GEIV4VB02IC6LC/2
CATACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACA
>4BERR058831.4 GEIV4VB02F8YJ6/2
ACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACGACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACGACACTACACACACACACACACACACACACTACACACACACACACAC
[4B reads] ...
21: Random selection output (from-Filter sequences by length on data 17, with number of sequences 100000)
100,000 sequences
format: fasta, database: ?
>0BERR058829.19 GEDC4XP01CL5DO/2
TTTTTGCGCTTGACTTATTTTGCAAGAACTTGAAAACTGCCATCCGAGGCCTGATGATCATCTCTCACCTGCAACGCCAACATGAAAGTCTATATTAGTATAAGATGAGCAAACAATTAATACCACCATGAATTTATTGGATGACAAGCTGTGGAGAAATCCATTTGTTGGTGTATCCACATGTTTAGTTTGACTGTTCT
>0BERR058829.24 GEDC4XP01C2KX3/2
GTTTGACCGATAAGATCTTCGTAGAATATGTAGGAGCCAATATGAGCATCTAGGTTCCGCTATTGGTTATTGACCGGAGATGTGTCTCGGTCATGTCTACATAGTTCTCGAACCCGTAGGGTTCGGCACGCTTAACGTTCGATGACGATTAGTTATTATTGCGTTATTGTGATTTTGTTATGACCGAACGGAATTGTTTA
>0BERR058829.29 GEDC4XP01BQQON/2
GTTTGCATTGTTTGCATATGCTTGCATGTTTAATGATTCTGTTTGAGGATAAGAGTATTAAATATGCAGAGGCAATTAGTATGCAATGTCAAATTATAATTTTGGTGATTTTCTATAGTAGAGAATGTTAAGGTTTTGAGTTGATTTATACTAACTTATCTCACGAGTTCTTGTTGTAGTTTTATGTGAATGTAAGTTTT
[0B reads] Random selection of 100,000 reads.
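Drawing 100,000 reads from roughly a million requires uniform sampling without replacement. The method used by the Galaxy tool is not shown here. As an illustration, the sketch below swaps in reservoir sampling (Algorithm R), which picks k items uniformly in a single pass without knowing the total count in advance. `reservoir_sample` is a hypothetical helper:

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return k items chosen uniformly at random without replacement.

    Hypothetical helper implementing Algorithm R. If fewer than k items are
    available, all of them are returned in their original order.
    """
    rng = random.Random(seed)  # a seeded generator makes the selection reproducible
    sample: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                sample[j] = item  # item i ends up kept with probability k / (i + 1)
    return sample


reads = [f"read{i}" for i in range(1_000)]
picked = reservoir_sample(reads, 100, seed=42)
print(len(picked), len(set(picked)))  # 100 100
```

Fixing the seed means the same subset can be drawn again if the clustering has to be repeated. When all reads fit in memory, `random.sample(reads, 100)` does the same job.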
22: Random selection output (from-Filter sequences by length on data 18, with number of sequences 100000)
100,000 sequences
format: fasta, database: ?
>4BERR058831.15 GEIV4VB02G8VE7/2
ACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACAC
>4BERR058831.18 GEIV4VB02H604L/2
CGACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACAC
>4BERR058831.24 GEIV4VB02HOB6L/2
ACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACGACACGACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACAC
[4B reads] ...
23: Concatenate datasets on data 21 and data 22
200,000 sequences
format: fasta, database: ?
>0BERR058829.19 GEDC4XP01CL5DO/2
TTTTTGCGCTTGACTTATTTTGCAAGAACTTGAAAACTGCCATCCGAGGCCTGATGATCATCTCTCACCTGCAACGCCAACATGAAAGTCTATATTAGTATAAGATGAGCAAACAATTAATACCACCATGAATTTATTGGATGACAAGCTGTGGAGAAATCCATTTGTTGGTGTATCCACATGTTTAGTTTGACTGTTCT
>0BERR058829.24 GEDC4XP01C2KX3/2
GTTTGACCGATAAGATCTTCGTAGAATATGTAGGAGCCAATATGAGCATCTAGGTTCCGCTATTGGTTATTGACCGGAGATGTGTCTCGGTCATGTCTACATAGTTCTCGAACCCGTAGGGTTCGGCACGCTTAACGTTCGATGACGATTAGTTATTATTGCGTTATTGTGATTTTGTTATGACCGAACGGAATTGTTTA
>0BERR058829.29 GEDC4XP01BQQON/2
GTTTGCATTGTTTGCATATGCTTGCATGTTTAATGATTCTGTTTGAGGATAAGAGTATTAAATATGCAGAGGCAATTAGTATGCAATGTCAAATTATAATTTTGGTGATTTTCTATAGTAGAGAATGTTAAGGTTTTGAGTTGATTTATACTAACTTATCTCACGAGTTCTTGTTGTAGTTTTATGTGAATGTAAGTTTT
Merging the two datasets of 100,000 reads each from the 0B and 4B samples. All reads are 200 nt long, and their origin is encoded as a prefix (0B or 4B) in their IDs. This is the input dataset for the clustering analysis.
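Each read ID starts with 0B or 4B, so the origin of any group of reads can be tallied from the IDs alone. One example of such a group is the set of reads assigned to a single cluster. Below is a minimal Python sketch. `count_by_origin` is a hypothetical helper, and the sketch assumes nothing about how clusters are reported in the results archive (dataset 28):

```python
from collections import Counter
from typing import Iterable, Sequence


def count_by_origin(read_ids: Iterable[str], prefixes: Sequence[str] = ("0B", "4B")) -> Counter:
    """Count read IDs by their origin prefix (hypothetical helper).

    A leading '>' from a FASTA header is ignored; IDs matching no prefix
    are counted under 'unknown'.
    """
    counts: Counter = Counter()
    for read_id in read_ids:
        name = read_id.lstrip(">")
        for prefix in prefixes:
            if name.startswith(prefix):
                counts[prefix] += 1
                break
        else:
            counts["unknown"] += 1
    return counts


ids = [">0BERR058829.19", ">0BERR058829.24", ">4BERR058831.15"]
print(count_by_origin(ids))  # Counter({'0B': 2, '4B': 1})
```

Both samples contribute the same number of reads (100,000 each), so the raw 0B and 4B counts within a cluster can be compared directly.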
28: Archive with clustering results from dataset 23
1,166,080 lines
format: zip, database: ?
binary/unknown file
29: Contigs from dataset 23 based on clustering
9,576 sequences
format: fasta, database: ?
>CL1Contig1 (1512-123.8-187128)
TAGGGCACCATCTAAATCTTTGGAGACGACACCGTATGAACTGTGGTTTGACGAGAAACC
TAAGCTGTCATTTCTTTAAGTTTGGGACTGCGATGCTTATGTGAAAAAGTTTCAACATGA
TAAGCTCGAACCCAAATCGGAGAAGTAAATCTTCATAGGATACCCAAAAGAAACTGTTGG
GTACACCTTCTATCACAGATCCGAAGGCAAGGTCTTTGTTGCTAAGAGTGGATCCTTTCT
AGAGAAGGAGTTTCTCTCGAAAGAAGTGAGTGGGAGGAAAGTAGAACTTGATGAGGTAAT
30: Log information of clustering based on dataset 23
6,848 lines
format: txt, database: ?
True
This is clustering pipeline
GRAPH BASED CLUSTERING
**********************************************************************
Data preparation started:
31: HTML summary of graph based clustering on dataset 23
141.4 KB
format: html, database: ?
HTML file