Dropbox has released an additional free 2 GB of space, called a "work Dropbox account", that is connected to your original account.
To get started, register on their webpage and follow their tour.
After registering, you can sync it like this:
1. Click on the Dropbox icon -> click on the gear icon
2. Preferences... -> Account tab -> Business tab
3. Link and log in
Wednesday, April 26, 2017
Monday, April 24, 2017
Reading a Json String on Linux/Unix Shell? Here's how I use jq
This prints all the top-level fields (keys) in your json string:
jq keys YourJsonString
This prints the field called "accession" of all your nodes:
jq '.files[].accession' YourJsonString
Usually, assuming the outputs follow a constant order, you can juxtapose the fields you need to make a table by typing this:
paste -s <(jq '.files[].href' YourJsonString) <(jq '.files[].accession' YourJsonString)
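If jq is not at hand, the same table can be sketched in Python. This is only a minimal example, assuming the JSON has a top-level "files" list whose entries carry "href" and "accession" fields, as in the jq commands above. The sample data and the function name fields_table are made up for illustration.

```python
import json


def fields_table(doc, fields=("href", "accession")):
    """Return one tab-separated row per entry of doc["files"]."""
    rows = []
    for entry in doc["files"]:
        # Missing fields become empty cells so the columns stay aligned.
        rows.append("\t".join(str(entry.get(name, "")) for name in fields))
    return rows


# Made-up sample data standing in for the contents of YourJsonString.
sample = json.loads("""
{"files": [
  {"href": "/files/F1/@@download/F1.bam", "accession": "F1"},
  {"href": "/files/F2/@@download/F2.bam", "accession": "F2"}
]}
""")

for row in fields_table(sample):
    print(row)
```

Because both fields are read from the same entry, the rows stay paired even if the order of the entries changes.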
New type of phishing email targeting Caltech students
From: Library Services - California Institute of Technology <libraryservices@hacettepe.edu.tr>
Date: Mon, Apr 24, 2017 at 9:43 AM
Subject: Library Services
To: brian.he@caltech.edu
Dear Student,
https://clsproxy.library.caltech.edu/login
If you are not able to login, please contact Sarah Miller at shmiller@emich.edu for immediate assistance.
Sincerely,
Sarah Miller
California Institute of Technology
Caltech Library
1200 E. California Blvd.
Mail Code 10-90
Pasadena, CA 91125
T: 626-395-3405
Saturday, April 22, 2017
How to copy the hyperlink in a cell in Excel spreadsheet without opening it, on Mac
Select the cell by moving to it with your arrow keys
command + k
then command + c
Thursday, April 20, 2017
How to download additional data from eLife
Figures & Data tab -> Additional Files tab
Tuesday, April 18, 2017
Cufflinks outputs FAIL
http://seqanswers.com/forums/showthread.php?t=14864
Cole Trapnell
A bit more on what FAIL means, and how it can happen. We use FAIL for genes that actually throw a numerical exception during isoform abundance calculation. In Cufflinks and Cuffdiff, there's a couple of calculations that require us to build matrices with either a row per transcript and a column per read (more or less) or a square matrix with a row and column for each transcript. Some of these matrices need to be invertible or positive definite or have other properties in order for the next steps in the algorithm to succeed. However, sometimes (due to things like round-off error) they aren't. Other times, missing data causes trouble. Oddly enough, this is actually more likely to happen the more reads you get overall, because you can see that isoforms are present, but you don't actually have enough data to calculate those abundances. This is the effect you were observing above. So since we can't be sure about the values (and in fact, were we to go ahead and do the calculation anyways, they could be *wildly* off in theory, or even negative), we set them to zero and move on.
In order to make differential expression estimates more conservative, version 1.1.0 really ramped up the checks that are done before these steps so we don't end up reporting false positives that are due to numerical exceptions. However, users (like yourselves) have been pretty frustrated by those changes, so I've spent the last few weeks going back and streamlining the overall algorithm to actually eliminate pieces that require the matrices to have some of those properties. The main offender was our "importance sampling" procedure, which tries to give us a sense (for each gene) for the accuracy for the maximum likelihood estimate of isoform abundances. This procedure was originally meant to improve the robustness of the estimate when one or more isoforms were close to zero, but in practice, we found that it actually hurts as often as it helps. Moreover, this procedure would often FAIL genes, so I removed it altogether. I've compensated on the differential expression side with some other statistical improvements and fixes, and the result is globally more accurate differential analysis (both in terms of fewer FAILs and fewer false positives than 1.1.0).
The upcoming version 1.2.0 should drastically reduce the number of FAIL genes, though there will still be some. If we can't calculate an MLE to begin with, or if for some reason the confidence interval calculation fails, a gene will be marked as FAIL.
Hope this sheds light on things.
Load a bam file onto UCSC genome browser
track type=bam name=file1 bigDataUrl=http://link/to/your/bam/file
just be sure you have indexed the bam file.
from http://seqanswers.com/forums/showthread.php?t=11785
How to check TopHat mapping stats / mapping rate
In the log folder inside the output folder,
look for bowtie.*.log
simplest way to know how many mapped:
samtools view accepted_hits.bam | cut -f1 | sort | uniq | wc -l
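The same count can be sketched in Python. This is a minimal example, assuming it is fed the text lines that samtools view prints. The function name and the three sample lines are made up for illustration. As in the shell pipeline, both mates of a pair share a read name and are counted once.

```python
def count_unique_reads(sam_lines):
    """Count distinct read names (column 1) among SAM alignment lines."""
    names = set()
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with "@").
        if not line.strip() or line.startswith("@"):
            continue
        names.add(line.split("\t", 1)[0])
    return len(names)


# Made-up alignment lines: readA is reported twice (multi-mapped).
example = [
    "readA\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII",
    "readA\t16\tchr2\t200\t50\t4M\t*\t0\t0\tACGT\tIIII",
    "readB\t0\tchr1\t300\t50\t4M\t*\t0\t0\tTTTT\tIIII",
]
print(count_unique_reads(example))
```

To use it on real data, pass any iterable of lines, for example sys.stdin when the script sits at the end of a samtools view pipe.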
Actually,
The first round : bowtie.left_kept_reads.m2g_um.log
The second round : bowtie.left_kept_reads.m2g_um_unmapped.log
Bam file on UCSC genome browser: How to interpret the colors etc
http://genome.ucsc.edu/goldenpath/help/hgBamTrackHelp.html
alignments on the reverse strand are colored dark red, alignments on the forward strand are colored dark blue.
For the full explanation:
Configuring BAM tracks
Genome Browser BAM tracks may be configured in a variety of ways to highlight different aspects of the displayed information. The configuration options are described here and related to custom track settings that can alter the default appearance of the custom track. Click here for more information on BAM custom track creation.
When you have finished making your configuration changes, click the Submit button to return to the annotation track display page.
How to get DNA sequence (fasta) from UCSC browser
view-> DNA
UCSC genome browser make Karyogram / ideogram /genome chromosome view
Tools -> Genome graphs -> upload your data
for the chromosome base format:
chromosome position score
Alternatively:
Tools -> Genome graphs -> import your custom track that has already been loaded. Choose coverage mode
.fastq files and adaptors
http://onetipperday.blogspot.com/2012/08/three-ways-to-trim-adaptorprimer.html#uds-search-results
Three ways to trim adaptor/primer sequences for paired-end reads
1. Understanding the adaptors (skip this part if you're familiar with the Illumina adaptor)
Before trimming anything from the reads, let's get clear on what the reads contain. Taking TruSeq reads (from Illumina HiSeq 2000) as an example, here is what the read file (fastq) looks like:
$ cat r1.fq
@3VFXHS1:278:D13Y4ACXX:1:1101:1472:2209 1:N:0:CGATGT
CTGGTATTGTCTCTTCCCACACTGAACTCTGGGGAATTCGATGTGTGGCACAGCCCGGCTCAGCCTGCCCGCTGGTGGGAGCCCCTGGGAAGCTGCGGCGC
+
@@CFDDFFGH>CAEH:CGHIJJJJEIHJJHIJJJ?DHIDIJHGEGHJG;FHC9@B(5@6A=EH:B@B@2=>>B?BDCBD<B52<<ABD?<?B1@A9>B###
@3VFXHS1:278:D13Y4ACXX:1:1101:1434:2224 1:N:0:CGATGT
GGCAGAGCCAATCTTCGGACGTGGTGATTGTCTCCTCTAAGTACAAACAGCGCTATGAGTGTCGCCTGCCAGCTGGAGCTATTCACTTCCAGCGTGAAAGG
+
BC@FFFFFHHHGFHIIJJIJGFHICFCGIHGFHFGGCHD@F?B?BGGHJJIG6D@EHEHHEHCD259?AACD@AC59?,(5>A,;>:@C(::(029?8>@A
@3VFXHS1:278:D13Y4ACXX:1:1101:1712:2247 1:N:0:CGATGT
GTACACTTGAACACATTTTTCTAACCTTAGAAAATACCTACAAGGCCTGTTGTCTTGACCCATTACTCAATTGTCCCTGGCATATTATCTGATCTTCACGT
+
CCCFFFFFHHHHGJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJGIIJGGGDHHHIHIHHEIJIJJIIGGIIIIFEHHHHFEFFFDEEEDCEEEDEDC?A
@3VFXHS1:278:D13Y4ACXX:1:1101:3318:2215 1:N:0:CGATGT
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAGAAAAAAAAACAAGCGACAAGGACAGA
+
CCCFFFFFHHHHHJJJJJIJJJJJJJJJIJGHFFHIAHIFGGIIJJIJJFIJIHJIHHHGEGFE>CFFEB###############################
@3VFXHS1:278:D13Y4ACXX:2:1101:5344:2243 1:N:0:ACAGTG
GATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAACAACAGAAAAAACAAAGCGCGAACAGTGC
+
CCCFFFFFHHHHHJIJJIJIJHJJJJJJJHIGIGIHJFHFFDIIJGIIIJJJHHFIJJJJJJHCEHHFD################################
$ cat r2.fq
@3VFXHS1:278:D13Y4ACXX:1:1101:1472:2209 2:N:0:CGATGT
CTGGATTTGAAATCTTTAGCGGAGCGGGAACGCCGGCGCGGAAGGGTCTCTACACAGGGCCCGGTCCGCCCTTGCGCTCTCCTTAATGNNNNNNNNNNCGC
+
@CCF?EFFHHHGHHHIFGI@HGGIEHIGIJGI6@@E>B8>??:DBD++399>ACCDDDD@DBDD58@BBDDDDD@@<@BDDDDC>CDC#############
@3VFXHS1:278:D13Y4ACXX:1:1101:1434:2224 2:N:0:CGATGT
CCCGGGGCCTCCCATTAAGGTCGCACTTGGACCCATTGCCATAGGTCTGGCTGTGGTAGCGTTTAAGACGATGCTGCTTGGAGGCCTTGGCTGTTTCATCA
+
BCCFFFFDHFFHGIHIHGJJIJJGGIIJJJIDHGJIJEIIIIIJJIIJIJHHHEEF;>DDA>BBB@CAABBBDDCDDDDDDAD@@?CDDDCCB?ACCDDC#
@3VFXHS1:278:D13Y4ACXX:1:1101:1712:2247 2:N:0:CGATGT
GTTACTCAGCATTTATTCATGCCTGCTGTGTACGGAAAGGGCAGTTACAAAGGAAAGCCTTGATGATTCTGCTTCCAAGAAACGTGAAGATCAGATAATAT
+
CBCFFFFFHHHHHJIJJIIJJJJJIIJIHIHIJJJIJIJIJJIIIIIIJJGGIIJJJIJJJJHIJJJJHIJHGHHHHFFFFFEDECBDDDDDCCDDDCDEE
@3VFXHS1:278:D13Y4ACXX:1:1101:3318:2215 2:N:0:CGATGT
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAAAAAAACAGAAAAAGAAAGAAGGAAAGGNGAGT
+
CCCFFFFFHHHHHJJJJFHHHJJJIJJGGIDGHIEIJJJIIJGIJ5@FHIB?DFFEEEDEEEDD?####################################
@3VFXHS1:278:D13Y4ACXX:2:1101:5344:2243 2:N:0:ACAGTG
GGATCGGGAAAGGGGGGGGGGGGGGAAAAGGGGGGATTTCCGGGGGGGCCGGTTCTTTTAAAAAAAAAAAAAAAGAAAACAGAAACAGAAGATGGACAACA
+
CCCFFFFFHHHHHJJJJFHHHJJJIJJGGIDGHIEIJJJIIJGIJ5@FHIB?DFFEEEDEEEDD?####################################
First of all, it's critical to understand what your read files contain: do they contain adaptor sequences? Do they contain primer sequences? I strongly recommend reading the description file here: http://genomics.med.tufts.edu/documents/protocols/TUCF_Understanding_Illumina_TruSeq_Adapters.pdf, which shows what the constructed dsDNA (before binding to the flow cell for sequencing) looks like. In that construct:
TruSeq Universal Adaptor:
5´AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT3´
--> reverse complementary
5´AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT3´
TruSeq Indexed Adapter:
5´GATCGGAAGAGCACACGTCTGAACTCCAGTCAC‐NNNNNN‐ATCTCGTATGCCGTCTTCTGCTTG3´
The 6-nt "NNNNNN" is barcode for multiplexing. We noticed that the 3´ of Universal adaptor is reverse-complementary to the 5´ of Indexed adaptor (Why? This is to form the Y-shape adaptor. See ZZ's dUTP figure in another post).
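The reverse-complement relationship can be checked with a few lines of Python. This is a minimal sketch using only the two adaptor sequences quoted above; the helper name reverse_complement is my own.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


UNIVERSAL = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
# The part of the indexed adaptor that comes before the NNNNNN barcode.
INDEXED_PREFIX = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

rc = reverse_complement(UNIVERSAL)
print(rc)
# After the leading A, the next 12 nt of the reverse complement equal
# the first 12 nt of the indexed adaptor: this is the paired stem of the Y.
print(rc[1:13] == INDEXED_PREFIX[:12])
```

The first printed line is the reverse-complementary sequence listed above, and the comparison shows the 12-nt stretch the two adaptors share.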
So the next step is to remove the contamination.
2. Collect adaptor sequences
To remove contamination, we should first collect all possible "contamination" sources. In our case, we collected all the barcode sequences used and generated the adaptor file:
usage: fastq-mcf [options] <adapters.fa> <reads.fq> [mates1.fq ...]
http://zlab.umassmed.edu/~dongx/tracks/RNAseq/BU/Demultiplex_Stats.htm
Copy the content of that page and save it to a text file (the command below reads it as barcode_stat.html):
grep -v Undetermined barcode_stat.html | awk '{print ">"$2; print "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"$4"ATCTCGTATGCCGTCTTCTGCTTG";}' > adaptor.fa
The Demultiplex_Stats.htm file, which contains the barcode information for each sample, is usually included in the sequencing output folder. Otherwise, you can ask the data producer for it.
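If you already have the sample names and barcodes as a list, the adaptor file can also be written from Python. This is a minimal sketch, assuming one 6-nt barcode per sample. The flanking sequences are the two halves of the TruSeq Indexed Adapter quoted earlier; the sample names and the function name adaptor_fasta are hypothetical.

```python
# The indexed adaptor is PREFIX + barcode + SUFFIX (see the sequence above).
PREFIX = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
SUFFIX = "ATCTCGTATGCCGTCTTCTGCTTG"


def adaptor_fasta(samples):
    """Turn (sample_name, barcode) pairs into FASTA text, one record each."""
    records = []
    for name, barcode in samples:
        records.append(">%s\n%s%s%s\n" % (name, PREFIX, barcode, SUFFIX))
    return "".join(records)


# Hypothetical sample names; the barcodes are the two seen in the reads above.
print(adaptor_fasta([("sampleA", "CGATGT"), ("sampleB", "ACAGTG")]), end="")
```

Writing the returned text to adaptor.fa gives the same kind of file as the grep/awk one-liner.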
You may also want to append the universal adaptor:
$ cat >> adaptor.fa
>Trufseq_Universal_Adaptor
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
(Ctrl+D)
A full list of commonly-used adaptors can be retrieved from the following URL (returned by Google search) by command:
curl -s http://www.omicsoft.com/downloads/ngs/contamination_list/v1.txt 2>&1 | sed "/^\s\+$/d;s/\t\+/\t/g;s/ /_/g;s/,//g;s/'//g" | awk '{print ">"$1; print $2;}' > adaptors_list.fa
This does not include the TruSeq barcodes used for smallRNA (see below), which you have to include yourself:
http://epigenome.usc.edu/docs/resources/core_protocols/TruSeq%20index%20sequences-3.pdf
3. Remove adaptor
There might be many ways to remove adaptors, but specifically for PE reads (e.g. both reads are removed if one of the pairs is disqualified), I'd like to introduce three ways to do this.
a. FastqMcf (http://code.google.com/p/ea-utils/wiki/FastqMcf)
Options:
-h This help
-o FIL Output file (stats to stdout)
-s N.N Log scale for clip pct to threshold (2.2)
-t N % occurance threshold before clipping (0.25)
-m N Minimum clip length, overrides scaled auto (1)
-p N Maximum adapter difference percentage (10)
-l N Minimum remaining sequence length (19)
-L N Maximum sequence length (none)
-k N sKew percentage-less-than causing trim (2)
-q N quality threshold causing trimming (10)
-w N window-size for quality trimming (1)
-f force output, even if not much will be done
-F FIL remove sequences that align to FIL
-0 Set all trimming parameters to zero
-U|u Force disable/enable illumina PF filtering
-P N phred-scale (auto)
-x N 'N' (Bad read) percentage causing trimming (20)
-R Don't remove N's from the fronts/ends of reads
-n Don't clip, just output what would be done
-C N Number of reads to use for subsampling (200k)
-S FIL Save clipped reads to file
-d Output lots of random debugging stuff
For example,
fastq-mcf -o c1.fq -o c2.fq -l 16 -q 15 -w 4 -x 10 -u -P 33 adaptor.fa r1.fq r2.fq &>r.log
b. Trimmomatic (http://www.usadellab.org/cms/index.php?page=trimmomatic)
Paired End Mode:
java -classpath <path to trimmomatic jar> org.usadellab.trimmomatic.TrimmomaticPE [-threads <threads>] [-phred33 | -phred64] [-trimlog <logFile>] <input 1> <input 2> <paired output 1> <unpaired output 1> <paired output 2> <unpaired output 2> <step 1> ...
Step options:
- ILLUMINACLIP:<fastaWithAdaptersEtc>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>
- fastaWithAdaptersEtc: specifies the path to a fasta file containing all the adapters, PCR sequences etc. The naming of the various sequences within this file determines how they are used. See below.
- seedMismatches: specifies the maximum mismatch count which will still allow a full match to be performed
- palindromeClipThreshold: specifies how accurate the match between the two 'adapter ligated' reads must be for PE palindrome read alignment.
- simpleClipThreshold: specifies how accurate the match between any adapter etc. sequence must be against a read.
- SLIDINGWINDOW:<windowSize>:<requiredQuality>
- windowSize: specifies the number of bases to average across
- requiredQuality: specifies the average quality required.
- LEADING:<quality>
- quality: Specifies the minimum quality required to keep a base.
- TRAILING:<quality>
- quality: Specifies the minimum quality required to keep a base.
- CROP:<length>
- length: The number of bases to keep, from the start of the read.
- HEADCROP:<length>
- length: The number of bases to remove from the start of the read.
- MINLEN:<length>
- length: Specifies the minimum length of reads to be kept.
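To make the SLIDINGWINDOW step more concrete, here is a simplified illustration in Python. It is not Trimmomatic's actual implementation, only a sketch of the idea described above: scan from the 5' end and cut the read where the average quality inside the window first drops below the required value. It assumes phred33-encoded quality strings; the function name and the toy read are made up.

```python
def sliding_window_trim(seq, qual, window_size=4, required_quality=15, offset=33):
    """Cut the read at the start of the first window whose mean quality is too low."""
    scores = [ord(char) - offset for char in qual]
    for start in range(0, len(scores) - window_size + 1):
        window = scores[start:start + window_size]
        if sum(window) / float(window_size) < required_quality:
            # Keep everything before the failing window.
            return seq[:start], qual[:start]
    # No window failed (or the read is shorter than one window).
    return seq, qual


# Toy read: six high-quality bases ("I" = 40) followed by four poor ones ("#" = 2).
print(sliding_window_trim("ACGTACGTAC", "IIIIII####"))
```

A MINLEN-style filter would then simply drop the read if the trimmed sequence is shorter than the chosen length.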
java -classpath $CLASSPATH/trimmomatic-0.22.jar org.usadellab.trimmomatic.TrimmomaticPE -phred33 -trimlog r.log r1.fq r2.fq t1.fq t1.unpaired.fq t2.fq t2.unpaired.fq LEADING:3 TRAILING:3 ILLUMINACLIP:adaptor.fa:2:40:15 SLIDINGWINDOW:4:15 MINLEN:16
These two programs give the same results:
@3VFXHS1:278:D13Y4ACXX:1:1101:1472:2209 1:N:0:CGATGT
CTGGTATTGTCTCTTCCCACACTGAACTCTGGGGAATTCGATGTGTGGCACAGCCCGGCTCAGCCTGCCCGCTGGTGGGAGCCCCTGGGAAGCTGCGG
+
@@CFDDFFGH>CAEH:CGHIJJJJEIHJJHIJJJ?DHIDIJHGEGHJG;FHC9@B(5@6A=EH:B@B@2=>>B?BDCBD<B52<<ABD?<?B1@A9>B
@3VFXHS1:278:D13Y4ACXX:1:1101:1434:2224 1:N:0:CGATGT
GGCAGAGCCAATCTTCGGACGTGGTGATTGTCTCCTCTAAGTACAAACAGCGCTATGAGTGTCGCCTGCCAGCTGGAGCTATTCACTTCCAGCGTGAAAGG
+
BC@FFFFFHHHGFHIIJJIJGFHICFCGIHGFHFGGCHD@F?B?BGGHJJIG6D@EHEHHEHCD259?AACD@AC59?,(5>A,;>:@C(::(029?8>@A
@3VFXHS1:278:D13Y4ACXX:1:1101:1712:2247 1:N:0:CGATGT
GTACACTTGAACACATTTTTCTAACCTTAGAAAATACCTACAAGGCCTGTTGTCTTGACCCATTACTCAATTGTCCCTGGCATATTATCTGATCTTCACGT
+
CCCFFFFFHHHHGJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJGIIJGGGDHHHIHIHHEIJIJJIIGGIIIIFEHHHHFEFFFDEEEDCEEEDEDC?A
To quote what Simon posted here: HTSeq has facilities useful for this.
You can write a little Python script like this:
import HTSeq

# Read both mate files in parallel
in1 = iter( HTSeq.FastqReader( "mydata_1.fastq" ) )
in2 = iter( HTSeq.FastqReader( "mydata_2.fastq" ) )
out1 = open( "trimmed_1.fastq", "w" )
out2 = open( "trimmed_2.fastq", "w" )
# zip is lazy on Python 3; on Python 2 use itertools.izip instead
for read1, read2 in zip( in1, in2 ):
    read1.trim_right_end( "ACGGTC" )
    read2.trim_left_end( "TTCGAC" )
    read1.write_to_fastq_file( out1 )
    read2.write_to_fastq_file( out2 )
out1.close()
out2.close()
I've not figured out how to apply a fasta file of adaptor.
Condor troubleshooting
Always Idle: server has bad parts
condor_status.
(1) No output --> something wrong
(2) All claimed ---> good and wait
(3) LoadAv above number of cpu
Held:
condor_q -better-analyze -global
The end of the Hold reason contains the likely error message
(1) Permission denied
If you get this on a script, Condor is checking whether the file is "executable", which means its permissions look like:
~$ ls -l script.py
-rwxrwxr-x 1 diane diane 103 2009-11-12 14:24 script.py*
(Note the 'x'es in the first column. Those tell the operating system and Condor that the owning user (first x), the owning group (second x) and everyone else (third x) can run this script.)
However, if you just change the permissions, you're likely to run into the Failed to execute <script> error. So you should just go read that solution now.
(2) Failed to execute <script>
This can happen to a variety of scripts: shell scripts, python scripts, etc. Basically anything that is not a binary executable. The discussion below assumes a python script.
Condor doesn't know that '.py' files should be run with the python interpreter, so you have two choices for how to tell it.
One is to change the permissions of eland_results... to include the "executable" bit with something like chmod a+x eland_results... (or chmod 755 eland_results), which should change the ls -l output from
-rw-r--r-- 1 user user 953 2010-01-09 15:31 eland_results_to_fasta_input.py
to:
-rwxr-xr-x 1 user user 953 2010-01-09 15:31 eland_results_to_fasta_input.py
In addition, if you use this method, you'll also need to add:
#!/usr/bin/env python
to the top of the file. The advantage of this is that Linux now knows this is an executable, and you can also run it from the shell with eland_results_to_fasta_input.py args.. (leaving off the python).
The other choice is to change the executable in the condor submit script from eland_results_to_fasta_input.py to python and treat eland_results_to_fasta_input.py as the first argument in the condor submit script.
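Both conditions for the first choice (the executable bit and the #! line) can be checked before submitting. The following is a minimal sketch, not part of Condor; the function name check_script and the temporary demo file are made up for illustration.

```python
import os
import tempfile


def check_script(path):
    """Return a list of problems that would stop the script from running directly."""
    problems = []
    if not os.access(path, os.X_OK):
        problems.append("not executable: try chmod a+x")
    with open(path, "rb") as handle:
        if not handle.readline().startswith(b"#!"):
            problems.append("no #! interpreter line at the top")
    return problems


# Demo on a throwaway file that has neither the executable bit nor a #! line.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
    tmp.write("print('hello')\n")
print(check_script(tmp.name))
os.remove(tmp.name)
```

An empty list means the script should be runnable as the executable in the submit script; otherwise, fix the listed problems or fall back to the second choice.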
Condor stuck after python tophat run
Condor was running tophat commands four at a time.
After all 14 of my commands finished, it began to run bowtie commands two at a time without letting later-submitted commands run, which took forever.
Solution: Stop all the running bowtie commands and resubmit.
Ensembl Track on UCSC thick block and thin blocks
Question:
"What is the significance of the thinner blocks displayed at the beginning and end of a gene in the browser?"
Response:
The varying thickness of features in the Genome Browser gene tracks denotes the various structural features of a gene, such as exons, introns, and untranslated regions (UTRs). The thickest parts of the track indicate the coding exon regions within the gene. The slightly thinner portions at the leading and trailing ends of the gene track show the 5' and 3' UTRs. Introns are depicted as lines with arrows indicating the direction of transcription.
Some aspects of the graphical representation are inevitably lost upon rescaling. For example, coding exons are given preference at coarse scales. For single exon genes, there is no place to put the strand orientation wedges, and therefore the feature's detail page must be consulted.
For more information about annotation track display conventions within the Genome Browser, consult the User's Guide.
bam and sam commands e.g. Convert Bam to Sam, sam to fastq
convert bam to sam: $ samtools view -h -o out.sam in.bam
convert sam to bam: $ samtools view -bT .faFILE samfile
sort the bam: $ samtools sort Example.bam Example.sorted
(newer samtools releases, 1.3 and later, expect: $ samtools sort -o Example.sorted.bam Example.bam)
view a specific region (the bam must be indexed, and indexing requires a sorted bam):
$ samtools view Example.bam chr17:220-300
Print the header of bam
samtools view -H in.bam
The end of the header (the @PG lines) records the bowtie parameters used.
sam to fastq
cat samplename.nomapping.sam | grep -v ^@ | awk '{print "@"$1"\n"$10"\n+\n"$11}' > unmapped/samplename.fastq
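The awk one-liner can be written out in Python for readability. This is a minimal sketch under the same assumption as the shell version: the reads are unmapped, so the sequence and quality columns are taken as they are (reads aligned to the reverse strand are stored reverse-complemented in SAM and would need flipping back). The function name and the sample line are made up.

```python
def sam_to_fastq(sam_lines):
    """Yield one four-line FASTQ record per SAM alignment line."""
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        # Columns 1, 10 and 11 of SAM: read name, sequence, quality string.
        name, seq, qual = fields[0], fields[9], fields[10]
        yield "@%s\n%s\n+\n%s\n" % (name, seq, qual)


# Made-up unmapped read (flag 4) preceded by a header line.
example = [
    "@HD\tVN:1.0\tSO:unsorted\n",
    "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]
print("".join(sam_to_fastq(example)), end="")
```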
cat: XXXX: input file is output file
This is because the output file is already there and it happens to be an input file, too.
It usually occurs when you rerun some cat commands.
Solution: When re-running, always delete all the previous files and folders
tophat: bowtie1 default parameters
usually this:
-v 2
-g/maxmulti = -k = -m 20
But you have to check your current version
Condor task Held "H". Why?
Do "condor_q -long 44337" and look for HoldReason. In this case:
HoldReason = "Error from slot1@XXX.caltech.edu: Failed to open '/..shell.0.out' as standard output: No such file or directory (errno 2)"
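The long listing is a series of "name = value" lines, so the reason can be picked out with a few lines of Python. This is a minimal sketch, assuming the output of condor_q -long has been captured as text; the function name hold_reason is made up, and the sample text reuses the line above.

```python
def hold_reason(classad_text):
    """Return the HoldReason value from 'name = value' lines, or None if absent."""
    for line in classad_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "HoldReason":
            # Drop surrounding whitespace and the enclosing double quotes.
            return value.strip().strip('"')
    return None


sample = (
    "JobStatus = 5\n"
    "HoldReason = \"Error from slot1@XXX.caltech.edu: Failed to open "
    "'/..shell.0.out' as standard output: No such file or directory (errno 2)\"\n"
)
print(hold_reason(sample))
```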
Some notes on ImageJ
Add pseudo colors to images
Image -> Color -> Channel tools... -> more -> blue/red/....
1. Download Ics opener here http://valelab.ucsf.edu/~nstuurman/IJplugins/Ics_Opener.html
Put it into the folder Application / ImageJ / plugins / jars
2. Restart ImageJ.
3. Drag your ics file onto the ImageJ window
4. ImageJ asks you to select the corresponding ids file.
You might need to increase your memory if your files are large.
Merge cells---Excel
IMPORTANT Only the data in the upper-left cell of a range of selected cells will remain in the merged cell. Data in other cells of the selected range will be deleted.
- If the data that you want to display in the merged cell is not in the upper-left cell, do the following:
- Select the data that you want to display in the merged cell, and then click Copy on the Standard toolbar.
- Select the upper-left cell of the range of adjacent cells that you want to merge, and then click Paste on the Standard toolbar.
- Select the cells that you want to merge.
NOTE The cells that you select must be adjacent.
- On the Formatting toolbar, click Merge and Center.
The cells will be merged in a row or column, and the cell contents will be centered in the merged cell.
NOTE If the Merge and Center button is unavailable, the selected cell may be in editing mode. To cancel editing mode, press ENTER.
- To change the text alignment in the merged cell, select the cell, and then click Align Left or Align Right on the Formatting toolbar.
Split merged cells
You can split only cells that were previously merged.
- Select the merged cell.
When you select a merged cell, the Merge and Center button also appears selected on the Formatting toolbar.
- To unmerge cells, click Merge and Center.
NOTE When the merged cell is split, the contents of the merged cell will appear in the upper-left cell of the range of split cells.
Merge the contents of multiple cells into one cell
You can use a formula with the ampersand (&) operator to combine text from multiple cells into one cell.
- Select the cell in which you want to combine the contents of other cells.
- To start the formula, type =(
- Select the first cell that contains the text that you want to combine, type &" "& (with a space between the quotation marks), and then select the next cell that contains the text that you want to combine.
To combine the contents of more than two cells, continue selecting cells, making sure to type &" "& between selections. If you don't want to add a space between combined text, type & instead of &" "&. To insert a comma, type &", "& (with a comma followed by a space between the quotation marks).
- To finalize the formula, type )
- To see the results of the formula, press ENTER.
Tip You can also use the CONCATENATE function to combine text from multiple cells into one cell.
Example
The following example worksheet shows the available formulas that you can use. The example may be easier to understand if you copy it to a blank worksheet.
NOTE The formula inserts a space between the first and last names by using a space enclosed within quotation marks. Use quotation marks to include any literal text — text that does not change — in the result.
Split the contents of cells across multiple cells
You cannot split a cell or range of cells that was not previously merged. You can, however, divide the contents of unmerged cells and display them across other cells.
- Select the cell, the range of cells, or the entire column that contains the text values that you want to divide across other cells. A range can be any number of rows tall, but no more than one column wide.
IMPORTANT Unless there are one or more blank columns to the right of the selected column, the data to the right of the selected column will be overwritten.
- On the Data menu, click Text to Columns.
- Follow the instructions in the Convert Text to Columns Wizard to specify how you want to divide the text into columns.