Non-academic troubleshooting and skills: April 2017

Wednesday, April 26, 2017

How to sync your "work Dropbox account" to your Mac computer

Dropbox has released an additional free 2 GB of space, called a "work Dropbox account", which is connected to your original account.

To set it up, register on the Dropbox website and follow the tour.

After registering, you can sync it as follows:
1. Click on the Dropbox icon -> click on the gear icon
2. Preferences... -> Account tab -> Business tab
3. Link and log in

Monday, April 24, 2017

Reading a JSON string in the Linux/Unix shell? Here's how I use jq

This prints all the top-level fields (keys) of your JSON file:

jq keys YourJsonString

This prints the field called "accession" of all your nodes:

jq '.files[].accession' YourJsonString

Usually, assuming the outputs follow a constant order, you can line up the fields you need to make a table (with -s, each field becomes one row):

paste -s <(jq '.files[].href' YourJsonString) <(jq '.files[].accession' YourJsonString)
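
If jq is not available, roughly the same extraction can be sketched with Python's standard json module. This is only a minimal sketch: it assumes a top-level "files" list whose entries carry "href" and "accession" (the field names from the jq commands above), and the sample document is made up for illustration.

```python
import json

def fields_table(json_text, fields=("href", "accession")):
    """Return one tab-separated row per entry of the top-level "files" list."""
    doc = json.loads(json_text)
    rows = []
    for node in doc["files"]:
        # Both fields come from the same node, so the columns cannot drift out of order
        rows.append("\t".join(str(node.get(field, "")) for field in fields))
    return rows

# Hypothetical sample data, only for demonstration
sample = (
    '{"files": ['
    '{"href": "/files/A1/", "accession": "A1"}, '
    '{"href": "/files/B2/", "accession": "B2"}]}'
)

print(sorted(json.loads(sample).keys()))  # similar in spirit to: jq keys
print("\n".join(fields_table(sample)))    # one row per node
```

Reading both fields from the same node avoids relying on two separate queries returning values in the same order.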

New type of phishing email targeting Caltech students

From: Library Services - California Institute of Technology <libraryservices@hacettepe.edu.tr>
Date: Mon, Apr 24, 2017 at 9:43 AM
Subject: Library Services
To: brian.he@caltech.edu

Dear Student,

Your access to your library account is expiring soon due to inactivity. To continue to have access to the library services, you must reactivate your account.
For this purpose, click the web address below or copy and paste it into your web browser. A successful login will activate your account and you will be redirected to your library profile.

https://clsproxy.library.caltech.edu/login
If you are not able to login, please contact Sarah Miller at shmiller@emich.edu for immediate assistance.
Sincerely,
Sarah Miller
California Institute of Technology
Caltech Library
1200 E. California Blvd.
Mail Code 10-90
Pasadena, CA 91125
T: 626-395-3405

Saturday, April 22, 2017

How to copy the hyperlink in a cell in Excel spreadsheet without opening it, on Mac

Move the selection to that cell using your arrow keys
Command + K
then Command + C

Thursday, April 20, 2017

How to download additional data from eLife

Figures & Data tab -> Additional Files tab

Tuesday, April 18, 2017

Cufflinks outputs FAIL

http://seqanswers.com/forums/showthread.php?t=14864

Cole Trapnell
A bit more on what FAIL means, and how it can happen. We use FAIL for genes that actually throw a numerical exception during isoform abundance calculation. In Cufflinks and Cuffdiff, there's a couple of calculations that require us to build matrices with either a row per transcript and a column per read (more or less) or a square matrix with a row and column for each transcript. Some of these matrices need to be invertible or positive definite or have other properties in order for the next steps in the algorithm to succeed. However, sometimes (due to things like round-off error) they aren't. Other times, missing data causes trouble. Oddly enough, this is actually more likely to happen the more reads you get overall, because you can see that isoforms are present, but you don't actually have enough data to calculate those abundances. This is the effect you were observing above. So since we can't be sure about the values (and in fact, were we to go ahead and do the calculation anyways, they could be *wildly* off in theory, or even negative), we set them to zero and move on.

In order to make differential expression estimates more conservative, version 1.1.0 really ramped up the checks that are done before these steps so we don't end up reporting false positives that are due to numerical exceptions. However, users (like yourselves) have been pretty frustrated by those changes, so I've spent the last few weeks going back and streamlining the overall algorithm to actually eliminate pieces that require the matrices to have some of those properties. The main offender was our "importance sampling" procedure, which tries to give us a sense (for each gene) for the accuracy for the maximum likelihood estimate of isoform abundances. This procedure was originally meant to improve the robustness of the estimate when one or more isoforms were close to zero, but in practice, we found that it actually hurts as often as it helps. Moreover, this procedure would often FAIL genes, so I removed it altogether. I've compensated on the differential expression side with some other statistical improvements and fixes, and the result is globally more accurate differential analysis (both in terms of fewer FAILs and fewer false positives than 1.1.0).

The upcoming version 1.2.0 should drastically reduce the number of FAIL genes, though there will still be some. If we can't calculate an MLE to begin with, or if for some reason the confidence interval calculation fails, a gene will be marked as FAIL.

Hope this sheds light on things.

Load a bam file onto UCSC genome browser

track type=bam name=file1 bigDataUrl=http://link/to/your/bam/file

Just be sure you have indexed the BAM file; the index (.bai) needs to sit next to the BAM file at the same URL location.

from http://seqanswers.com/forums/showthread.php?t=11785

How to check TopHat mapping stats / mapping rate

In the log folder inside the output folder,
look for bowtie.*.log

The simplest way to count how many reads mapped:

samtools view accepted_hits.bam | cut -f1 | sort | uniq | wc -l

More specifically, the bowtie logs to look at are:

The first round : bowtie.left_kept_reads.m2g_um.log

The second round : bowtie.left_kept_reads.m2g_um_unmapped.log
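
The samtools one-liner above counts distinct read names. The same idea can be sketched in Python; this is a minimal sketch, assuming tab-separated SAM text as input, and the sample alignment lines are invented for illustration.

```python
def count_unique_reads(sam_lines):
    """Count distinct read names (column 1) in SAM text, skipping header lines."""
    names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or header line
        names.add(line.split("\t", 1)[0])
    return len(names)

# Hypothetical SAM lines: read1 is reported twice (a multi-mapped read)
sample = [
    "@HD\tVN:1.0\tSO:coordinate",
    "read1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII",
    "read1\t256\tchr2\t500\t1\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t16\tchr1\t200\t50\t4M\t*\t0\t0\tTTGA\tIIII",
]

print(count_unique_reads(sample))
```

On real data the lines would come from the output of samtools view accepted_hits.bam, for example by reading sys.stdin. As with sort | uniq, a read that maps to several places is counted once.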

Bam file on UCSC genome browser: How to interpret the colors etc

http://genome.ucsc.edu/goldenpath/help/hgBamTrackHelp.html

alignments on the reverse strand are colored dark red, alignments on the forward strand are colored dark blue.

For the full explanation:

Configuring BAM tracks


Genome Browser BAM tracks may be configured in a variety of ways to highlight different aspects of the displayed information. The configuration options are described here and related to custom track settings that can alter the default appearance of the custom track. Click here for more information on BAM custom track creation.

Attempt to join paired end reads by name: This checkbox appears only if pairEndsByName is included in the track settings. When checked, SAM/BAM records with the same name will be joined into pairs for display, with a line drawn between them.
Minimum alignment quality: Exclude alignments with quality less than the given number. The default is 0, unless changed by the track setting minAliQual.
Color track by bases: By default, mismatching bases are highlighted in the display. Change the selection to "item bases" to see all base values from the query sequence, or "OFF" to ignore query sequence.
Additional coloring modes: Other aspects of the alignments can be displayed in color or grayscale. The default mode is 'Color by strand' (bamColorMode=strand), unless the bamColorMode track setting specifies gray, tag or off.
- Color by strand: alignments on the reverse strand are colored dark red, alignments on the forward strand are colored dark blue.
- Grayscale: items are shaded according to the chosen method: alignment quality, base qualities, or unpaired ends. Items' alignment qualities are shaded on a scale of 0 (lightest) to 99 (darkest). Base qualities are shaded on a scale of 0 (lightest) to 40 (darkest). When "unpaired ends" is selected, items that were paired in sequencing but whose mate was not mapped are colored gray, while singletons and properly paired items are black. Alignment quality is the default (bamGrayMode=aliQual) unless bamGrayMode track setting is baseQual or unpaired.
- Use R,G,B colors specified in user-defined tag: SAM/BAM may include user-defined tags, whose names begin with X, Y or Z and include one other letter or number. The user-defined tag named here specifies red, green and blue (RGB) intensities as a zero-terminated string (tag type Z) containing comma-separated triples of numbers from 0-255. For example, if a SAM/BAM record includes the tag YC:Z:255,0,0, then the item is colored red;YC:Z:0,0,255 makes the item blue. By default, the tag is "YC" unless changed using the track setting bamColorTag.
- No additional coloring

When you have finished making your configuration changes, click the Submit button to return to the annotation track display page.
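
To make the two coloring rules concrete, here is a small sketch that mirrors them in Python. It is not how the browser is implemented. It only assumes the SAM convention that FLAG bit 0x10 marks a reverse-strand alignment and that optional tags start at column 12; the example records are made up.

```python
REVERSE_STRAND_FLAG = 0x10  # SAM FLAG bit: alignment is on the reverse strand

def strand_color(flag):
    """Colour that the 'Color by strand' rule above assigns to a SAM FLAG value."""
    return "dark red" if flag & REVERSE_STRAND_FLAG else "dark blue"

def tag_rgb(sam_fields, tag="YC"):
    """Return (r, g, b) from a tag such as YC:Z:255,0,0, or None if the tag is absent."""
    prefix = tag + ":Z:"
    for field in sam_fields[11:]:  # optional tags start at column 12
        if field.startswith(prefix):
            r, g, b = (int(value) for value in field[len(prefix):].split(","))
            return (r, g, b)
    return None

# Hypothetical SAM record with a user-defined colour tag
record = "read1\t16\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tYC:Z:255,0,0".split("\t")

print(strand_color(int(record[1])))
print(tag_rgb(record))
```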

How to get DNA sequence (fasta) from UCSC browser

View -> DNA

UCSC genome browser make Karyogram / ideogram /genome chromosome view

Tools -> Genome graphs -> upload your data

Use the "chromosome base" format, one line per data point:
chromosome position score

Alternatively:
Tools -> Genome graphs -> import a custom track that has already been loaded, then choose coverage mode

.fastq files and adaptors

http://onetipperday.blogspot.com/2012/08/three-ways-to-trim-adaptorprimer.html#uds-search-results

Three ways to trim adaptor/primer sequences for paired-end reads

1. Understanding the adaptors (skip this part if you're familiar with the Illumina adaptor)

Before trimming anything from the reads, let's be clear about what the reads contain.

Taking TruSeq reads (from Illumina HiSeq 2000) as an example, here is what the read files (fastq) look like:

$ cat r1.fq
@3VFXHS1:278:D13Y4ACXX:1:1101:1472:2209 1:N:0:CGATGT
CTGGTATTGTCTCTTCCCACACTGAACTCTGGGGAATTCGATGTGTGGCACAGCCCGGCTCAGCCTGCCCGCTGGTGGGAGCCCCTGGGAAGCTGCGGCGC
+
@@CFDDFFGH>CAEH:CGHIJJJJEIHJJHIJJJ?DHIDIJHGEGHJG;FHC9@B(5@6A=EH:B@B@2=>>B?BDCBD<B52<<ABD?<?B1@A9>B###
@3VFXHS1:278:D13Y4ACXX:1:1101:1434:2224 1:N:0:CGATGT
GGCAGAGCCAATCTTCGGACGTGGTGATTGTCTCCTCTAAGTACAAACAGCGCTATGAGTGTCGCCTGCCAGCTGGAGCTATTCACTTCCAGCGTGAAAGG
+
BC@FFFFFHHHGFHIIJJIJGFHICFCGIHGFHFGGCHD@F?B?BGGHJJIG6D@EHEHHEHCD259?AACD@AC59?,(5>A,;>:@C(::(029?8>@A
@3VFXHS1:278:D13Y4ACXX:1:1101:1712:2247 1:N:0:CGATGT
GTACACTTGAACACATTTTTCTAACCTTAGAAAATACCTACAAGGCCTGTTGTCTTGACCCATTACTCAATTGTCCCTGGCATATTATCTGATCTTCACGT
+
CCCFFFFFHHHHGJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJGIIJGGGDHHHIHIHHEIJIJJIIGGIIIIFEHHHHFEFFFDEEEDCEEEDEDC?A
@3VFXHS1:278:D13Y4ACXX:1:1101:3318:2215 1:N:0:CGATGT
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAGAAAAAAAAACAAGCGACAAGGACAGA
+
CCCFFFFFHHHHHJJJJJIJJJJJJJJJIJGHFFHIAHIFGGIIJJIJJFIJIHJIHHHGEGFE>CFFEB###############################
@3VFXHS1:278:D13Y4ACXX:2:1101:5344:2243 1:N:0:ACAGTG
GATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAACAACAGAAAAAACAAAGCGCGAACAGTGC
+
CCCFFFFFHHHHHJIJJIJIJHJJJJJJJHIGIGIHJFHFFDIIJGIIIJJJHHFIJJJJJJHCEHHFD################################

$ cat r2.fq
@3VFXHS1:278:D13Y4ACXX:1:1101:1472:2209 2:N:0:CGATGT
CTGGATTTGAAATCTTTAGCGGAGCGGGAACGCCGGCGCGGAAGGGTCTCTACACAGGGCCCGGTCCGCCCTTGCGCTCTCCTTAATGNNNNNNNNNNCGC
+
@CCF?EFFHHHGHHHIFGI@HGGIEHIGIJGI6@@E>B8>??:DBD++399>ACCDDDD@DBDD58@BBDDDDD@@<@BDDDDC>CDC#############
@3VFXHS1:278:D13Y4ACXX:1:1101:1434:2224 2:N:0:CGATGT
CCCGGGGCCTCCCATTAAGGTCGCACTTGGACCCATTGCCATAGGTCTGGCTGTGGTAGCGTTTAAGACGATGCTGCTTGGAGGCCTTGGCTGTTTCATCA
+
BCCFFFFDHFFHGIHIHGJJIJJGGIIJJJIDHGJIJEIIIIIJJIIJIJHHHEEF;>DDA>BBB@CAABBBDDCDDDDDDAD@@?CDDDCCB?ACCDDC#
@3VFXHS1:278:D13Y4ACXX:1:1101:1712:2247 2:N:0:CGATGT
GTTACTCAGCATTTATTCATGCCTGCTGTGTACGGAAAGGGCAGTTACAAAGGAAAGCCTTGATGATTCTGCTTCCAAGAAACGTGAAGATCAGATAATAT
+
CBCFFFFFHHHHHJIJJIIJJJJJIIJIHIHIJJJIJIJIJJIIIIIIJJGGIIJJJIJJJJHIJJJJHIJHGHHHHFFFFFEDECBDDDDDCCDDDCDEE
@3VFXHS1:278:D13Y4ACXX:1:1101:3318:2215 2:N:0:CGATGT
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAAAAAAACAGAAAAAGAAAGAAGGAAAGGNGAGT
+
CCCFFFFFHHHHHJJJJFHHHJJJIJJGGIDGHIEIJJJIIJGIJ5@FHIB?DFFEEEDEEEDD?####################################
@3VFXHS1:278:D13Y4ACXX:2:1101:5344:2243 2:N:0:ACAGTG
GGATCGGGAAAGGGGGGGGGGGGGGAAAAGGGGGGATTTCCGGGGGGGCCGGTTCTTTTAAAAAAAAAAAAAAAGAAAACAGAAACAGAAGATGGACAACA
+
CCCFFFFFHHHHHJJJJFHHHJJJIJJGGIDGHIEIJJJIIJGIJ5@FHIB?DFFEEEDEEEDD?####################################

First of all, it's critical to understand what your read files contain: do they contain adaptor sequences? Do they contain primer sequences? I strongly recommend reading the description file here: http://genomics.med.tufts.edu/documents/protocols/TUCF_Understanding_Illumina_TruSeq_Adapters.pdf, from which we learn that the constructed dsDNA (before binding to the flow cell for sequencing) looks like:

Where

TruSeq Universal Adaptor:
5´AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT3´

--> reverse complement:

5´AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT3´

TruSeq Indexed Adapter:
5´GATCGGAAGAGCACACGTCTGAACTCCAGTCAC‐NNNNNN‐ATCTCGTATGCCGTCTTCTGCTTG3´

The 6-nt "NNNNNN" is the barcode used for multiplexing. Note that the 3´ end of the Universal adaptor is reverse-complementary to the 5´ end of the Indexed adaptor (why? To form the Y-shaped adaptor; see ZZ's dUTP figure in another post).
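
The reverse-complement relationships above are easy to check with a few lines of Python. This is a minimal sketch using only the sequences quoted in this post; the 12-nt stem length is simply what these two sequences share, not a figure taken from the Illumina documentation.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

UNIVERSAL = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
INDEXED_5P = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"  # indexed adaptor, part before the barcode

# Reverse complement of the universal adaptor, as listed above
print(reverse_complement(UNIVERSAL))

# The first 12 nt of the indexed adaptor pair with the 3' end of the universal adaptor
stem = reverse_complement(INDEXED_5P[:12])
print(stem, UNIVERSAL.endswith(stem + "T"))
```

The trailing T of the universal adaptor is left unpaired in this check.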

Combining this with the "Overrepresented sequences" section of the FastQC output (for example), we can infer that the reads are contaminated by adaptors/primers (for example the red-marked part in the fastq sequences).

So the next step is to remove the contamination.

2. Collect adaptor sequences

To remove contamination, we should first collect all possible "contamination" sources. In our case, we collected all the barcode sequences used and generated the adaptor file:

http://zlab.umassmed.edu/~dongx/tracks/RNAseq/BU/Demultiplex_Stats.htm

Copy the page content and save it to a text file (named barcode_stat.html in the command below), then run:

grep -v Undetermined barcode_stat.html | awk '{print ">"$2; print "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"$4"ATCTCGTATGCCGTCTTCTGCTTG";}' > adaptor.fa

The Demultiplex_Stats.htm file, which contains the barcode information of each sample, is usually included in the sequencing output folder. Otherwise, you can ask the data producer for it.
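
For readers who prefer Python to awk, the adaptor file can be sketched like this. It is a minimal sketch under assumptions: the (sample name, barcode) pairs are assumed to have been pulled out of the stats file already, the sample names are hypothetical, and the two barcodes are the ones visible in the fastq headers above.

```python
PREFIX = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"  # indexed adaptor before the barcode
SUFFIX = "ATCTCGTATGCCGTCTTCTGCTTG"           # indexed adaptor after the barcode

def adaptor_fasta(samples):
    """Build FASTA text from (sample_name, barcode) pairs."""
    lines = []
    for name, barcode in samples:
        if name == "Undetermined":  # roughly what grep -v Undetermined does
            continue
        lines.append(">" + name)
        lines.append(PREFIX + barcode + SUFFIX)
    return "\n".join(lines) + "\n"

# Hypothetical sample names; barcodes taken from the read headers above
samples = [("sampleA", "CGATGT"), ("sampleB", "ACAGTG"), ("Undetermined", "NNNNNN")]

print(adaptor_fasta(samples), end="")
```

The output can be written to adaptor.fa and used in the same way as the file produced by the awk command.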

You may also want to append the universal adaptor:

$cat >> adaptor.fa

>Trufseq_Universal_Adaptor

AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT

(Ctrl+D)

A full list of commonly used adaptors can be retrieved from the following URL (found via a Google search) with this command:

curl -s http://www.omicsoft.com/downloads/ngs/contamination_list/v1.txt 2>&1 | sed "/^\s\+$/d;s/\t\+/\t/g;s/ /_/g;s/,//g;s/'//g" | awk '{print ">"$1; print $2;}' > adaptors_list.fa

This does not include the TruSeq barcodes used for small RNA (see below), which you have to add yourself:

http://epigenome.usc.edu/docs/resources/core_protocols/TruSeq%20index%20sequences-3.pdf

3. Remove adaptor

There may be many ways to remove adaptors, but I'd like to introduce three that handle PE reads specifically (e.g. both reads are removed if either read of the pair is disqualified).

a. FastqMcf (http://code.google.com/p/ea-utils/wiki/FastqMcf)

usage: fastq-mcf [options] <adapters.fa> <reads.fq> [mates1.fq ...]

Options:

-h This help

-o FIL Output file (stats to stdout)

-s N.N Log scale for clip pct to threshold (2.2)

-t N % occurance threshold before clipping (0.25)

-m N Minimum clip length, overrides scaled auto (1)

-p N Maximum adapter difference percentage (10)

-l N Minimum remaining sequence length (19)

-L N Maximum sequence length (none)

-k N sKew percentage-less-than causing trim (2)

-q N quality threshold causing trimming (10)

-w N window-size for quality trimming (1)

-f force output, even if not much will be done

-F FIL remove sequences that align to FIL

-0 Set all trimming parameters to zero

-U|u Force disable/enable illumina PF filtering

-P N phred-scale (auto)

-x N 'N' (Bad read) percentage causing trimming (20)

-R Don't remove N's from the fronts/ends of reads

-n Don't clip, just output what would be done

-C N Number of reads to use for subsampling (200k)

-S FIL Save clipped reads to file

-d Output lots of random debugging stuff

For example,

fastq-mcf -o c1.fq -o c2.fq -l 16 -q 15 -w 4 -x 10 -u -P 33 adaptor.fa r1.fq r2.fq &>r.log

b. Trimmomatic (http://www.usadellab.org/cms/index.php?page=trimmomatic)

Paired End Mode:
java -classpath <path to trimmomatic jar> org.usadellab.trimmomatic.TrimmomaticPE [-threads <threads>] [-phred33 | -phred64] [-trimlog <logFile>] <input 1> <input 2> <paired output 1> <unpaired output 1> <paired output 2> <unpaired output 2> <step 1> ...

Step options:

ILLUMINACLIP:<fastaWithAdaptersEtc>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>

fastaWithAdaptersEtc: specifies the path to a fasta file containing all the adapters, PCR sequences etc. The naming of the various sequences within this file determines how they are used. See below.
seedMismatches: specifies the maximum mismatch count which will still allow a full match to be performed
palindromeClipThreshold: specifies how accurate the match between the two 'adapter ligated' reads must be for PE palindrome read alignment.
simpleClipThreshold: specifies how accurate the match between any adapter etc. sequence must be against a read.

SLIDINGWINDOW:<windowSize>:<requiredQuality>

windowSize: specifies the number of bases to average across
requiredQuality: specifies the average quality required.

LEADING:<quality>

quality: Specifies the minimum quality required to keep a base.

TRAILING:<quality>

quality: Specifies the minimum quality required to keep a base.

CROP:<length>

length: The number of bases to keep, from the start of the read.

HEADCROP:<length>

length: The number of bases to remove from the start of the read.

MINLEN:<length>

length: Specifies the minimum length of reads to be kept.

Following the above example:

java -classpath $CLASSPATH/trimmomatic-0.22.jar org.usadellab.trimmomatic.TrimmomaticPE -phred33 -trimlog r.log r1.fq r2.fq t1.fq t1.unpaired.fq t2.fq t2.unpaired.fq LEADING:3 TRAILING:3 ILLUMINACLIP:adaptor.fa:2:40:15 SLIDINGWINDOW:4:15 MINLEN:16

These two programs give the same results:

@3VFXHS1:278:D13Y4ACXX:1:1101:1472:2209 1:N:0:CGATGT

CTGGTATTGTCTCTTCCCACACTGAACTCTGGGGAATTCGATGTGTGGCACAGCCCGGCTCAGCCTGCCCGCTGGTGGGAGCCCCTGGGAAGCTGCGG

@@CFDDFFGH>CAEH:CGHIJJJJEIHJJHIJJJ?DHIDIJHGEGHJG;FHC9@B(5@6A=EH:B@B@2=>>B?BDCBD<B52<<ABD?<?B1@A9>B

@3VFXHS1:278:D13Y4ACXX:1:1101:1434:2224 1:N:0:CGATGT

GGCAGAGCCAATCTTCGGACGTGGTGATTGTCTCCTCTAAGTACAAACAGCGCTATGAGTGTCGCCTGCCAGCTGGAGCTATTCACTTCCAGCGTGAAAGG

BC@FFFFFHHHGFHIIJJIJGFHICFCGIHGFHFGGCHD@F?B?BGGHJJIG6D@EHEHHEHCD259?AACD@AC59?,(5>A,;>:@C(::(029?8>@A

@3VFXHS1:278:D13Y4ACXX:1:1101:1712:2247 1:N:0:CGATGT

GTACACTTGAACACATTTTTCTAACCTTAGAAAATACCTACAAGGCCTGTTGTCTTGACCCATTACTCAATTGTCCCTGGCATATTATCTGATCTTCACGT

CCCFFFFFHHHHGJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJGIIJGGGDHHHIHIHHEIJIJJIIGGIIIIFEHHHHFEFFFDEEEDCEEEDEDC?A
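
To get a feel for what SLIDINGWINDOW:4:15 does, here is a simplified sketch of the idea in Python. It is not Trimmomatic's actual implementation: it only scans from the 5' end and cuts the read at the start of the first window whose average quality falls below the threshold, assuming phred+33 qualities. The example read is made up.

```python
def sliding_window_trim(seq, qual, window_size=4, required_quality=15, phred_offset=33):
    """Cut the read at the first window whose mean quality is below required_quality."""
    scores = [ord(char) - phred_offset for char in qual]
    for start in range(0, len(scores) - window_size + 1):
        window = scores[start:start + window_size]
        if sum(window) / window_size < required_quality:
            return seq[:start], qual[:start]
    return seq, qual  # no window fell below the threshold

# Hypothetical read: six good bases ('I' = Q40) followed by four bad ones ('#' = Q2)
trimmed_seq, trimmed_qual = sliding_window_trim("ACGTACGTAC", "IIIIII####")
print(trimmed_seq, trimmed_qual)
```

A length filter in the spirit of MINLEN:16 would then simply drop any pair in which one trimmed read is shorter than 16 bases.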

c. HTSeq
(http://www-huber.embl.de/users/anders/HTSeq/doc/sequences.html#HTSeq.SequenceWithQualities.trim_left_end_with_quals)

As Simon posted here, HTSeq has facilities that are useful for this.

You can write a little Python script like this:

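import HTSeq

# Open both mate files and iterate over them in lockstep
in1 = iter( HTSeq.FastqReader( "mydata_1.fastq" ) )
in2 = iter( HTSeq.FastqReader( "mydata_2.fastq" ) )
out1 = open( "trimmed_1.fastq", "w" )
out2 = open( "trimmed_2.fastq", "w" )
# zip is lazy on Python 3; on Python 2 use itertools.izip as in the original post
for read1, read2 in zip( in1, in2 ):
    # Trimming calls kept as posted; check the HTSeq docs linked above for the
    # expected argument type and for whether the trimmed read is returned
    read1.trim_right_end( "ACGGTC" )
    read2.trim_left_end( "TTCGAC" )
    read1.write_to_fastq_file( out1 )
    read2.write_to_fastq_file( out2 )
out1.close()
out2.close()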

I've not figured out how to apply a FASTA file of adaptors with this approach.
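
One possible workaround is to do the FASTA part in plain Python. The sketch below does not use HTSeq at all. It reads adaptors from FASTA text and clips a read at the leftmost exact adaptor match, or at an adaptor prefix of at least min_overlap bases that runs off the 3' end. Exact matching only (no mismatches) is a simplification, and the adaptor name and reads are made up.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a {name: sequence} dictionary."""
    records, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records

def clip_adaptor(seq, adaptors, min_overlap=6):
    """Return seq cut at the leftmost adaptor hit (full match or 3'-end prefix match)."""
    cut = len(seq)
    for adaptor in adaptors:
        pos = seq.find(adaptor)
        if pos == -1:
            # Look for a partial adaptor hanging off the 3' end, longest prefix first
            for k in range(min(len(adaptor) - 1, len(seq)), min_overlap - 1, -1):
                if seq.endswith(adaptor[:k]):
                    pos = len(seq) - k
                    break
        if pos != -1:
            cut = min(cut, pos)
    return seq[:cut]

# Hypothetical adaptor file content and reads
adaptors = list(read_fasta([">ad1", "AGATCGGAAGAGC"]).values())
print(clip_adaptor("TTTTGGGGAGATCGGAAGAGCAAAA", adaptors))
print(clip_adaptor("TTTTGGGGCCAGATCGG", adaptors))
```

The quality string has to be cut to the same length as the clipped sequence, and for paired-end data both mates should be dropped together if either one becomes too short.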

Condor troubleshooting

Always Idle: something may be wrong on the server side. Check with condor_status:
(1) No output --> something is wrong
(2) All claimed --> fine, just wait
(3) LoadAv above the number of CPUs

Held:
condor_q -better-analyze -global
The end of the Hold reason contains the likely error message

(1)Permission denied

If you get this on a script, Condor is checking whether the file is "executable", which means its permissions look like:

~$ ls -l script.py
-rwxrwxr-x 1 diane diane 103 2009-11-12 14:24 script.py*

(Note the 'x'es in the first column. They tell the operating system and Condor that the owning user (first x), the owning group (second x) and everyone else (third x) can run this script.)

However if you just change the permissions, you're likely to run into the Failed to execute <script> error. So you should just go read that solution now.

(2)Failed to execute <script>

This can happen to a variety of scripts, shell scripts, python scripts, etc. Basically anything that is not a binary executable. The discussion below assumes a python script.

condor doesn't know that '.py's should be run with the python interpreter. So you have two choices for how to tell it.

One is to change the permissions of eland_results... to include the "executable" bit with something like chmod a+x eland_results... (or chmod 755 eland_results), which should change the ls -l output from

-rw-r--r-- 1 user user 953 2010-01-09 15:31 eland_results_to_fasta_input.py

to:

-rwxr-xr-x 1 user user 953 2010-01-09 15:31 eland_results_to_fasta_input.py

In addition, if you use this method you'll also need to add:

#!/usr/bin/env python

to the top of the file. The advantage to this is now linux will know that this is an executable and you can run it with eland_results_to_fasta_input.py args.. from the shell as well. (leaving off the python.)

The other choice is to change the executable in the condor submit script from eland_results_to_fasta_input.py to python and treat eland_results_to_fasta_input.py as the first argument in the condor submit script.
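
Before submitting, the two requirements of the first option (executable bit plus shebang line) can be checked with a small script. This is a minimal sketch using only the Python standard library; the function names are my own and the throwaway script is hypothetical.

```python
import os
import stat
import tempfile

def check_script(path):
    """Report whether the owner may execute the file and whether it starts with a shebang."""
    mode = os.stat(path).st_mode
    with open(path) as handle:
        first_line = handle.readline()
    return {
        "executable": bool(mode & stat.S_IXUSR),
        "shebang": first_line.startswith("#!"),
    }

def make_executable(path):
    """Same effect as chmod a+x: add the x bit for user, group and others."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

# Demonstration on a throwaway file
with tempfile.TemporaryDirectory() as workdir:
    script = os.path.join(workdir, "script.py")
    with open(script, "w") as handle:
        handle.write("#!/usr/bin/env python\nprint('hello')\n")
    os.chmod(script, 0o644)
    print(check_script(script))  # executable bit not set yet
    make_executable(script)
    print(check_script(script))  # now executable
```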

Condor stuck after python tophat run

Condor was running tophat commands four at a time.
After all 14 of my commands finished, it began to run bowtie commands two at a time without letting later-submitted commands run, and this never finished.

Solution: stop all the bowtie commands that are running and resubmit.

Ensembl Track on UCSC thick block and thin blocks

Question:
"What is the significance of the thinner blocks displayed at the beginning and end of a gene in the browser?"

Response:
The varying thickness of features in the Genome Browser gene tracks denotes the various structural features of a gene, such as exons, introns, and untranslated regions (UTRs). The thickest parts of the track indicate the coding exon regions within the gene. The slightly thinner portions at the leading and trailing ends of the gene track show the 5' and 3' UTRs. Introns are depicted as lines with arrows indicating the direction of transcription.

Some aspects of the graphical representation are inevitably lost upon rescaling. For example, coding exons are given preference at coarse scales. For single exon genes, there is no place to put the strand orientation wedges, and therefore the feature's detail page must be consulted.

For more information about annotation track display conventions within the Genome Browser, consult the User's Guide.

bam and sam commands, e.g. convert BAM to SAM, SAM to FASTQ

convert bam to sam:  $ samtools view -h -o out.sam in.bam

convert sam to bam:  $ samtools view -bT .faFILE samfile

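sort the bam:  $ samtools sort Example.bam Example.sorted
(this is the legacy samtools 0.1.x syntax; with samtools 1.x use: samtools sort -o Example.sorted.bam Example.bam)

view a specific region (the BAM must be indexed, and indexing requires a sorted BAM):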

$ samtools view Example.bam chr17:220-300

Print the header of bam

samtools view -H in.bam

The end of the SAM header (the @PG line) has the bowtie parameters used.

sam to fastq

cat samplename.nomapping.sam | grep -v ^@ | awk '{print "@"$1"\n"$10"\n+\n"$11}' > unmapped/samplename.fastq
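
The same conversion can be sketched in Python. This is a minimal sketch that assumes tab-separated SAM text and, like the awk command, just copies columns 1, 10 and 11; the sample lines are made up.

```python
def sam_to_fastq(sam_lines):
    """Yield FASTQ records built from SAM columns 1 (name), 10 (sequence) and 11 (qualities)."""
    for line in sam_lines:
        if line.startswith("@"):  # header lines, like grep -v ^@
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # skip malformed or empty lines
        yield "@{}\n{}\n+\n{}\n".format(fields[0], fields[9], fields[10])

# Hypothetical SAM content with one header line and one unmapped read
sample = [
    "@HD\tVN:1.0\tSO:unsorted\n",
    "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]

print("".join(sam_to_fastq(sample)), end="")
```

This is only faithful for unmapped reads, as in the file name above: for reads aligned to the reverse strand, SAM stores the sequence reverse-complemented and the qualities reversed.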

cat: XXXX: input file is output file

This happens because the output file already exists and also happens to be one of the input files.

It usually occurs when you rerun a cat command.

Solution: before re-running, always delete the files and folders from the previous run.

tophat: bowtie1 default parameters

usually this:
-v 2
-g/maxmulti = -k = -m 20

But you have to check your current version

Condor task Held "H". Why?

Do "condor_q -long 44337" and look for HoldReason. In this case:

HoldReason = "Error from slot1@XXX.caltech.edu: Failed to open '/..shell.0.out' as standard output: No such file or directory (errno 2)"

Some notes on ImageJ

Add pseudo colors to images

Image -> Color -> Channel tools... -> more -> blue/red/....

How to open .ics files

1. Download Ics Opener here: http://valelab.ucsf.edu/~nstuurman/IJplugins/Ics_Opener.html

Put it into the folder Applications/ImageJ/plugins/jars

2. Restart ImageJ.

3. Drag your .ics file onto the ImageJ window.

4. ImageJ asks you to select the corresponding .ids file.

You might need to increase your memory if your files are large.

[imagej] increase memory

Edit --> Options --> memory

Merge cells---Excel

IMPORTANT Only the data in the upper-left cell of a range of selected cells will remain in the merged cell. Data in other cells of the selected range will be deleted.

If the data that you want to display in the merged cell is not in the upper-left cell, do the following:

Select the data that you want to display in the merged cell, and then click Copy on the Standard toolbar.
Select the upper-left cell of the range of adjacent cells that you want to merge, and then click Paste on the Standard toolbar.

Select the cells that you want to merge.

NOTE The cells that you select must be adjacent.

On the Formatting toolbar, click Merge and Center.

The cells will be merged in a row or column, and the cell contents will be centered in the merged cell.

NOTE If the Merge and Center button is unavailable, the selected cell may be in editing mode. To cancel editing mode, press ENTER.

To change the text alignment in the merged cell, select the cell, and then click Align Left or Align Right on the Formatting toolbar.


Split merged cells

You can split only cells that were previously merged.

Select the merged cell.

When you select a merged cell, the Merge and Center button also appears selected on the Formatting toolbar.

To unmerge cells, click Merge and Center.

NOTE When the merged cell is split, the contents of the merged cell will appear in the upper-left cell of the range of split cells.


Merge the contents of multiple cells into one cell

You can use a formula with the ampersand (&) operator to combine text from multiple cells into one cell.

Select the cell in which you want to combine the contents of other cells.
To start the formula, type =(
Select the first cell that contains the text that you want to combine, type &" "& (with a space between the quotation marks), and then select the next cell that contains the text that you want to combine.

To combine the contents of more than two cells, continue selecting cells, making sure to type &" "& between selections. If you don't want to add a space between combined text, type & instead of &" "&. To insert a comma, type &", "& (with a comma followed by a space between the quotation marks).

To finalize the formula, type )
To see the results of the formula, press ENTER.

Tip You can also use the CONCATENATE function to combine text from multiple cells into one cell.

Example

The following example worksheet shows the available formulas that you can use. The example may be easier to understand if you copy it to a blank worksheet.

	A	B
1	First Name	Last Name
2	Nancy	Davolio
3	Andrew	Fuller

Formula	Description (Result)
=A2&" "&B2	Combines the names above, separated by a space (Nancy Davolio)
=B3&", "&A3	Combines the names above, separated by a comma (Fuller, Andrew)
=CONCATENATE(A2," ",B2)	Combines the names above, separated by a space (Nancy Davolio)

NOTE The formula inserts a space between the first and last names by using a space enclosed within quotation marks. Use quotation marks to include any literal text — text that does not change — in the result.


Split the contents of cells across multiple cells

You cannot split a cell or range of cells that was not previously merged. You can, however, divide the contents of unmerged cells and display them across other cells.

Select the cell, the range of cells, or the entire column that contains the text values that you want to divide across other cells. A range can be any number of rows tall, but no more than one column wide.

IMPORTANT Unless there are one or more blank columns to the right of the selected column, the data to the right of the selected column will be overwritten.

On the Data menu, click Text to Columns.
Follow the instructions in the Convert Text to Columns Wizard to specify how you want to divide the text into columns.