bioinformatics - Developer IT

How do programmer-seeking employers see Bioinformatics degree?

- by Max

I love programming, but I also love biology. Basically Bioinformatics sounds fun to me. However, there is a fat chance that I won't get a Bioinformatics job and will be forced to build my career around regular programming. Therefore a question: does it matter (much) for an employer if he is looking for a regular programmer but finds a Bioinformatics diploma? Or is it the same in the long run as a regular Informatics diploma?

Read the article

What do you think is the best language for Bioinformatics?

- by Ben Fossen

I have done a couple research jobs in Bio-informatics and I have used Matlab for them. Matlab had a lot of powerful tools and was easy to use. I did thinks with genome sequencing and predicting metabolic pathways. I am wondering what other people think is best? or there might not be one specific language but a few that lend themselves best to Bio-informatics work that is math heavy and deals with a large amount of data.

Read the article

Bored with CS. What can I study that will make an impact?

- by Eric Martinez

So I'm going to an average university, majoring in CS. I haven't learned a damn thing and am in my third year. I've come to be really bored with studying CS. Initially, I was kind of misinformed and thought majoring in CS would make me a good "product creator". I make my money combining programming and business/marketing. But I have always done programming for the love of the art. I find things like algorithms interesting and want to be able to use CS to solve real world problems. I like the idea of bioinformatics and other hybrid studies. I'm not good enough, yet I hope, to really make a significant effort in those areas but I aspire to be. What are some other fields, open problems, and otherwise cool stuff to apply CS knowledge to in the real world? I'm really looking for something that will motivate me and re-excite me to continue studying CS. Edit: Like someone mentioned below. I am very interesting in the idea of being able to use computer science to help answer fundamental questions of life and the universe. But I'm not sure what is possible or how to begin.

Read the article

Why is Perl used so extensively in Biology research?

- by Kevin

I work as support staff in a Biology research institute as a student, and Perl seems to be used everywhere. Not for every single project, but it seems that more than half the people here have a few Perl books in/on their office/desk. Why is Perl used so much in Biology?

Read the article

A python random function acts differently when assigned to a list or called directly...

- by Dror Hilman

I have a python function that randomize a dictionary representing a position specific scoring matrix. for example: mat = { 'A' : [ 0.53, 0.66, 0.67, 0.05, 0.01, 0.86, 0.03, 0.97, 0.33, 0.41, 0.26 ] 'C' : [ 0.14, 0.04, 0.13, 0.92, 0.99, 0.04, 0.94, 0.00, 0.07, 0.23, 0.35 ] 'T' : [ 0.25, 0.07, 0.01, 0.01, 0.00, 0.04, 0.00, 0.03, 0.06, 0.12, 0.14 ] 'G' : [ 0.08, 0.23, 0.20, 0.02, 0.00, 0.06, 0.04, 0.00, 0.54, 0.24, 0.25 ] } The scambling function: def scramble_matrix(matrix, iterations): mat_len = len(matrix["A"]) pos1 = pos2 = 0 for count in range(iterations): pos1,pos2 = random.sample(range(mat_len), 2) #suffle the matrix: for nuc in matrix.keys(): matrix[nuc][pos1],matrix[nuc][pos2] = matrix[nuc][pos2],matrix[nuc][pos1] return matrix def print_matrix(matrix): for nuc in matrix.keys(): print nuc+"[", for count in matrix[nuc]: print "%.2f"%count, print "]" now to the problem... When I try to scramble a matrix directly, It's works fine: print_matrix(mat) print "" print_matrix(scramble_matrix(mat,10)) gives: A[ 0.53 0.66 0.67 0.05 0.01 0.86 0.03 0.97 0.33 0.41 0.26 ] C[ 0.14 0.04 0.13 0.92 0.99 0.04 0.94 0.00 0.07 0.23 0.35 ] T[ 0.25 0.07 0.01 0.01 0.00 0.04 0.00 0.03 0.06 0.12 0.14 ] G[ 0.08 0.23 0.20 0.02 0.00 0.06 0.04 0.00 0.54 0.24 0.25 ] A[ 0.41 0.97 0.03 0.86 0.53 0.66 0.33.05 0.67 0.26 0.01 ] C[ 0.23 0.00 0.94 0.04 0.14 0.04 0.07 0.92 0.13 0.35 0.99 ] T[ 0.12 0.03 0.00 0.04 0.25 0.07 0.06 0.01 0.01 0.14 0.00 ] G[ 0.24 0.00 0.04 0.06 0.08 0.23 0.54 0.02 0.20 0.25 0.00 ] but when I try to assign this scrambling to a list , it does not work!!! ... print_matrix(mat) s=[] for x in range(3): s.append(scramble_matrix(mat,10)) for matrix in s: print "" print_matrix(matrix) result: A[ 0.53 0.66 0.67 0.05 0.01 0.86 0.03 0.97 0.33 0.41 0.26 ] C[ 0.14 0.04 0.13 0.92 0.99 0.04 0.94 0.00 0.07 0.23 0.35 ] T[ 0.25 0.07 0.01 0.01 0.00 0.04 0.00 0.03 0.06 0.12 0.14 ] G[ 0.08 0.23 0.20 0.02 0.00 0.06 0.04 0.00 0.54 0.24 0.25 ] A[ 0.01 0.66 0.97 0.67 0.03 0.05 0.33 0.53 0.26 0.41 0.86 ] C[ 0.99 0.04 0.00 0.13 0.94 0.92 0.07 0.14 0.35 0.23 0.04 ] T[ 0.00 0.07 0.03 0.01 0.00 0.01 0.06 0.25 0.14 0.12 0.04 ] G[ 0.00 0.23 0.00 0.20 0.04 0.02 0.54 0.08 0.25 0.24 0.06 ] A[ 0.01 0.66 0.97 0.67 0.03 0.05 0.33 0.53 0.26 0.41 0.86 ] C[ 0.99 0.04 0.00 0.13 0.94 0.92 0.07 0.14 0.35 0.23 0.04 ] T[ 0.00 0.07 0.03 0.01 0.00 0.01 0.06 0.25 0.14 0.12 0.04 ] G[ 0.00 0.23 0.00 0.20 0.04 0.02 0.54 0.08 0.25 0.24 0.06 ] A[ 0.01 0.66 0.97 0.67 0.03 0.05 0.33 0.53 0.26 0.41 0.86 ] C[ 0.99 0.04 0.00 0.13 0.94 0.92 0.07 0.14 0.35 0.23 0.04 ] T[ 0.00 0.07 0.03 0.01 0.00 0.01 0.06 0.25 0.14 0.12 0.04 ] G[ 0.00 0.23 0.00 0.20 0.04 0.02 0.54 0.08 0.25 0.24 0.06 ] What is the problem??? Why the scrambling do not work after the first time, and all the list filled with the same matrix?!

Read the article

Is there a Boost (or other common lib) type for matrices with string keys?

- by mohawkjohn

I have a dense matrix where the indices correspond to genes. While gene identifiers are often integers, they are not contiguous integers. They could be strings instead, too. I suppose I could use a boost sparse matrix of some sort with integer keys, and it wouldn't matter if they're contiguous. Or would this still occupy a great deal of space, particularly if some genes have identifiers that are nine digits? Further, I am concerned that sparse storage is not appropriate, since this is an all-by-all matrix (there will be a distance in each and every cell, provided the gene exists). I'm unlikely to need to perform any matrix operations (e.g., matrix multiplication). I will need to pull vectors out of the matrix (slices). It seems like the best type of matrix would be keyed by a Boost unordered_map (a hash map), or perhaps even simply an STL map. Am I looking at this the wrong way? Do I really need to roll my own? I thought I saw such a class somewhere before. Thanks!

Read the article

Refining data stored in SQLite - how to join several contacts?

- by Krab

Problem background Imagine this problem. You have a water molecule which is in contact with other molecules (if the contact is a hydrogen bond, there can be 4 other molecules around my water). Like in the following picture (A, B, C, D are some other atoms and dots mean the contact). A B . . O / \ H H . . C D I have the information about all the dots and I need to eliminate the water in the center and create records describing contacts of A-C, A-D, A-B, B-C, B-D, and C-D. Database structure Currently, I have the following structure in the database: Table atoms: "id" integer PRIMARY KEY, "amino" char(3) NOT NULL, (HOH for water or other value) other columns identifying the atom Table contacts: "acceptor_id" integer NOT NULL, (the atom near to my hydrogen, here C or D) "donor_id" integer NOT NULL, (here A or B) "directness" char(1) NOT NULL, (this should be D for direct and W for water-mediated) other columns about the contact, such as the distance Current solution (insufficient) Now, I'm going through all the contacts which have donor.amino = "HOH". In this sample case, this would select contacts from C and D. For each of these selected contacts, I look up contacts having the same acceptor_id as is the donor_id in the currently selected contact. From this information, I create the new contact. At the end, I delete all contacts to or from HOH. This way, I am obviously unable to create C-D and A-B contacts (the other 4 are OK). If I try a similar approach - trying to find two contacts having the same donor_id, I end up with duplicate contacts (C-D and D-C). Is there a simple way to retrieve all six contacts without duplicates? I'm dreaming about some one page long SQL query which retrievs just these six wanted rows. :-) It is preferable to conserve information about who is donor where possible, but not strictly necessary. Big thanks to all of you who read this question to this point.

Read the article

How can I apply a PSSM efficiently?

- by flies

I am fitting for position specific scoring matrices (PSSM aka Position Specific Weight Matrices). The fit I'm using is like simulated annealing, where I the perturb the PSSM, compare the prediction to experiment and accept the change if it improves agreement. This means I apply the PSSM millions of times per fit; performance is critical. In my particular problem, I'm applying a PSSM for an object of length L (~8 bp) at every position of a DNA sequence of length M (~30 bp) (so there are M-L+1 valid positions). I need an efficient algorithm to apply a PSSM. Can anyone help improve performance? My best idea is to convert the DNA into some kind of a matrix so that applying the PSSM is matrix multiplication. There are efficient linear algebra libraries out there (e.g. BLAS), but I'm not sure how best to turn an M-length DNA sequence into a matrix M x 4 matrix and then apply the PSSM at each position. The solution needs to work for higher order/dinucleotide terms in the PSSM - presumably this means representing the sequence-matrix for mono-nucleotides and separately for dinucleotides. My current solution iterates over each position m, then over each letter in word from m to m+L-1, adding the corresponding term in the matrix. I'm storing the matrix as a multi-dimensional STL vector, and profiling has revealed that a lot of the computation time is just accessing the elements of the PSSM (with similar performance bottlenecks accessing the DNA sequence). If someone has an idea besides matrix multiplication, I'm all ears.

Read the article

How can I save BioPerl sequence nested features in genbank or embl format?

- by Ryan Thompson

In BioPerl, a sequence object can have any number of features, and each of these can have subfeatures nested within them. For example, a feature may be a complete coding sequence of a gene, and its subfeatures might be individual exons that are concatenated to form the full coding sequence. However, when I use BioPerl to write a sequence object to a file in genbank or embl format, only the top-level features are written to the file, not the sub-features nested within the top-level features. How can I store my subfeatures in sequence files? Should I just convert all my subfeatures into top-level features, and then reconstruct the tree structure next time I read in the sequence?

Read the article

A graph-based tuple merge?

- by user1644030

I have paired values in tuples that are related matches (and technically still in CSV files). Neither of the paired values are necessarily unique. tupleAB = (A####, B###), (A###, B###), (A###, B###)... tupleBC = (B####, C###), (B###, C###), (B###, C###)... tupleAC = (A####, C###), (A###, C###), (A###, C###)... My ideal output would be a dictionary with a unique ID and a list of "reinforced" matches. The way I try to think about it is in a graph-based context. For example, if: tupleAB[x] = (A0001, B0012) tupleBC[y] = (B0012, C0230) tupleAC[z] = (A0001, C0230) This would produce: output = {uniquekey0001, [A0001, B0012, C0230]} Ideally, this would also be able to scale up to more than three tuples (for example, adding a "D" match that would result in an additional three tuples - AD, BD, and CD - and lists of four items long; and so forth). In regards to scaling up to more tuples, I am open to having "graphs" that aren't necessarily fully connected, i.e., every node connected to every other node. My hunch is that I could easily filter based on the list lengths. I am open to any suggestions. I think, with a few cups of coffee, I could work out a brute force solution, but I thought I'd ask the community if anyone was aware of a more elegant solution. Thanks for any feedback.

Read the article

Faster way to split a string and count characters using R?

- by chrisamiller

I'm looking for a faster way to calculate GC content for DNA strings read in from a FASTA file. This boils down to taking a string and counting the number of times that the letter 'G' or 'C' appears. I also want to specify the range of characters to consider. I have a working function that is fairly slow, and it's causing a bottleneck in my code. It looks like this: ## ## count the number of GCs in the characters between start and stop ## gcCount <- function(line, st, sp){ chars = strsplit(as.character(line),"")[[1]] numGC = 0 for(j in st:sp){ ##nested ifs faster than an OR (|) construction if(chars[[j]] == "g"){ numGC <- numGC + 1 }else if(chars[[j]] == "G"){ numGC <- numGC + 1 }else if(chars[[j]] == "c"){ numGC <- numGC + 1 }else if(chars[[j]] == "C"){ numGC <- numGC + 1 } } return(numGC) } Running Rprof gives me the following output: > a = "GCCCAAAATTTTCCGGatttaagcagacataaattcgagg" > Rprof(filename="Rprof.out") > for(i in 1:500000){gcCount(a,1,40)}; > Rprof(NULL) > summaryRprof(filename="Rprof.out") self.time self.pct total.time total.pct "gcCount" 77.36 76.8 100.74 100.0 "==" 18.30 18.2 18.30 18.2 "strsplit" 3.58 3.6 3.64 3.6 "+" 1.14 1.1 1.14 1.1 ":" 0.30 0.3 0.30 0.3 "as.logical" 0.04 0.0 0.04 0.0 "as.character" 0.02 0.0 0.02 0.0 $by.total total.time total.pct self.time self.pct "gcCount" 100.74 100.0 77.36 76.8 "==" 18.30 18.2 18.30 18.2 "strsplit" 3.64 3.6 3.58 3.6 "+" 1.14 1.1 1.14 1.1 ":" 0.30 0.3 0.30 0.3 "as.logical" 0.04 0.0 0.04 0.0 "as.character" 0.02 0.0 0.02 0.0 $sampling.time [1] 100.74 Any advice for making this code faster?

Read the article

Efficient file buffering & scanning methods for large files in python

- by eblume

The description of the problem I am having is a bit complicated, and I will err on the side of providing more complete information. For the impatient, here is the briefest way I can summarize it: What is the fastest (least execution time) way to split a text file in to ALL (overlapping) substrings of size N (bound N, eg 36) while throwing out newline characters. I am writing a module which parses files in the FASTA ascii-based genome format. These files comprise what is known as the 'hg18' human reference genome, which you can download from the UCSC genome browser (go slugs!) if you like. As you will notice, the genome files are composed of chr[1..22].fa and chr[XY].fa, as well as a set of other small files which are not used in this module. Several modules already exist for parsing FASTA files, such as BioPython's SeqIO. (Sorry, I'd post a link, but I don't have the points to do so yet.) Unfortunately, every module I've been able to find doesn't do the specific operation I am trying to do. My module needs to split the genome data ('CAGTACGTCAGACTATACGGAGCTA' could be a line, for instance) in to every single overlapping N-length substring. Let me give an example using a very small file (the actual chromosome files are between 355 and 20 million characters long) and N=8 import cStringIO example_file = cStringIO.StringIO("""\ header CAGTcag TFgcACF """) for read in parse(example_file): ... print read ... CAGTCAGTF AGTCAGTFG GTCAGTFGC TCAGTFGCA CAGTFGCAC AGTFGCACF The function that I found had the absolute best performance from the methods I could think of is this: def parse(file): size = 8 # of course in my code this is a function argument file.readline() # skip past the header buffer = '' for line in file: buffer += line.rstrip().upper() while len(buffer) = size: yield buffer[:size] buffer = buffer[1:] This works, but unfortunately it still takes about 1.5 hours (see note below) to parse the human genome this way. Perhaps this is the very best I am going to see with this method (a complete code refactor might be in order, but I'd like to avoid it as this approach has some very specific advantages in other areas of the code), but I thought I would turn this over to the community. Thanks! Note, this time includes a lot of extra calculation, such as computing the opposing strand read and doing hashtable lookups on a hash of approximately 5G in size. Post-answer conclusion: It turns out that using fileobj.read() and then manipulating the resulting string (string.replace(), etc.) took relatively little time and memory compared to the remainder of the program, and so I used that approach. Thanks everyone!

Read the article

How to generate all strings with d-mismatches, python

- by mr.M

I have a following string - "AACCGGTTT" (alphabet is ["A","G","C","T"]). I would like to generate all strings that differ from the original in any two positions i.e. GAGCGGTTT ^ ^ TATCGGTTT ^ ^ How can I do it in Python? I have only brute force solution (it is working): generate all strings on a given alphabet with the same length append strings that have 2 mismatches with a given string However, could you suggest more efficient way to do so?

Read the article

Parse large XML file w/ script or use BioPython API ?

- by jeremy04

Hey guys this is my first question on here. I'm trying to make a local copy of the UniprotKB in SQL. The UniprotKB is 2.1GB, and it comes in XML and a special text format used by SwissProt Here are my options: 1) Use a SAX parser (XML) - I chose Ruby, and Nokogiri. I started writing the parser, but my initial reaction: how would I map the XML schema to the SAX parser? 2) BioPython - I already have BioSQL/Biopython installed, which literally created my SQL schema for me, and I was able to successfully insert one SwissProt/Uniprot txt file into the database. I'm running it right now (crosses fingers) on the entire 2.1gb. Here is the code I'm running: from Bio import SeqIO from BioSQL import BioSeqDatabase from Bio import SwissProt server = BioSeqDatabase.open_database(driver = "MySQLdb", user = "root", passwd = "", host="localhost", db = "bioseqdb") db = server["uniprot"] iterator = SeqIO.parse(open("/path/to/uniprot_sprot.dat", "r"), "swiss") db.load(iterator) server.commit() Edit: it's now crashing because the transactions are getting locked (since the tables are Innodb) Error Number: 1205 Lock wait timeout exceeded; try restarting transaction. I'm using MySQL version: 5.1.43 Should I switch my database to Postgrelsql ?

Read the article

R optimization: How can I avoid a for loop in this situation?

- by chrisamiller

I'm trying to do a simple genomic track intersection in R, and running into major performance problems, probably related to my use of for loops. In this situation, I have pre-defined windows at intervals of 100bp and I'm trying to calculate how much of each window is covered by the annotations in mylist. Graphically, it looks something like this: 0 100 200 300 400 500 600 windows: |-----|-----|-----|-----|-----|-----| mylist: |-| |-----------| So I wrote some code to do just that, but it's fairly slow and has become a bottleneck in my code: ##window for each 100-bp segment windows <- numeric(6) ##second track mylist = vector("list") mylist[[1]] = c(1,20) mylist[[2]] = c(120,320) ##do the intersection for(i in 1:length(mylist)){ st <- floor(mylist[[i]][1]/100)+1 sp <- floor(mylist[[i]][2]/100)+1 for(j in st:sp){ b <- max((j-1)*100, mylist[[i]][1]) e <- min(j*100, mylist[[i]][2]) windows[j] <- windows[j] + e - b + 1 } } print(windows) [1] 20 81 101 21 0 0 Naturally, this is being used on data sets that are much larger than the example I provide here. Through some profiling, I can see that the bottleneck is in the for loops, but my clumsy attempt to vectorize it using *apply functions resulted in code that runs an order of magnitude more slowly. I suppose I could write something in C, but I'd like to avoid that if possible. Can anyone suggest another approach that will speed this calculation up?

Read the article

Technologies used in EMBL

- by Sergej Andrejev

My fried suggest I try to apply for a job at EMBL. I'm not bioinformatic in any way, but my friend (who by the way is a biologist working at EMBL) insists that I could adapt to the new environment as long as I have a interest in subject and am generally good at learning new things. But here is a catch. For the last 4 years I've been working with .Net and other Microsoft technologies which I enjoy even more lately. Now, from googling I couldn't find whether it will be possible for me to stick with .Net because it was all perl, java, linux and so on. Is there anybody who could prove whether there is at least minor opportunity for a .Net developer to at least partly develop with C#?

Read the article

Java looping through array - Optimization

- by oudouz

I've got some Java code that runs quite the expected way, but it's taking some amount of time -some seconds- even if the job is just looping through an array. The input file is a Fasta file as shown in the image below. The file I'm using is 2.9Mo, and there are some other Fasta file that can take up to 20Mo. And in the code im trying to loop through it by bunches of threes, e.g: AGC TTT TCA ... etc The code has no functional sens for now but what I want is to append each Amino Acid to it's equivalent bunch of Bases. Example : AGC - Ser / CUG Leu / ... etc So what's wrong with the code ? and Is there any way to do it better ? Any optimization ? Looping through the whole String is taking some time, maybe just seconds, but need to find a better way to do it. import java.io.BufferedReader; import java.io.File; import java.io.FileNotFoundException; import java.io.FileReader; import java.io.IOException; public class fasta { public static void main(String[] args) throws IOException { File fastaFile; FileReader fastaReader; BufferedReader fastaBuffer = null; StringBuilder fastaString = new StringBuilder(); try { fastaFile = new File("res/NC_017108.fna"); fastaReader = new FileReader(fastaFile); fastaBuffer = new BufferedReader(fastaReader); String fastaDescription = fastaBuffer.readLine(); String line = fastaBuffer.readLine(); while (line != null) { fastaString.append(line); line = fastaBuffer.readLine(); } System.out.println(fastaDescription); System.out.println(); String currentFastaAcid; for (int i = 0; i < fastaString.length(); i+=3) { currentFastaAcid = fastaString.toString().substring(i, i + 3); System.out.println(currentFastaAcid); } } catch (NullPointerException e) { System.out.println(e.getMessage()); } catch (FileNotFoundException e) { System.out.println(e.getMessage()); } catch (IOException e) { System.out.println(e.getMessage()); } finally { fastaBuffer.close(); } } }

Read the article

How should we serve files in a small bioinformatics cluster?

- by cespinoza

We have a small cluster of six ubuntu servers. We run bioinformatics analyses on these clusters. Each analysis takes about 24 hours to complete, each core i7 server can handle 2 at a time, takes as input about 5GB data and outputs about 10-25GB of data. We run dozens of these a week. The software is a hodgepodge of custom perl scripts and 3rd party sequence alignment software written in C/C++. Currently, files are served from two of the compute nodes (yes, we're using compute nodes as file servers)-- each node has 5 1TB sata drives mounted separately (no raid) and is pooled via glusterfs 2.0.1. They each have as 3 bonded intel ethernet pci gigabit ethernet cards, attached to a d-link DGS-1224T switch ($300 24 port consumer-level). We are not currently using jumbo frames (not sure why, actually). The two file-serving compute nodes are then mirrored via glusterfs. Each of the four other nodes mounts the files via glusterfs. The files are all large (4gb+), and are stored as bare files (no database/etc) if that matters. As you can imagine, this is a bit of a mess that grew organically without forethought and we want to improve it now that we're running out of space. Our analyses are I/O intensive and it is a bottle neck-- we're only getting 140mB/sec between the two fileservers, maybe 50mb/sec from the clients (which only have single NICs). We have a flexible budget which I can probably get up $5k or so. How should we spend our budget? We need at least 10TB of storage fast enough to serve all nodes. How fast/big does the cpu/memory of such a file server have to be? Should we use NFS, ATA over Ethernet, iSCSI, Glusterfs, or something else? Should we buy two or more servers and create some sort of storage cluster, or is 1 server enough for such a small number of nodes? Should we invest in faster NICs (say, PCI-express cards with multiple connectors)? The switch? Should we use raid, if so, hardware or software? and which raid (5, 6, 10, etc)? Any ideas appreciated. We're biologists, not IT gurus.

Read the article

Tissue Specific Electrochemical Fingerprinting on the NetBeans Platform

- by Geertjan

Proteomics and metalloproteomics are rapidly developing interdisciplinary fields providing enormous amounts of data to be classified, evaluated, and interpreted. Approaches offered by bioinformatics and also by biostatistical data analysis and treatment are therefore becoming increasingly relevant. A bioinformatics tool has been developed at universities in Prague and Brno, in the Czech Republic, for analysis and visualization in this domain, on the NetBeans Platform: More info: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0049654

Read the article

from MS Biology to BS Computer Science [on hold]

- by Air Borne

I'm Marco from Italy and I'd like to ask you a piece of advice about my career. I hold a Ms degree in Biology, I enjoyed a lot studying it and I got very good grades but I didn't know what to do with my degree in the real life. Few months ago, I began to read a book about Python programming (Introduction to Computer Science, Zelle J.) and I've great fun learning Python as a beginner, I wake up in the morning thinking about doing excersies and writing simple programs with python :) I'm also watching free lectures from MIT open courseware, and I'm feeling a certain degree of regrets for never asking myself what was computer science, since it seems to me it's a magic world. After weeks of doubts, I made a move :) I applied for a CS bachelor degree abroad, I got an interview and I'm going to start this great adventure next September. I feel incredibly excited at it, but a little bit scared too. Scared because sometimes I think I'm making a great mistake for my life restarting from a bachelor in a completely different area of study. Sometimes I hear people saying the IT market is bad, sometimes I hear other ones saying quite the opposite instead. Moreover, some colleagues of mine suggested me to try to get into Bioinformatics, instead of CS. My question is: I want to really discover if CS is for me, I mean the passion of my life. I know I'm just a beginner and I can't say nothing about it yet. What do you suggest me: CS or Bioinformatics? If I get a Bs in CS, could I get into bioinformatics without relevant experience, taking into account I have a Ms Biology degree? Any comment is appreciated, thanks in advance.

Read the article

mpiBLAST: Open-Source Parallel BLAST

Virginia Tech high-performance computing bioinformatics project reveals missing genes High-performance computing - Parallel computing - United States - Education - Virginia Tech

Read the article

Mandriva Linux is a fantastic scientific platform

<b>Mandriva Blog: </b>"I'm now a researcher in bioinformatics working as a post-doc on bone cancer. During the past ten years Mandriva has proven rock solid on all the installations I had to perform for my own usage and for my colleagues."

Read the article

best way to go about cost-benefit analysis on hardware

- by Michael

I'm looking to build a low-end computational server (my jargon in this field is especially limited so if someone can state that better please change that to meet jargon). I'm basically running computational fluid dynamics programs, large matrix computations and bioinformatics code. What would be the best way to approach cost/benefit analysis on what to put in the system? Perhaps even more general: How does one approach cost/benefit analysis on hardware theoretically (doing the analysis before building the machine)?

Read the article

How to recover unavailable memory in /dev/shm

- by Alain Labbe

Good day to all, I have a question regarding the use of /dev/shm. I use it as a temporary folder for large files to speed up processing and save IO off the HD. My problem is that some of my scripts sometimes require "forceful" interruption for a variety of reasons. I can then manually remove the files left over in /dev/shm but the memory is not returned to available space (as seen by df -h). Is there any way to recover the memory without restarting the system? I'm using LTS12.04 and most of the scripts are PERL running system call on C programs (bioinformatics tools). Thanks.

Search Results

Search found 37 results on 2 pages for 'bioinformatics'.

Page 1/2 | 1 2 | Next Page >

- by Max

- by Matt

- by Ben Fossen

- by Eric Martinez

- by Kevin

- by Dror Hilman

- by mohawkjohn

- by Krab

- by flies

- by Ryan Thompson

- by user1644030

- by chrisamiller

- by eblume

- by mr.M

- by jeremy04

- by chrisamiller

- by Sergej Andrejev

- by oudouz

- by cespinoza

- by Geertjan

- by Air Borne

- by Michael

- by Alain Labbe

1 2 | Next Page >