probability - Page 4 - Developer IT

Python - calculate multinomial probability density functions on large dataset?

- by Seafoid

Hi, I originally intended to use MATLAB to tackle this problem but the inbuilt functions has limitations that do not suit my goal. The same limitation occurs in NumPy. I have two tab-delimited files. The first is a file showing amino acid residue, frequency and count for an in-house database of protein structures, i.e. A 0.25 1 S 0.25 1 T 0.25 1 P 0.25 1 The second file consists of quadruplets of amino acids and the number of times they occur, i.e. ASTP 1 Note, there are 8,000 such quadruplets. Based on the background frequency of occurence of each amino acid and the count of quadruplets, I aim to calculate the multinomial probability density function for each quadruplet and subsequently use it as the expected value in a maximum likelihood calculation. The multinomial distribution is as follows: f(x|n, p) = n!/(x1!*x2!*...*xk!)*((p1^x1)*(p2^x2)*...*(pk^xk)) where x is the number of each of k outcomes in n trials with fixed probabilities p. n is 4 four in all cases in my calculation. I have created three functions to calculate this distribution. # functions for multinomial distribution def expected_quadruplets(x, y): expected = x*y return expected # calculates the probabilities of occurence raised to the number of occurrences def prod_prob(p1, a, p2, b, p3, c, p4, d): prob_prod = (pow(p1, a))*(pow(p2, b))*(pow(p3, c))*(pow(p4, d)) return prob_prod # factorial() and multinomial_coefficient() work in tandem to calculate C, the multinomial coefficient def factorial(n): if n <= 1: return 1 return n*factorial(n-1) def multinomial_coefficient(a, b, c, d): n = 24.0 multi_coeff = (n/(factorial(a) * factorial(b) * factorial(c) * factorial(d))) return multi_coeff The problem is how best to structure the data in order to tackle the calculation most efficiently, in a manner that I can read (you guys write some cryptic code :-)) and that will not create an overflow or runtime error. To data my data is represented as nested lists. amino_acids = [['A', '0.25', '1'], ['S', '0.25', '1'], ['T', '0.25', '1'], ['P', '0.25', '1']] quadruplets = [['ASTP', '1']] I initially intended calling these functions within a nested for loop but this resulted in runtime errors or overfloe errors. I know that I can reset the recursion limit but I would rather do this more elegantly. I had the following: for i in quadruplets: quad = i[0].split(' ') for j in amino_acids: for k in quadruplets: for v in k: if j[0] == v: multinomial_coefficient(int(j[2]), int(j[2]), int(j[2]), int(j[2])) I haven'te really gotten to how to incorporate the other functions yet. I think that my current nested list arrangement is sub optimal. I wish to compare the each letter within the string 'ASTP' with the first component of each sub list in amino_acids. Where a match exists, I wish to pass the appropriate numeric values to the functions using indices. Is their a better way? Can I append the appropriate numbers for each amino acid and quadruplet to a temporary data structure within a loop, pass this to the functions and clear it for the next iteration? Thanks, S :-)

Search Results

Search found 309 results on 13 pages for 'probability'.

Page 4/13 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by Seafoid

- by krupan

- by RadiantHex

- by abhishek

- by dreeves

- by user67081

- by philcolbourn

- by DmitryN

- by Omega

- by vikram.shukla(at)oracle.com

- by Roflcoptr

- by keruilin

- by Yunzhou

- by Angel Escobedo

- by Tom

- by JoshReuben

- by 9000

- by CaptSaltyJack

- by Brann

- by Gokul

- by Microkernel

- by Microkernel

- by ooboo

- by Microkernel

- by bobo

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >