Search Results

Search found 309 results on 13 pages for 'probability'.

Page 4/13 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • Python - calculate multinomial probability density functions on large dataset?

    - by Seafoid
    Hi, I originally intended to use MATLAB to tackle this problem but the inbuilt functions has limitations that do not suit my goal. The same limitation occurs in NumPy. I have two tab-delimited files. The first is a file showing amino acid residue, frequency and count for an in-house database of protein structures, i.e. A 0.25 1 S 0.25 1 T 0.25 1 P 0.25 1 The second file consists of quadruplets of amino acids and the number of times they occur, i.e. ASTP 1 Note, there are 8,000 such quadruplets. Based on the background frequency of occurence of each amino acid and the count of quadruplets, I aim to calculate the multinomial probability density function for each quadruplet and subsequently use it as the expected value in a maximum likelihood calculation. The multinomial distribution is as follows: f(x|n, p) = n!/(x1!*x2!*...*xk!)*((p1^x1)*(p2^x2)*...*(pk^xk)) where x is the number of each of k outcomes in n trials with fixed probabilities p. n is 4 four in all cases in my calculation. I have created three functions to calculate this distribution. # functions for multinomial distribution def expected_quadruplets(x, y): expected = x*y return expected # calculates the probabilities of occurence raised to the number of occurrences def prod_prob(p1, a, p2, b, p3, c, p4, d): prob_prod = (pow(p1, a))*(pow(p2, b))*(pow(p3, c))*(pow(p4, d)) return prob_prod # factorial() and multinomial_coefficient() work in tandem to calculate C, the multinomial coefficient def factorial(n): if n <= 1: return 1 return n*factorial(n-1) def multinomial_coefficient(a, b, c, d): n = 24.0 multi_coeff = (n/(factorial(a) * factorial(b) * factorial(c) * factorial(d))) return multi_coeff The problem is how best to structure the data in order to tackle the calculation most efficiently, in a manner that I can read (you guys write some cryptic code :-)) and that will not create an overflow or runtime error. To data my data is represented as nested lists. amino_acids = [['A', '0.25', '1'], ['S', '0.25', '1'], ['T', '0.25', '1'], ['P', '0.25', '1']] quadruplets = [['ASTP', '1']] I initially intended calling these functions within a nested for loop but this resulted in runtime errors or overfloe errors. I know that I can reset the recursion limit but I would rather do this more elegantly. I had the following: for i in quadruplets: quad = i[0].split(' ') for j in amino_acids: for k in quadruplets: for v in k: if j[0] == v: multinomial_coefficient(int(j[2]), int(j[2]), int(j[2]), int(j[2])) I haven'te really gotten to how to incorporate the other functions yet. I think that my current nested list arrangement is sub optimal. I wish to compare the each letter within the string 'ASTP' with the first component of each sub list in amino_acids. Where a match exists, I wish to pass the appropriate numeric values to the functions using indices. Is their a better way? Can I append the appropriate numbers for each amino acid and quadruplet to a temporary data structure within a loop, pass this to the functions and clear it for the next iteration? Thanks, S :-)

    Read the article

  • Calculating probability that a string has been randomized? - Python

    - by RadiantHex
    Hi folks, this is correlated to a question I asked earlier (question) I have a list of manually created strings such as: lucy87 gordan_king fancy_unicorn77 joplucky_kanga90 base_belong_to_narwhals and a list of randomized strings: johnkdf pancake90kgjd fancy_jagookfk manhattanljg What gives away that the last set of strings are randomized is that sequences such as 'kjg', 'jgf', 'lkd', ... . Any clever way I could separate strings that contain these apparently randomized strings from the crowd? I guess that this plays a lot on the fact that certain characters are more likely to be placed next to others (e.g. 'co', 'ka', 'ja', ...). Any ideas on this one? Kylotan mentioned Reverend, but I am not sure if it can be used fr such purpose. Help would be much appreciated!

    Read the article

  • Web UI for inputting a function from the reals to the reals, such as a probability distribution.

    - by dreeves
    I would like a web interface for a user to describe a one-dimensional real-valued function. I'm imagining the user being presented with a blank pair of axes and they can click anywhere to create points that are thick and draggable. Double-clicking a point, let's say, makes it disappear. The actual function should be shown in real time as an interpolation of the user-supplied points. Here's what this looks like implemented in Mathematica (though of course I'm looking for something in javascript):

    Read the article

  • Tiling Problem Solutions for Various Size "Dominoes"

    - by user67081
    I've got an interesting tiling problem, I have a large square image (size 128k so 131072 squares) with dimensons 256x512... I want to fill this image with certain grain types (a 1x1 tile, a 1x2 strip, a 2x1 strip, and 2x2 square) and have no overlap, no holes, and no extension past the image boundary. Given some probability for each of these grain types, a list of the number required to be placed is generated for each. Obviously an iterative/brute force method doesn't work well here if we just randomly place the pieces, instead a certain algorithm is required. 1) all 2x2 square grains are randomly placed until exhaustion. 2) 1x2 and 2x1 grains are randomly placed alternatively until exhaustion 3) the remaining 1x1 tiles are placed to fill in all holes. It turns out this algorithm works pretty well for some cases and has no problem filling the entire image, however as you might guess, increasing the probability (and thus number) of 1x2 and 2x1 grains eventually causes the placement to stall (since there are too many holes created by the strips and not all them can be placed). My approach to this solution has been as follows: 1) Create a mini-image of size 8x8 or 16x16. 2) Fill this image randomly and following the algorithm specified above so that the desired probability of the entire image is realized in the mini-image. 3) Create N of these mini-images and then randomly successively place them in the large image. Unfortunately there are some downfalls to this simplification. 1) given the small size of the mini-images, nailing an exact probability for the entire image is not possible. Example if I want p(2x1)=P(1x2)=0.4, the mini image may only give 0.41 as the closes probability. 2) The mini-images create a pseudo boundary where no overlaps occur which isn't really descriptive of the model this is being used for. 3) There is only a fixed number of mini-images so i'm not sure how random this really is. I'm really just looking to brainstorm about possible solutions to this. My main concern is really to nail down closer probabilities, now one might suggest I just increase the mini-image size. Well I have, and it turns out that in certain cases(p(1x2)=p(2x1)=0.5) the mini-image 16x16 isn't even iteratively solvable.. So it's pretty obvious how difficult it is to randomly solve this for anything greater than 8x8 sizes.. So I'd love to hear some ideas. Thanks

    Read the article

  • How can I test that my hash function is good in terms of max-load?

    - by philcolbourn
    I have read through various papers on the 'Balls and Bins' problem and it seems that if a hash function is working right (ie. it is effectively a random distribution) then the following should/must be true if I hash n values into a hash table with n slots (or bins): Probability that a bin is empty, for large n is 1/e. Expected number of empty bins is n/e. Probability that a bin has k collisions is <= 1/k!. Probability that a bin has at least k collisions is <= (e/k)**k. These look easy to check. But the max-load test (the maximum number of collisions with high probability) is usually stated vaguely. Most texts state that the maximum number of collisions in any bin is O( ln(n) / ln(ln(n)) ). Some say it is 3*ln(n) / ln(ln(n)). Other papers mix ln and log - usually without defining them, or state that log is log base e and then use ln elsewhere. Is ln the log to base e or 2 and is this max-load formula right and how big should n be to run a test? This lecture seems to cover it best, but I am no mathematician. http://pages.cs.wisc.edu/~shuchi/courses/787-F07/scribe-notes/lecture07.pdf BTW, with high probability seems to mean 1 - 1/n.

    Read the article

  • Which K-factor should be used in case players have different K-factor values

    - by DmitryN
    I am implementing a ranking system based on the Elo rating and cannot get a point about the K-factor. If two players with different skills and, therefore, different K-factors are playing, which exactly K-factor should be used when changing their ratings? For example, player A has rating 2500 and K-factor 16 (probability ~75%) while player B has rating 2300 and K-factor 24 (probability ~25%). If player A wins, do I need to use 16 as K-factor for both players or 16 for player A and 24 for player B?

    Read the article

  • How to discriminate from two nodes with identical frequencies in a Huffman's tree?

    - by Omega
    Still on my quest to compress/decompress files with a Java implementation of Huffman's coding (http://en.wikipedia.org/wiki/Huffman_coding) for a school assignment. From the Wikipedia page, I quote: Create a leaf node for each symbol and add it to the priority queue. While there is more than one node in the queue: Remove the two nodes of highest priority (lowest probability) from the queue Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities. Add the new node to the queue. The remaining node is the root node and the tree is complete. Now, emphasis: Remove the two nodes of highest priority (lowest probability) from the queue Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities. So I have to take two nodes with the lowest frequency. What if there are multiple nodes with the same low frequency? How do I discriminate which one to use? The reason I ask this is because Wikipedia has this image: And I wanted to see if my Huffman's tree was the same. I created a file with the following content: aaaaeeee nnttmmiihhssfffouxprl And this was the result: Doesn't look so bad. But there clearly are some differences when multiple nodes have the same frequency. My questions are the following: What is Wikipedia's image doing to discriminate the nodes with the same frequency? Is my tree wrong? (Is Wikipedia's image method the one and only answer?) I guess there is one specific and strict way to do this, because for our school assignment, files that have been compressed by my program should be able to be decompressed by other classmate's programs - so there must be a "standard" or "unique" way to do it. But I'm a bit lost with that. My code is rather straightforward. It literally just follows Wikipedia's listed steps. The way my code extracts the two nodes with the lowest frequency from the queue is to iterate all nodes and if the current node has a lower frequency than any of the two "smallest" known nodes so far, then it replaces the highest one. Just like that.

    Read the article

  • Extending Oracle CEP with Predictive Analytics

    - by vikram.shukla(at)oracle.com
    Introduction: OCEP is often used as a business rules engine to execute a set of business logic rules via CQL statements, and take decisions based on the outcome of those rules. There are times where configuring rules manually is sufficient because an application needs to deal with only a small and well-defined set of static rules. However, in many situations customers don't want to pre-define such rules for two reasons. First, they are dealing with events with lots of columns and manually crafting such rules for each column or a set of columns and combinations thereof is almost impossible. Second, they are content with probabilistic outcomes and do not care about 100% precision. The former is the case when a user is dealing with data with high dimensionality, the latter when an application can live with "false" positives as they can be discarded after further inspection, say by a Human Task component in a Business Process Management software. The primary goal of this blog post is to show how this can be achieved by combining OCEP with Oracle Data Mining® and leveraging the latter's rich set of algorithms and functionality to do predictive analytics in real time on streaming events. The secondary goal of this post is also to show how OCEP can be extended to invoke any arbitrary external computation in an RDBMS from within CEP. The extensible facility is known as the JDBC cartridge. The rest of the post describes the steps required to achieve this: We use the dataset available at http://blogs.oracle.com/datamining/2010/01/fraud_and_anomaly_detection_made_simple.html to showcase the capabilities. We use it to show how transaction anomalies or fraud can be detected. Building the model: Follow the self-explanatory steps described at the above URL to build the model.  It is very simple - it uses built-in Oracle Data Mining PL/SQL packages to cleanse, normalize and build the model out of the dataset.  You can also use graphical Oracle Data Miner®  to build the models. To summarize, it involves: Specifying which algorithms to use. In this case we use Support Vector Machines as we're trying to find anomalies in highly dimensional dataset.Build model on the data in the table for the algorithms specified. For this example, the table was populated in the scott/tiger schema with appropriate privileges. Configuring the Data Source: This is the first step in building CEP application using such an integration.  Our datasource looks as follows in the server config file.  It is advisable that you use the Visualizer to add it to the running server dynamically, rather than manually edit the file.    <data-source>         <name>DataMining</name>         <data-source-params>             <jndi-names>                 <element>DataMining</element>             </jndi-names>             <global-transactions-protocol>OnePhaseCommit</global-transactions-protocol>         </data-source-params>         <connection-pool-params>             <credential-mapping-enabled></credential-mapping-enabled>             <test-table-name>SQL SELECT 1 from DUAL</test-table-name>             <initial-capacity>1</initial-capacity>             <max-capacity>15</max-capacity>             <capacity-increment>1</capacity-increment>         </connection-pool-params>         <driver-params>             <use-xa-data-source-interface>true</use-xa-data-source-interface>             <driver-name>oracle.jdbc.OracleDriver</driver-name>             <url>jdbc:oracle:thin:@localhost:1522:orcl</url>             <properties>                 <element>                     <value>scott</value>                     <name>user</name>                 </element>                 <element>                     <value>{Salted-3DES}AzFE5dDbO2g=</value>                     <name>password</name>                 </element>                                 <element>                     <name>com.bea.core.datasource.serviceName</name>                     <value>oracle11.2g</value>                 </element>                 <element>                     <name>com.bea.core.datasource.serviceVersion</name>                     <value>11.2.0</value>                 </element>                 <element>                     <name>com.bea.core.datasource.serviceObjectClass</name>                     <value>java.sql.Driver</value>                 </element>             </properties>         </driver-params>     </data-source>   Designing the EPN: The EPN is very simple in this example. We briefly describe each of the components. The adapter ("DataMiningAdapter") reads data from a .csv file and sends it to the CQL processor downstream. The event payload here is same as that of the table in the database (refer to the attached project or do a "desc table-name" from a SQL*PLUS prompt). While this is for convenience in this example, it need not be the case. One can still omit fields in the streaming events, and need not match all columns in the table on which the model was built. Better yet, it does not even need to have the same name as columns in the table, as long as you alias them in the USING clause of the mining function. (Caveat: they still need to draw values from a similar universe or domain, otherwise it constitutes incorrect usage of the model). There are two things in the CQL processor ("DataMiningProc") that make scoring possible on streaming events. 1.      User defined cartridge function Please refer to the OCEP CQL reference manual to find more details about how to define such functions. We include the function below in its entirety for illustration. <?xml version="1.0" encoding="UTF-8"?> <jdbcctxconfig:config     xmlns:jdbcctxconfig="http://www.bea.com/ns/wlevs/config/application"     xmlns:jc="http://www.oracle.com/ns/ocep/config/jdbc">        <jc:jdbc-ctx>         <name>Oracle11gR2</name>         <data-source>DataMining</data-source>               <function name="prediction2">                                 <param name="CQLMONTH" type="char"/>                      <param name="WEEKOFMONTH" type="int"/>                      <param name="DAYOFWEEK" type="char" />                      <param name="MAKE" type="char" />                      <param name="ACCIDENTAREA"   type="char" />                      <param name="DAYOFWEEKCLAIMED"  type="char" />                      <param name="MONTHCLAIMED" type="char" />                      <param name="WEEKOFMONTHCLAIMED" type="int" />                      <param name="SEX" type="char" />                      <param name="MARITALSTATUS"   type="char" />                      <param name="AGE" type="int" />                      <param name="FAULT" type="char" />                      <param name="POLICYTYPE"   type="char" />                      <param name="VEHICLECATEGORY"  type="char" />                      <param name="VEHICLEPRICE" type="char" />                      <param name="FRAUDFOUND" type="int" />                      <param name="POLICYNUMBER" type="int" />                      <param name="REPNUMBER" type="int" />                      <param name="DEDUCTIBLE"   type="int" />                      <param name="DRIVERRATING"  type="int" />                      <param name="DAYSPOLICYACCIDENT"   type="char" />                      <param name="DAYSPOLICYCLAIM" type="char" />                      <param name="PASTNUMOFCLAIMS" type="char" />                      <param name="AGEOFVEHICLES" type="char" />                      <param name="AGEOFPOLICYHOLDER" type="char" />                      <param name="POLICEREPORTFILED" type="char" />                      <param name="WITNESSPRESNT" type="char" />                      <param name="AGENTTYPE" type="char" />                      <param name="NUMOFSUPP" type="char" />                      <param name="ADDRCHGCLAIM"   type="char" />                      <param name="NUMOFCARS" type="char" />                      <param name="CQLYEAR" type="int" />                      <param name="BASEPOLICY" type="char" />                                     <return-component-type>char</return-component-type>                                                      <sql><![CDATA[             SELECT to_char(PREDICTION_PROBABILITY(CLAIMSMODEL, '0' USING *))               AS probability             FROM (SELECT  :CQLMONTH AS MONTH,                                            :WEEKOFMONTH AS WEEKOFMONTH,                          :DAYOFWEEK AS DAYOFWEEK,                           :MAKE AS MAKE,                           :ACCIDENTAREA AS ACCIDENTAREA,                           :DAYOFWEEKCLAIMED AS DAYOFWEEKCLAIMED,                           :MONTHCLAIMED AS MONTHCLAIMED,                           :WEEKOFMONTHCLAIMED,                             :SEX AS SEX,                           :MARITALSTATUS AS MARITALSTATUS,                            :AGE AS AGE,                           :FAULT AS FAULT,                           :POLICYTYPE AS POLICYTYPE,                            :VEHICLECATEGORY AS VEHICLECATEGORY,                           :VEHICLEPRICE AS VEHICLEPRICE,                           :FRAUDFOUND AS FRAUDFOUND,                           :POLICYNUMBER AS POLICYNUMBER,                           :REPNUMBER AS REPNUMBER,                           :DEDUCTIBLE AS DEDUCTIBLE,                            :DRIVERRATING AS DRIVERRATING,                           :DAYSPOLICYACCIDENT AS DAYSPOLICYACCIDENT,                            :DAYSPOLICYCLAIM AS DAYSPOLICYCLAIM,                           :PASTNUMOFCLAIMS AS PASTNUMOFCLAIMS,                           :AGEOFVEHICLES AS AGEOFVEHICLES,                           :AGEOFPOLICYHOLDER AS AGEOFPOLICYHOLDER,                           :POLICEREPORTFILED AS POLICEREPORTFILED,                           :WITNESSPRESNT AS WITNESSPRESENT,                           :AGENTTYPE AS AGENTTYPE,                           :NUMOFSUPP AS NUMOFSUPP,                           :ADDRCHGCLAIM AS ADDRCHGCLAIM,                            :NUMOFCARS AS NUMOFCARS,                           :CQLYEAR AS YEAR,                           :BASEPOLICY AS BASEPOLICY                 FROM dual)                 ]]>         </sql>        </function>     </jc:jdbc-ctx> </jdbcctxconfig:config> 2.      Invoking the function for each event. Once this function is defined, you can invoke it from CQL as follows: <?xml version="1.0" encoding="UTF-8"?> <wlevs:config xmlns:wlevs="http://www.bea.com/ns/wlevs/config/application">   <processor>     <name>DataMiningProc</name>     <rules>        <query id="q1"><![CDATA[                     ISTREAM(SELECT S.CQLMONTH,                                   S.WEEKOFMONTH,                                   S.DAYOFWEEK, S.MAKE,                                   :                                         S.BASEPOLICY,                                    C.F AS probability                                                 FROM                                 StreamDataChannel [NOW] AS S,                                 TABLE(prediction2@Oracle11gR2(S.CQLMONTH,                                      S.WEEKOFMONTH,                                      S.DAYOFWEEK,                                       S.MAKE, ...,                                      S.BASEPOLICY) AS F of char) AS C)                       ]]></query>                 </rules>               </processor>           </wlevs:config>   Finally, the last stage in the EPN prints out the probability of the event being an anomaly. One can also define a threshold in CQL to filter out events that are normal, i.e., below a certain mark as defined by the analyst or designer. Sample Runs: Now let's see how this behaves when events are streamed through CEP. We use only two events for brevity, one normal and other one not. This is one of the "normal" looking events and the probability of it being anomalous is less than 60%. Event is: eventType=DataMiningOutEvent object=q1  time=2904821976256 S.CQLMONTH=Dec, S.WEEKOFMONTH=5, S.DAYOFWEEK=Wednesday, S.MAKE=Honda, S.ACCIDENTAREA=Urban, S.DAYOFWEEKCLAIMED=Tuesday, S.MONTHCLAIMED=Jan, S.WEEKOFMONTHCLAIMED=1, S.SEX=Female, S.MARITALSTATUS=Single, S.AGE=21, S.FAULT=Policy Holder, S.POLICYTYPE=Sport - Liability, S.VEHICLECATEGORY=Sport, S.VEHICLEPRICE=more than 69000, S.FRAUDFOUND=0, S.POLICYNUMBER=1, S.REPNUMBER=12, S.DEDUCTIBLE=300, S.DRIVERRATING=1, S.DAYSPOLICYACCIDENT=more than 30, S.DAYSPOLICYCLAIM=more than 30, S.PASTNUMOFCLAIMS=none, S.AGEOFVEHICLES=3 years, S.AGEOFPOLICYHOLDER=26 to 30, S.POLICEREPORTFILED=No, S.WITNESSPRESENT=No, S.AGENTTYPE=External, S.NUMOFSUPP=none, S.ADDRCHGCLAIM=1 year, S.NUMOFCARS=3 to 4, S.CQLYEAR=1994, S.BASEPOLICY=Liability, probability=.58931702982118561 isTotalOrderGuarantee=true\nAnamoly probability: .58931702982118561 However, the following event is scored as an anomaly with a very high probability of  89%. So there is likely to be something wrong with it. A close look reveals that the value of "deductible" field (10000) is not "normal". What exactly constitutes normal here?. If you run the query on the database to find ALL distinct values for the "deductible" field, it returns the following set: {300, 400, 500, 700} Event is: eventType=DataMiningOutEvent object=q1  time=2598483773496 S.CQLMONTH=Dec, S.WEEKOFMONTH=5, S.DAYOFWEEK=Wednesday, S.MAKE=Honda, S.ACCIDENTAREA=Urban, S.DAYOFWEEKCLAIMED=Tuesday, S.MONTHCLAIMED=Jan, S.WEEKOFMONTHCLAIMED=1, S.SEX=Female, S.MARITALSTATUS=Single, S.AGE=21, S.FAULT=Policy Holder, S.POLICYTYPE=Sport - Liability, S.VEHICLECATEGORY=Sport, S.VEHICLEPRICE=more than 69000, S.FRAUDFOUND=0, S.POLICYNUMBER=1, S.REPNUMBER=12, S.DEDUCTIBLE=10000, S.DRIVERRATING=1, S.DAYSPOLICYACCIDENT=more than 30, S.DAYSPOLICYCLAIM=more than 30, S.PASTNUMOFCLAIMS=none, S.AGEOFVEHICLES=3 years, S.AGEOFPOLICYHOLDER=26 to 30, S.POLICEREPORTFILED=No, S.WITNESSPRESENT=No, S.AGENTTYPE=External, S.NUMOFSUPP=none, S.ADDRCHGCLAIM=1 year, S.NUMOFCARS=3 to 4, S.CQLYEAR=1994, S.BASEPOLICY=Liability, probability=.89171554529576691 isTotalOrderGuarantee=true\nAnamoly probability: .89171554529576691 Conclusion: By way of this example, we show: real-time scoring of events as they flow through CEP leveraging Oracle Data Mining.how CEP applications can invoke complex arbitrary external computations (function shipping) in an RDBMS.

    Read the article

  • Pong: How does the paddle know where the ball will hit?

    - by Roflcoptr
    After implementing Pacman and Snake I'm implementing the next very very classic game: Pong. The implementation is really simple, but I just have one little problem remaining. When one of the paddle (I'm not sure if it is called paddle) is controlled by the computer, I have trouble to position it at the correct position. The ball has a current position, a speed (which for now is constant) and a direction angle. So I could calculate the position where it will hit the side of the computer controlled paddle. And so Icould position the paddle right there. But however in the real game, there is a probability that the computer's paddle will miss the ball. How can I implement this probability? If I only use a probability of lets say 0.5 that the computer's paddle will hit the ball, the problem is solved, but I think it isn't that simple. From the original game I think the probability depends on the distance between the current paddle position and the position the ball will hit the border. Does anybody have any hints how exactly this is calculated?

    Read the article

  • Suggestions for opening the Rails toolbox to design a challenge game?

    - by keruilin
    How would you suggest designing a challenge system as part of a food-eating game so that it's automated as possible? All RoR tools, design patterns and logic are at your disposal (e.g., admin consoles, crontab, arch, etc.). Prize goes to whoever can suggest the simplest and most-automated design! Here are the requirements: User has many challenges. Badge has many challenges. (A unique badge is awarded for each challenge won.) Only one challenge can run at a time. Each challenge has a limited number of days that it runs. For example, one challenge can run 3 days, while another runs 7 days. Challenges can be seasonal. For example, "Eat 13 Pumpkins" only runs during the Fall. New challenges are added to the game on an ongoing basis. For example, a new challenge every week. Each challenge has a certain probability of being selected to run. For example, "Eat 10 Pies" challenge has 10% chance of being selected to run. As each new challenge is added to the database, I want the probabilities of running to change dynamically. I want to avoid the scenario where I'm manually updating a database field just to change the probability from 10% to 5%, for example. Challenges act like Easter eggs. Challenge icons pop-up at different places on the webpage. User is awarded a badge for successfully completing a challenge, but only when it's active. There is some wait time between each challenge. Between 1 and 7 days. Which wait time is random, but the probability of the wait time being short is high and the probability of it being a long wait time is low.

    Read the article

  • C# Monte Carlo Simulation Package Needed

    - by Yunzhou
    I'm relative new to C# and doing a project using Monte Carlo Simulation. Basically my question is the following. I have two uncertain variable inputs, A and B, and they will go through a model and give an output C. So C = f(A,B). I know A's probability distribution (Triangular) and B's probability distribution (Discrete). How can I get the probability distribution of C? What I have done now is that I can generate random numbers based on A's triangular distribution as well as B's discrete distribution. Each pair of randomly generated A and B gives a resultant C. I've run this model 1000 times thus I can get 1000 possible values of C. The difficulty is to get the corresponding probabilities of each value of C. Obviously it's not 1/1000 unless C is uniformly distributed. Is there any Monte Carlo Simulation package/library I can use?

    Read the article

  • Random Number on SQL without using NewID()

    - by Angel Escobedo
    Hello I want to generate a Unique Random number with out using the follow statement : Convert(int, (CHECKSUM(NEWID()))*100000) AS [ITEM] Cause when I use joins clauses on "from" it generates double registers by using NEWID() Im using SQL Server 2000 *PD : When I use Rand() it probably repeat on probability 1 of 100000000 but this is so criticall so it have to be 0% of probability to repeat a random value generated My Query with NewID() and result on SELECT statement is duplicated (x2) My QUery without NewID() and using Rand() on SELECT statement is single (x1) but the probability of repeat the random value generated is uncertainly but exists! Thanks!

    Read the article

  • C# log base 10 and rounding up to nearest power of 10?

    - by Tom
    Hi, if i have a number between 100 and 1000 i want to get the value 3 because 10^3 = 1000. Likewise, if i had a number between 10 and 100 i would want to get the value 2, because 10^2 is 100. Incase you're wondering, its to do with calculating a probability and i always need to divide through by 10^value, to keep the probability between 0 and 1. For example if i calculate 9256, i need to divide through by 10^4, so that i get a probability of 0.92 I'm not sure how to do the rounding up and how to do the base 10, could someone please help?

    Read the article

  • NET Math Libraries

    - by JoshReuben
    NET Mathematical Libraries   .NET Builder for Matlab The MathWorks Inc. - http://www.mathworks.com/products/netbuilder/ MATLAB Builder NE generates MATLAB based .NET and COM components royalty-free deployment creates the components by encrypting MATLAB functions and generating either a .NET or COM wrapper around them. .NET/Link for Mathematica www.wolfram.com a product that 2-way integrates Mathematica and Microsoft's .NET platform call .NET from Mathematica - use arbitrary .NET types directly from the Mathematica language. use and control the Mathematica kernel from a .NET program. turns Mathematica into a scripting shell to leverage the computational services of Mathematica. write custom front ends for Mathematica or use Mathematica as a computational engine for another program comes with full source code. Leverages MathLink - a Wolfram Research's protocol for sending data and commands back and forth between Mathematica and other programs. .NET/Link abstracts the low-level details of the MathLink C API. Extreme Optimization http://www.extremeoptimization.com/ a collection of general-purpose mathematical and statistical classes built for the.NET framework. It combines a math library, a vector and matrix library, and a statistics library in one package. download the trial of version 4.0 to try it out. Multi-core ready - Full support for Task Parallel Library features including cancellation. Broad base of algorithms covering a wide range of numerical techniques, including: linear algebra (BLAS and LAPACK routines), numerical analysis (integration and differentiation), equation solvers. Mathematics leverages parallelism using .NET 4.0's Task Parallel Library. Basic math: Complex numbers, 'special functions' like Gamma and Bessel functions, numerical differentiation. Solving equations: Solve equations in one variable, or solve systems of linear or nonlinear equations. Curve fitting: Linear and nonlinear curve fitting, cubic splines, polynomials, orthogonal polynomials. Optimization: find the minimum or maximum of a function in one or more variables, linear programming and mixed integer programming. Numerical integration: Compute integrals over finite or infinite intervals, over 2D and higher dimensional regions. Integrate systems of ordinary differential equations (ODE's). Fast Fourier Transforms: 1D and 2D FFT's using managed or fast native code (32 and 64 bit) BigInteger, BigRational, and BigFloat: Perform operations with arbitrary precision. Vector and Matrix Library Real and complex vectors and matrices. Single and double precision for elements. Structured matrix types: including triangular, symmetrical and band matrices. Sparse matrices. Matrix factorizations: LU decomposition, QR decomposition, singular value decomposition, Cholesky decomposition, eigenvalue decomposition. Portability and performance: Calculations can be done in 100% managed code, or in hand-optimized processor-specific native code (32 and 64 bit). Statistics Data manipulation: Sort and filter data, process missing values, remove outliers, etc. Supports .NET data binding. Statistical Models: Simple, multiple, nonlinear, logistic, Poisson regression. Generalized Linear Models. One and two-way ANOVA. Hypothesis Tests: 12 14 hypothesis tests, including the z-test, t-test, F-test, runs test, and more advanced tests, such as the Anderson-Darling test for normality, one and two-sample Kolmogorov-Smirnov test, and Levene's test for homogeneity of variances. Multivariate Statistics: K-means cluster analysis, hierarchical cluster analysis, principal component analysis (PCA), multivariate probability distributions. Statistical Distributions: 25 29 continuous and discrete statistical distributions, including uniform, Poisson, normal, lognormal, Weibull and Gumbel (extreme value) distributions. Random numbers: Random variates from any distribution, 4 high-quality random number generators, low discrepancy sequences, shufflers. New in version 4.0 (November, 2010) Support for .NET Framework Version 4.0 and Visual Studio 2010 TPL Parallellized – multicore ready sparse linear program solver - can solve problems with more than 1 million variables. Mixed integer linear programming using a branch and bound algorithm. special functions: hypergeometric, Riemann zeta, elliptic integrals, Frensel functions, Dawson's integral. Full set of window functions for FFT's. Product  Price Update subscription Single Developer License $999  $399  Team License (3 developers) $1999  $799  Department License (8 developers) $3999  $1599  Site License (Unlimited developers in one physical location) $7999  $3199    NMath http://www.centerspace.net .NET math and statistics libraries matrix and vector classes random number generators Fast Fourier Transforms (FFTs) numerical integration linear programming linear regression curve and surface fitting optimization hypothesis tests analysis of variance (ANOVA) probability distributions principal component analysis cluster analysis built on the Intel Math Kernel Library (MKL), which contains highly-optimized, extensively-threaded versions of BLAS (Basic Linear Algebra Subroutines) and LAPACK (Linear Algebra PACKage). Product  Price Update subscription Single Developer License $1295 $388 Team License (5 developers) $5180 $1554   DotNumerics http://www.dotnumerics.com/NumericalLibraries/Default.aspx free DotNumerics is a website dedicated to numerical computing for .NET that includes a C# Numerical Library for .NET containing algorithms for Linear Algebra, Differential Equations and Optimization problems. The Linear Algebra library includes CSLapack, CSBlas and CSEispack, ports from Fortran to C# of LAPACK, BLAS and EISPACK, respectively. Linear Algebra (CSLapack, CSBlas and CSEispack). Systems of linear equations, eigenvalue problems, least-squares solutions of linear systems and singular value problems. Differential Equations. Initial-value problem for nonstiff and stiff ordinary differential equations ODEs (explicit Runge-Kutta, implicit Runge-Kutta, Gear's BDF and Adams-Moulton). Optimization. Unconstrained and bounded constrained optimization of multivariate functions (L-BFGS-B, Truncated Newton and Simplex methods).   Math.NET Numerics http://numerics.mathdotnet.com/ free an open source numerical library - includes special functions, linear algebra, probability models, random numbers, interpolation, integral transforms. A merger of dnAnalytics with Math.NET Iridium in addition to a purely managed implementation will also support native hardware optimization. constants & special functions complex type support real and complex, dense and sparse linear algebra (with LU, QR, eigenvalues, ... decompositions) non-uniform probability distributions, multivariate distributions, sample generation alternative uniform random number generators descriptive statistics, including order statistics various interpolation methods, including barycentric approaches and splines numerical function integration (quadrature) routines integral transforms, like fourier transform (FFT) with arbitrary lengths support, and hartley spectral-space aware sequence manipulation (signal processing) combinatorics, polynomials, quaternions, basic number theory. parallelized where appropriate, to leverage multi-core and multi-processor systems fully managed or (if available) using native libraries (Intel MKL, ACMS, CUDA, FFTW) provides a native facade for F# developers

    Read the article

  • Assignment of roles in communication when sides could try to cheat

    - by 9000
    Assume two nodes in a peer-to-peer network initiating a communication. In this communication, one node has to serve as a "sender", another as a "receiver" (role names are arbitrary here). I'd like the nodes to assert either role with approximately equal probability. That is, in N communications with various other nodes a given node would assume the "sender" role roughly N/2 times. Since there's no third-party arbiter available, nodes should agree on their roles by exchanging messages. The catch is that we can encounter a rogue node which would try to become the "receiver" in most or all cases, and coax the other side to always serve as a "sender". I'm looking for an algorithm to assign roles to sides of communication so that no side could get a predetermined role with high probability. It's OK for the side which is trying to cheat to fail to communicate.

    Read the article

  • How to test email spam scores with amavis?

    - by CaptSaltyJack
    I'd like a way to test a spam message to see its spam scores that SpamAssassin gives it. The SA db files (bayes_toks, etc) reside in /var/lib/amavis/.spamassassin. I've been testing emails by doing this: sudo su amavis -c 'spamassassin -t msgfile' Though this yields some strange results, such as: Content analysis details: (3.7 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- 3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100% [score: 1.0000] -0.0 NO_RELAYS Informational: message was not relayed via SMTP 0.0 LONG_TERM_PRICE BODY: LONG_TERM_PRICE 0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100% [score: 1.0000] -0.0 NO_RECEIVED Informational: message has no Received headers 0.2 is an awfully low scores for BAYES_999! But this is the first time I've used amavis, previously I've always just used spamassassin directly as a content filter in postfix, but apparently running amavis/spamassassin is more efficient. So, with amavis in the picture, how can I run a test on a message to see its spam score breakdown? Another email I ran a test on got this result: 2.0 BAYES_80 BODY: Bayes spam probability is 80 to 95% [score: 0.8487] Doesn't make sense, that BAYES_80 can yield a higher score than BAYES_999. Help!

    Read the article

  • How far should we take the N+N redundancy craziness ?

    - by Brann
    The industry standard when it comes from redundancy is quite high, to say the least. To illustrate my point, here is my current setup (I'm running a financial service). Each server has a RAID array in case something goes wrong on one hard drive .... and in case something goes wrong on the server, it's mirrored by another spare identical server ... and both server cannot go down at the same time, because I've got redundant power, and redundant network connectivity, etc ... and my hosting center itself has dual electricity connections to two different energy providers, and redundant network connectivity, and redundant toilets in case the two security guards (sorry, four) needs to use it at the same time ... and in case something goes wrong anyway (a nuclear nuke? can't think of anything else), I've got another identical hosting facility in another country with the exact same setup. Cost of reputational damage if down = very high Probability of a hardware failure with my setup : <<1% Probability of a hardware failure with a less paranoiac setup : <<1% ASWELL Probability of a software failure in our application code : 1% (if your software is never down because of bugs, then I suggest you doublecheck your reporting/monitoring system is not down. Even SQLServer - which is arguably developed and tested by clever people with a strong methodology - is sometimes down) In other words, I feel like I could host a cheap laptop in my mother's flat, and the human/software problems would still be my higher risk. Of course, there are other things to take into consideration such as : scalability data security the clients expectations that you meet the industry standard But still, hosting two servers in two different data centers (without extra spare servers, nor doubled network equipment apart from the one provided by my hosting facility) would provide me with the scalability and the physical security I need. I feel like we're reaching a point where redundancy is just a communcation tool. Honestly, what's the difference between a 99.999% uptime and a 99.9999% uptime when you know you'll be down 1% of the time because of software bugs ? How far do you push your redundancy crazyness ?

    Read the article

  • C++ Thread level constants

    - by Gokul
    Is there a way by which we can simulate thread level constants in C++? For example, if i have to make a call to template functions, then i need to mention the constants as template level parameters? I can use static const variables for template metaprogramming, but they are process level constants. I know, i am asking a question with a high probability of 'No'. Just thought of asking this to capitalize on the very rare probability :)) Thanks, Gokul.

    Read the article

  • Problem with Precision floating point operation in C

    - by Microkernel
    Hi Guys, For one of my course project I started implementing "Naive Bayesian classifier" in C. My project is to implement a document classifier application (especially Spam) using huge training data. Now I have problem implementing the algorithm because of the limitations in the C's datatype. ( Algorithm I am using is given here, http://en.wikipedia.org/wiki/Bayesian_spam_filtering ) PROBLEM STATEMENT: The algorithm involves taking each word in a document and calculating probability of it being spam word. If p1, p2 p3 .... pn are probabilities of word-1, 2, 3 ... n. The probability of doc being spam or not is calculated using Here, probability value can be very easily around 0.01. So even if I use datatype "double" my calculation will go for a toss. To confirm this I wrote a sample code given below. #define PROBABILITY_OF_UNLIKELY_SPAM_WORD (0.01) #define PROBABILITY_OF_MOSTLY_SPAM_WORD (0.99) int main() { int index; long double numerator = 1.0; long double denom1 = 1.0, denom2 = 1.0; long double doc_spam_prob; /* Simulating FEW unlikely spam words */ for(index = 0; index < 162; index++) { numerator = numerator*(long double)PROBABILITY_OF_UNLIKELY_SPAM_WORD; denom2 = denom2*(long double)PROBABILITY_OF_UNLIKELY_SPAM_WORD; denom1 = denom1*(long double)(1 - PROBABILITY_OF_UNLIKELY_SPAM_WORD); } /* Simulating lot of mostly definite spam words */ for (index = 0; index < 1000; index++) { numerator = numerator*(long double)PROBABILITY_OF_MOSTLY_SPAM_WORD; denom2 = denom2*(long double)PROBABILITY_OF_MOSTLY_SPAM_WORD; denom1 = denom1*(long double)(1- PROBABILITY_OF_MOSTLY_SPAM_WORD); } doc_spam_prob= (numerator/(denom1+denom2)); return 0; } I tried Float, double and even long double datatypes but still same problem. Hence, say in a 100K words document I am analyzing, if just 162 words are having 1% spam probability and remaining 99838 are conspicuously spam words, then still my app will say it as Not Spam doc because of Precision error (as numerator easily goes to ZERO)!!!. This is the first time I am hitting such issue. So how exactly should this problem be tackled?

    Read the article

  • Preoblem with Precision floating point operation in C

    - by Microkernel
    Hi Guys, For one of my course project I started implementing "Naive Bayesian classifier" in C. My project is to implement a document classifier application (especially Spam) using huge training data. Now I have problem implementing the algorithm because of the limitations in the C's datatype. ( Algorithm I am using is given here, http://en.wikipedia.org/wiki/Bayesian_spam_filtering ) PROBLEM STATEMENT: The algorithm involves taking each word in a document and calculating probability of it being spam word. If p1, p2 p3 .... pn are probabilities of word-1, 2, 3 ... n. The probability of doc being spam or not is calculated using Here, probability value can be very easily around 0.01. So even if I use datatype "double" my calculation will go for a toss. To confirm this I wrote a sample code given below. #define PROBABILITY_OF_UNLIKELY_SPAM_WORD (0.01) #define PROBABILITY_OF_MOSTLY_SPAM_WORD (0.99) int main() { int index; long double numerator = 1.0; long double denom1 = 1.0, denom2 = 1.0; long double doc_spam_prob; /* Simulating FEW unlikely spam words */ for(index = 0; index < 162; index++) { numerator = numerator*(long double)PROBABILITY_OF_UNLIKELY_SPAM_WORD; denom2 = denom2*(long double)PROBABILITY_OF_UNLIKELY_SPAM_WORD; denom1 = denom1*(long double)(1 - PROBABILITY_OF_UNLIKELY_SPAM_WORD); } /* Simulating lot of mostly definite spam words */ for (index = 0; index < 1000; index++) { numerator = numerator*(long double)PROBABILITY_OF_MOSTLY_SPAM_WORD; denom2 = denom2*(long double)PROBABILITY_OF_MOSTLY_SPAM_WORD; denom1 = denom1*(long double)(1- PROBABILITY_OF_MOSTLY_SPAM_WORD); } doc_spam_prob= (numerator/(denom1+denom2)); return 0; } I tried Float, double and even long double datatypes but still same problem. Hence, say in a 100K words document I am analyzing, if just 162 words are having 1% spam probability and remaining 99838 are conspicuously spam words, then still my app will say it as Not Spam doc because of Precision error (as numerator easily goes to ZERO)!!!. This is the first time I am hitting such issue. So how exactly should this problem be tackled?

    Read the article

  • How to get a random node from a tree?

    - by ooboo
    It looks easy, but I found the implementation tricky. I need that for a simple genetic programming problem I'm trying to implement. The function should, given a node, return the node itself or any of its children such that the probability of choosing a node is normally distributed relative to its depth (so the function should return mostly middle nodes, but sometimes the root itself or the lowest ones - but that's not really necessary if that makes it significantly more complex, if all any node is chosen with equal probability, that's good enough). Thanks

    Read the article

  • Naive Bayesian classification (spam filtering) - Doubt in one calculation? Which one is right? Plz c

    - by Microkernel
    Hi guys, I am implementing Naive Bayesian classifier for spam filtering. I have doubt on some calculation. Please clarify me what to do. Here is my question. In this method, you have to calculate P(S|W) - Probability that Message is spam given word W occurs in it. P(W|S) - Probability that word W occurs in a spam message. P(W|H) - Probability that word W occurs in a Ham message. So to calculate P(W|S), should I do (1) (Number of times W occuring in spam)/(total number of times W occurs in all the messages) OR (2) (Number of times word W occurs in Spam)/(Total number of words in the spam message) So, to calculate P(W|S), should I do (1) or (2)? (I thought it to be (2), but I am not sure, so plz clarify me) I am refering http://en.wikipedia.org/wiki/Bayesian_spam_filtering for the info by the way. I got to complete the implementation by this weekend :( Thanks and regards, MicroKernel :) @sth: Hmm... Shouldn't repeated occurrence of word 'W' increase a message's spam score? In the your approach it wouldn't, right?. Lets take a scenario and discuss... Lets say, we have 100 training messages, out of which 50 are spam and 50 are Ham. and say word_count of each message = 100. And lets say, in spam messages word W occurs 5 times in each message and word W occurs 1 time in Ham message. So total number of times W occuring in all the spam message = 5*50 = 250 times. And total number of times W occuring in all Ham messages = 1*50 = 50 times. Total occurance of W in all of the training messages = (250+50) = 300 times. So, in this scenario, how do u calculate P(W|S) and P(W|H) ? Naturally we should expect, P(W|S) P(W|H)??? right. Please share your thought...

    Read the article

  • Making more recent items more likely to be drawn

    - by bobo
    There are a few hundred of book records in the database and each record has a publish time. In the homepage of the website, I am required to write some codes to randomly pick 10 books and put them there. The requirement is that newer books need to have higher chances of getting displayed. Since the time is an integer, I am thinking like this to calculate the probability for each book: Probability of a book to be drawn = (current time - publish time of the book) / ((current time - publish time of the book1) + (current time - publish time of the book1) + ... (current time - publish time of the bookn)) After a book is drawn, the next round of the loop will minus the (current time - publish time of the book) from the denominator and recalculate the probability for each of the remaining books, the loop continues until 10 books have been drawn. Is this algorithm a correct one? By the way, the website is written in PHP. Feel free to suggest some PHP codes if you have a better algorithm in your mind. Many thanks to you all.

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >