Search Results

Search found 1745 results on 70 pages for 'probability theory'.

Page 17/70 | < Previous Page | 13 14 15 16 17 18 19 20 21 22 23 24  | Next Page >

  • Specification, modeling and programming are principially the same, right?

    - by Gabriel Šcerbák
    In formal specifications based on abstract algebraic types and equational theory you use formulas of equational theory to specify theory. System which will satisfy those constraints is called in formal logic a model. Modeling is process of creating a model, which abstracts of some aspects, which are unnecessary details for a specific case. So concrete system has to adhere to created model in observed aspects. Programming is a process of creating a program which will have specific behaviour - will perform specific algorithms - and programming languages through different paradigms enable us to think in a certain specific way, which abstracts of some details, usually machine specific ones. So could we be doing all those things at the same time, because they are principially the same? Is declarative programming the nearest attempt to do that? Could we use some sort f programming languages which will be good for programming as well as for modeling and specification?

    Read the article

  • Naive Bayesian classification (spam filtering) - Doubt in one calculation? Which one is right? Plz c

    - by Microkernel
    Hi guys, I am implementing Naive Bayesian classifier for spam filtering. I have doubt on some calculation. Please clarify me what to do. Here is my question. In this method, you have to calculate P(S|W) - Probability that Message is spam given word W occurs in it. P(W|S) - Probability that word W occurs in a spam message. P(W|H) - Probability that word W occurs in a Ham message. So to calculate P(W|S), should I do (1) (Number of times W occuring in spam)/(total number of times W occurs in all the messages) OR (2) (Number of times word W occurs in Spam)/(Total number of words in the spam message) So, to calculate P(W|S), should I do (1) or (2)? (I thought it to be (2), but I am not sure, so plz clarify me) I am refering http://en.wikipedia.org/wiki/Bayesian_spam_filtering for the info by the way. I got to complete the implementation by this weekend :( Thanks and regards, MicroKernel :) @sth: Hmm... Shouldn't repeated occurrence of word 'W' increase a message's spam score? In the your approach it wouldn't, right?. Lets take a scenario and discuss... Lets say, we have 100 training messages, out of which 50 are spam and 50 are Ham. and say word_count of each message = 100. And lets say, in spam messages word W occurs 5 times in each message and word W occurs 1 time in Ham message. So total number of times W occuring in all the spam message = 5*50 = 250 times. And total number of times W occuring in all Ham messages = 1*50 = 50 times. Total occurance of W in all of the training messages = (250+50) = 300 times. So, in this scenario, how do u calculate P(W|S) and P(W|H) ? Naturally we should expect, P(W|S) P(W|H)??? right. Please share your thought...

    Read the article

  • Making more recent items more likely to be drawn

    - by bobo
    There are a few hundred of book records in the database and each record has a publish time. In the homepage of the website, I am required to write some codes to randomly pick 10 books and put them there. The requirement is that newer books need to have higher chances of getting displayed. Since the time is an integer, I am thinking like this to calculate the probability for each book: Probability of a book to be drawn = (current time - publish time of the book) / ((current time - publish time of the book1) + (current time - publish time of the book1) + ... (current time - publish time of the bookn)) After a book is drawn, the next round of the loop will minus the (current time - publish time of the book) from the denominator and recalculate the probability for each of the remaining books, the loop continues until 10 books have been drawn. Is this algorithm a correct one? By the way, the website is written in PHP. Feel free to suggest some PHP codes if you have a better algorithm in your mind. Many thanks to you all.

    Read the article

  • NTFS Issues in Windows 7 and 2008 R2 - 'Is it a Bug?'

    - by renewieldraaijer
    I have been using the various versions of the Microsoft Windows product line since NT4 and I really thought I knew the ins and outs about the NTFS filesystem by now. There were always a few rules of thumb to understand what happens if you move data around. These rules were: "If you copy data, the copied data will inherit the permissions of the location it is being copied to. The same goes for moving data between disk partitions. Only when you move data within the same partition, the permissions are kept."  Recently I was asked to assist in troubleshooting some NTFS related issues. This forced me to have another good look at this theory. To my surprise I found out that this theory does not completely stand anymore. Apparently some things have changed since the release of Windows Vista / Windows 2008. Since the release of these Operating Systems, a move within the same disk partition results in the data inheriting the permissions of the location it is being copied into. A major change in the NTFS filesystem you would think!  Not quite! The above only counts when the move operation is being performed by using Windows Explorer. A move by using the 'move' command from within a cmd prompt for example, retains the NTFS permissions, just like before in Windows XP and older systems. Conclusion: The Windows Explorer is responsible for changing the ACL's of the moved data. This is a remarkable change, but if you follow this theory, the resulting ACL after a move operation is still predictable.  We could say that since Windows Vista and Windows 2008, a new rule set applies: "If you copy data, the copied data will inherit the permissions of the location it is being copied to. Same goes for moving data between disk partitions and within disk partitions. Only when you move data within the same partition by using something else than the Windows Explorer, the permissions are kept." The above behavior should be unchanged in Windows 7 / Windows 2008 R2, compared to Windows Vista / 2008. But somehow the NTFS permissions are not so predictable in Windows 7 and Windows 2008 R2. Moving data within the same disk partition the one time results in the permissions being kept and the next time results in inherited permissions from the destination location. I will try to demonstrate this in a few examples: Example 1 (Incorrect behavior): Consider two folders, 'Folder A' and 'Folder B' with the following permissions configured.                    Now we create the test file 'test file 1.txt' in 'Folder A' and afterwards move this file to 'Folder B' using Windows Explorer.                       According to the new theory, the file should inherit the permissions of 'Folder B' and therefore 'Group B' should appear in the ACL of 'test file 1.txt'. In the screenshot below the resulting permissions are displayed. The permissions from the originating location are kept, while the permissions of 'Folder B' should be inherited.                   Example 2 (Correct behavior): Again, consider the same two folders. This time we make a small modification to the ACL of 'Folder A'. We add 'Group C' to the ACL and again we create a file in 'Folder A' which we name 'test file 2.txt'.                    Next, we move 'test file 2.txt' to 'Folder B'.                       Again, we check the permissions of 'test file 2.txt' at the target location. We can now see that the permissions are inherited. This is what should be happening, and can be considered 'correct behavior' for Windows Vista / 2008 / 7 / 2008 R2. It remains uncertain why this behavior is so inconsistent. At this time, this is under investigation with Microsoft Support. The investigation has been going for the last two weeks and it is beginning to look like there is no rational reason for this, other than a bug in the Windows Explorer in Windows 7 and 2008 R2. As soon as there is any certainty on this, I will note it here in this blog.                   The examples above are harmless tests, by using my own laptop. If you would create the same set of folders and groups, and configure exactly the same permissions, you will see exactly the same behavior. Be sure to use Windows 7 or Windows 2008 R2.   Initially the problem arose at a customer site where move operations on data on the fileserver by users would result in unpredictable results. This resulted in the wrong set of people having àccess permissions on data that they should not have permissions to. Off course this is something we want to prevent at all costs.   I have also done several tests with move operations by using the move command in a cmd prompt. This way the behavior is always consistent. The inconsistent behavior is only exposed when using the Windows Explorer to initiate the move operation, and only when using Windows 7 or Windows 2008 R2 systems. It is evident that this behavior changes when the ACL of a folder has been changed, for example by adding an extra entry. The reason for this remains uncertain though. To be continued…. A dutch version of this post can be found at: http://blogs.platani.nl/?p=612

    Read the article

  • Transitioning from Oracle based CMS to MySQL based CMS

    - by KM01
    We're looking at a replacement for our CMS which runs on Oracle. The new CMSes that we've looked at can in theory run on Oracle, but most of the vendor's installs run off of MySQL vendor supports install of their CMS on MySQL, and a "theoretical" install on Oracle the vendor's dev shops use MySQL none of them develop/test against Oracle Our DBA team works exclusively with Oracle, and doesn't have the bandwidth to provide additional support for a highly available and performing MySQL setup. They could in theory go to training and get ramped up, but our time line is also short (surprise!). So ... I guess my question(s) are: If you've seen a situation like this, how have you dealt with it? What tipped the balance either way? What type of effort did it take? If you're to do it over, what would you do differently ... ? Thanks! KM

    Read the article

  • Data Modeling Resources

    - by Dejan Sarka
    You can find many different data modeling resources. It is impossible to list all of them. I selected only the most valuable ones for me, and, of course, the ones I contributed to. Books Chris J. Date: An Introduction to Database Systems – IMO a “must” to understand the relational model correctly. Terry Halpin, Tony Morgan: Information Modeling and Relational Databases – meet the object-role modeling leaders. Chris J. Date, Nikos Lorentzos and Hugh Darwen: Time and Relational Theory, Second Edition: Temporal Databases in the Relational Model and SQL – all theory needed to manage temporal data. Louis Davidson, Jessica M. Moss: Pro SQL Server 2012 Relational Database Design and Implementation – the best SQL Server focused data modeling book I know by two of my friends. Dejan Sarka, et al.: MCITP Self-Paced Training Kit (Exam 70-441): Designing Database Solutions by Using Microsoft® SQL Server™ 2005 – SQL Server 2005 data modeling training kit. Most of the text is still valid for SQL Server 2008, 2008 R2, 2012 and 2014. Itzik Ben-Gan, Lubor Kollar, Dejan Sarka, Steve Kass: Inside Microsoft SQL Server 2008 T-SQL Querying – Steve wrote a chapter with mathematical background, and I added a chapter with theoretical introduction to the relational model. Itzik Ben-Gan, Dejan Sarka, Roger Wolter, Greg Low, Ed Katibah, Isaac Kunen: Inside Microsoft SQL Server 2008 T-SQL Programming – I added three chapters with theoretical introduction and practical solutions for the user-defined data types, dynamic schema and temporal data. Dejan Sarka, Matija Lah, Grega Jerkic: Training Kit (Exam 70-463): Implementing a Data Warehouse with Microsoft SQL Server 2012 – my first two chapters are about data warehouse design and implementation. Courses Data Modeling Essentials – I wrote a 3-day course for SolidQ. If you are interested in this course, which I could also deliver in a shorter seminar way, you can contact your closes SolidQ subsidiary, or, of course, me directly on addresses [email protected] or [email protected]. This course could also complement the existing courseware portfolio of training providers, which are welcome to contact me as well. Logical and Physical Modeling for Analytical Applications – online course I wrote for Pluralsight. Working with Temporal data in SQL Server – my latest Pluralsight course, where besides theory and implementation I introduce many original ways how to optimize temporal queries. Forthcoming presentations SQL Bits 12, July 17th – 19th, Telford, UK – I have a full-day pre-conference seminar Advanced Data Modeling Topics there.

    Read the article

  • Relationship between TDD and Software Architecture/Design

    - by Christopher Francisco
    I'm new to TDD and have been reading the theory since applying it is more complicated than it sounds when you're learning by yourself. As far as I know, the objective is to write test cases for each requirement and run the test so it fails (to prevent a false positive). Afterward, you should write the minimum amount of code that can pass the test and move to the next one. That being said, is it true that you get a fast development, but what about the code itself? this theory makes me think you are not considering things like abstraction, delegation of responsibilities, design patterns, architecture and others since you're just writing "the minimum amount of code that can pass the test". I know I'm probably wrong because if this were true, we'd have a lot of crappy developers with poor software architecture and documentation so I'm asking for a guide here, what's the relationship between TDD and Software Architecture/Design?

    Read the article

  • Will high reputation in Programmers help to get a good job?

    - by Lorenzo
    In reference to this question, do you think that having a high reputation on this site will help to get a good job? Aside silly and humorous questions, on Programmers we can see a lot of high quality theory questions. I think that, if Stack Overflow will eventually evolve in "strictly programming related" (which usually is "strictly coding related"), the questions on Programmers will be much more interesting and meaningful ("Stack Overflow" = "I have this specific coding/implementation issue"; "Programmers" = "Best practices, team shaping, paradigms, CS theory"). So could high reputation on this site help (or at least be a good reference)? And then, more o less than Stack Overflow?

    Read the article

  • The Joy Of Hex

    - by Jim Giercyk
    While working on a mainframe integration project, it occurred to me that some basic computer concepts are slipping into obscurity. For example, just about anyone can tell you that a 64-bit processor is faster than a 32-bit processer. A grade school child could tell you that a computer “speaks” in ‘1’s and ‘0’s. Some people can even tell you that there are 8 bits in a byte. However, I have found that even the most seasoned developers often can’t explain the theory behind those statements. That is not a knock on programmers; in the age of IntelliSense, what reason do we have to work with data at the bit level? Many computer theory classes treat bit-level programming as a thing of the past, no longer necessary now that storage space is plentiful. The trouble with that mindset is that the world is full of legacy systems that run programs written in the 1970’s.  Today our jobs require us to extract data from those systems, regardless of the format, and that often involves low-level programming. Because it seems knowledge of the low-level concepts is waning in recent times, I thought a review would be in order.       CHARACTER: See Spot Run HEX: 53 65 65 20 53 70 6F 74 20 52 75 6E DECIMAL: 83 101 101 32 83 112 111 116 32 82 117 110 BINARY: 01010011 01100101 01100101 00100000 01010011 01110000 01101111 01110100 00100000 01010010 01110101 01101110 In this example, I have broken down the words “See Spot Run” to a level computers can understand – machine language.     CHARACTER:  The character level is what is rendered by the computer.  A “Character Set” or “Code Page” contains 256 characters, both printable and unprintable.  Each character represents 1 BYTE of data.  For example, the character string “See Spot Run” is 12 Bytes long, exclusive of the quotation marks.  Remember, a SPACE is an unprintable character, but it still requires a byte.  In the example I have used the default Windows character set, ASCII, which you can see here:  http://www.asciitable.com/ HEX:  Hex is short for hexadecimal, or Base 16.  Humans are comfortable thinking in base ten, perhaps because they have 10 fingers and 10 toes; fingers and toes are called digits, so it’s not much of a stretch.  Computers think in Base 16, with numeric values ranging from zero to fifteen, or 0 – F.  Each decimal place has a possible 16 values as opposed to a possible 10 values in base 10.  Therefore, the number 10 in Hex is equal to the number 16 in Decimal.  DECIMAL:  The Decimal conversion is strictly for us humans to use for calculations and conversions.  It is much easier for us humans to calculate that [30 – 10 = 20] in decimal than it is for us to calculate [1E – A = 14] in Hex.  In the old days, an error in a program could be found by determining the displacement from the entry point of a module.  Since those values were dumped from the computers head, they were in hex. A programmer needed to convert them to decimal, do the equation and convert back to hex.  This gets into relative and absolute addressing, a topic for another day.  BINARY:  Binary, or machine code, is where any value can be expressed in 1s and 0s.  It is really Base 2, because each decimal place can have a possibility of only 2 characters, a 1 or a 0.  In Binary, the number 10 is equal to the number 2 in decimal. Why only 1s and 0s?  Very simply, computers are made up of lots and lots of transistors which at any given moment can be ON ( 1 ) or OFF ( 0 ).  Each transistor is a bit, and the order that the transistors fire (or not fire) is what distinguishes one value from  another in the computers head (or CPU).  Consider 32 bit vs 64 bit processing…..a 64 bit processor has the capability to read 64 transistors at a time.  A 32 bit processor can only read half as many at a time, so in theory the 64 bit processor should be much faster.  There are many more factors involved in CPU performance, but that is the fundamental difference.    DECIMAL HEX BINARY 0 0 0000 1 1 0001 2 2 0010 3 3 0011 4 4 0100 5 5 0101 6 6 0110 7 7 0111 8 8 1000 9 9 1001 10 A 1010 11 B 1011 12 C 1100 13 D 1101 14 E 1110 15 F 1111   Remember that each character is a BYTE, there are 2 HEX characters in a byte (called nibbles) and 8 BITS in a byte.  I hope you enjoyed reading about the theory of data processing.  This is just a high-level explanation, and there is much more to be learned.  It is safe to say that, no matter how advanced our programming languages and visual studios become, they are nothing more than a way to interpret bits and bytes.  There is nothing like the joy of hex to get the mind racing.

    Read the article

  • CS subjects that an undergraduate must know.

    - by Karl
    In college, I was never interested in theory. I never read it. No matter how much I tried, I was unable to read stuff and not know what was actually happening practically. Like for example, in my course on automata theory, my professor told me everything possibly related to the mathematical aspect of it, but not even once did he mention where it would be used practically. This is just an example. I managed to pass my college and interned with a company also, where I did a project and thankfully they didn't bother about my grades, as they were above average. Now, I am interested in knowing what subjects should a CS student must absolutely and positively be aware of? Subjects that can have relevance in the industry. This is because I have some free time on my hands and it would help me better to have a good understanding of them. What are your suggestions? Like for one, algorithms is one subject.

    Read the article

  • Getting to math applications gradually

    - by den-javamaniac
    I'm currently getting a formal degree related to computation, in particular my current focus is numerical programming, scientific computing and machine learning. I'd love to apply that knowledge in game dev and expand it with statistics, probability theory, and graph theory (probably even linear algebra). The question is: which spheres of gamedev are filled with such math stuff, is it possible to advance in those without being a part of a group of people and how to get to it gradually? P.S.: I've got experience with commercial java dev and am getting my hands on C/C++ at the moment, however, I'm opened to go ahead and try Unity3D and etc.

    Read the article

  • Crappy school, what to do? [closed]

    - by zhenka
    I started programming fairly late. I am 24 years old and about to graduate from a local public university with a really poorly designed curriculum and teachers. Most of the work felt like busy work, and no matter how much I try, it all feels like a waste. I know what a good curriculum looks like. I know what books I should read, but alas it's not so in my university. There is no way at this point that I can catch up to those graduating from places like MIT. My question and this is a serious one: what do I do? Do I just postpone learning the theory I would have learned until later and focus on software engineering skills? How important is the theory in terms of landing a job in New York? Any particular things I should focus on to land a software engineer job? I am very motivated and I just wish someone would give me the time and a chance.

    Read the article

  • How can I repair a corrupt kernel if no others are installed?

    - by Willi Ballenthin
    I've been running Ubuntu 10.04 on my laptop for quite some time. When I went to boot it up this morning, BAM! kernel panic (which immediately lead to human panic) when loading the kernel. So I've spent much of the day troubleshooting, and my current theory is that the FS is fine, but that the kernel image may be corrupt. Let's go with this current theory for the sake of this question, as I am interested how it is done. How can I replace the kernel image if I have no bootable kernels? Can I boot to a 10.04 live CD, copy the the vmlinuz-2.6.3x... to the HD and go from there? Wouldn't I want to copy the initramfs as well, but configured for the desktop system? Can I generate this from the live CD?

    Read the article

  • Algorithms for positioning rectangles evenly spaced with unknown connecting lines

    - by MacGyver
    I'm new to game development, but I'm trying to figure out a good algorithm for positioning rectangles (of any width and height) in a given surface area, and connecting them with any variation of lines. Two rectangles will never have more than one line connecting them. Where would I begin working on a problem like this? This is only a 2 dimensional surface. I read about graph theory, and it seems like this is a close representation of that. The rectangles would be considered a node, and the lines connecting them would be considered an edge in graph theory.

    Read the article

  • How to devise instruction set of a stack based machine?

    - by Anindya Chatterjee
    Stack based virtual machines like CLR and JVM has different set of instructions. Is there any theory behind devising the instruction set while creating a virtual machine? e.g. there are JVM instruction sets to load constants from 0-5 onto the stack iconst_0 iconst_1 iconst_2 iconst_3 iconst_4 iconst_5 whereas in CLR there are instruction set to load number from 0 to 8 onto the stack as follows ldc.i4.0 ldc.i4.1 ldc.i4.2 ldc.i4.3 ldc.i4.4 ldc.i4.5 ldc.i4.6 ldc.i4.7 ldc.i4.8 why there is no ldc.i4.9 and if ldc.i4 is there why we need the above opcodes? And there are others like these. I am eager to know what is the reason behind this difference between opcodes of different VMs? Is there any specific theory to devise these opcodes or it is totally driven by the characteristics of the VM itself or depends on the high-level language constructs?

    Read the article

  • Dealing with engineers that frequently leave their jobs [closed]

    - by ??? Shengyuan Lu
    My friend is a project manager for a software company. The most frustrating thing for him is that his engineers frequently leave their jobs. The company works hard to recruit new engineers, transfer projects, and keep a stable quality product. When people leave, it drives my friend crazy. These engineers are quite young and ambitious, and they want higher salaries and better positions. The big boss only thinks about it in financial terms, and his theory is that “three newbies are always better than one veteran” (which, as an experienced engineer, I know is wrong). My friend hates that theory. Any advice for him?

    Read the article

  • Dealing with engineers that frequently leave their jobs

    - by ??? Shengyuan Lu
    My friend is a project manager for a software company. The most frustrating thing for him is that his engineers frequently leave their jobs. The company works hard to recruit new engineers, transfer projects, and keep a stable quality product. When people leave, it drives my friend crazy. These engineers are quite young and ambitious, and they want higher salaries and better positions. The big boss only thinks about it in financial terms, and his theory is that “three newbies are always better than one veteran” (which, as an experienced engineer, I know is wrong). My friend hates that theory. Any advice for him?

    Read the article

  • Great Debugging skills weak problem solving

    - by Mahmoud
    For the 5 years I worked for various companies, I worked in large software like computer vision kits, embedded, games. I found myself very good at debuggins skills, I've even found and fixed bugs in frameworks and I solved them. The problem is that I'm very weak at problem solving. I got interview with Qualcomm, and they said you're fine at software, but you have a limited problem solving, I also had the same results with Google. I'm very bad at solving puzzles and brain teasers. During the interviews I solve all of the software related problems on the blackboard, but when I went to the GM and face math problems and probabilities, I struggle. How can I improve my problem solving skills? Edit Some of the problems: A cake that is cut from anywhere and needs just one cut to halved in equal. I told him cut it horizontally, he said No, consider it as a 2D Problem!. Consider a concenteric 3 circles, each one can get a color, but not matched with the other circle, how many blobs you can make out of those circles ? this was with the GM ( Augmented Reality SDK) Consider a train, an infinite one, and you looked at the window, and there are two cars, one big, and one small, what is the probability of having only a big car, I said 50%, he said, what if that two cars you dont know their length, and you want to get the probability of getting the biggest one, I struggled, didn't solve it... I was really exahusted after long day of interviews prob of having a number divisible by 5 in numbers from 1 to 100.. struggled!! All coding questions I solved them like reverse a string, detect a cycle in a linked list,..etc.

    Read the article

  • Does it matter the direction of a Huffman's tree child node?

    - by Omega
    So, I'm on my quest about creating a Java implementation of Huffman's algorithm for compressing/decompressing files (as you might know, ever since Why create a Huffman tree per character instead of a Node?) for a school assignment. I now have a better understanding of how is this thing supposed to work. Wikipedia has a great-looking algorithm here that seemed to make my life way easier. Taken from http://en.wikipedia.org/wiki/Huffman_coding: Create a leaf node for each symbol and add it to the priority queue. While there is more than one node in the queue: Remove the two nodes of highest priority (lowest probability) from the queue Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities. Add the new node to the queue. The remaining node is the root node and the tree is complete. It looks simple and great. However, it left me wondering: when I "merge" two nodes (make them children of a new internal node), does it even matter what direction (left or right) will each node be afterwards? I still don't fully understand Huffman coding, and I'm not very sure if there is a criteria used to tell whether a node should go to the right or to the left. I assumed that, perhaps the highest-frequency node would go to the right, but I've seen some Huffman trees in the web that don't seem to follow such criteria. For instance, Wikipedia's example image http://upload.wikimedia.org/wikipedia/commons/thumb/8/82/Huffman_tree_2.svg/625px-Huffman_tree_2.svg.png seems to put the highest ones to the right. But other images like this one http://thalia.spec.gmu.edu/~pparis/classes/notes_101/img25.gif has them all to the left. However, they're never mixed up in the same image (some to the right and others to the left). So, does it matter? Why?

    Read the article

  • How do I produce "enjoyably" random, as opposed to pseudo-random?

    - by Hilton Campbell
    I'm making a game which presents a number of different kinds of puzzles in sequence. I choose each puzzle with a pseudorandom number. For each puzzle, there are a number of variations. I choose the variation with another pseudorandom number. And so on. The thing is, while this produces near-true randomness, this isn't what the player really wants. The player typically wants what they perceive to be and identify as random, but only if it doesn't tend to repeat puzzles. So, not really random. Just unpredictable. Giving it some thought, I can imagine hacky ways of doing it. For example, temporarily eliminating the most recent N choices from the set of possibilities when selecting a new choice. Or assigning every choice an equal probability, reducing a choice's probability to zero on selection, and then increasing all probabilities slowly with each selection. I assume there's an established way of doing this, but I just don't know the terminology so I can't find it. Anyone know? Or has anyone solved this in a pleasing way?

    Read the article

  • Searching through large data set

    - by calccrypto
    how would i search through a list with ~5 mil 128bit (or 256, depending on how you look at it) strings quickly and find the duplicates (in python)? i can turn the strings into numbers, but i don't think that's going to help much. since i haven't learned much information theory, is there anything about this in information theory? and since these are hashes already, there's no point in hashing them again

    Read the article

  • Do Not Optimize Without Measuring

    - by Alois Kraus
    Recently I had to do some performance work which included reading a lot of code. It is fascinating with what ideas people come up to solve a problem. Especially when there is no problem. When you look at other peoples code you will not be able to tell if it is well performing or not by reading it. You need to execute it with some sort of tracing or even better under a profiler. The first rule of the performance club is not to think and then to optimize but to measure, think and then optimize. The second rule is to do this do this in a loop to prevent slipping in bad things for too long into your code base. If you skip for some reason the measure step and optimize directly it is like changing the wave function in quantum mechanics. This has no observable effect in our world since it does represent only a probability distribution of all possible values. In quantum mechanics you need to let the wave function collapse to a single value. A collapsed wave function has therefore not many but one distinct value. This is what we physicists call a measurement. If you optimize your application without measuring it you are just changing the probability distribution of your potential performance values. Which performance your application actually has is still unknown. You only know that it will be within a specific range with a certain probability. As usual there are unlikely values within your distribution like a startup time of 20 minutes which should only happen once in 100 000 years. 100 000 years are a very short time when the first customer tries your heavily distributed networking application to run over a slow WIFI network… What is the point of this? Every programmer/architect has a mental performance model in his head. A model has always a set of explicit preconditions and a lot more implicit assumptions baked into it. When the model is good it will help you to think of good designs but it can also be the source of problems. In real world systems not all assumptions of your performance model (implicit or explicit) hold true any longer. The only way to connect your performance model and the real world is to measure it. In the WIFI example the model did assume a low latency high bandwidth LAN connection. If this assumption becomes wrong the system did have a drastic change in startup time. Lets look at a example. Lets assume we want to cache some expensive UI resource like fonts objects. For this undertaking we do create a Cache class with the UI themes we want to support. Since Fonts are expensive objects we do create it on demand the first time the theme is requested. A simple example of a Theme cache might look like this: using System; using System.Collections.Generic; using System.Drawing; struct Theme { public Color Color; public Font Font; } static class ThemeCache { static Dictionary<string, Theme> _Cache = new Dictionary<string, Theme> { {"Default", new Theme { Color = Color.AliceBlue }}, {"Theme12", new Theme { Color = Color.Aqua }}, }; public static Theme Get(string theme) { Theme cached = _Cache[theme]; if (cached.Font == null) { Console.WriteLine("Creating new font"); cached.Font = new Font("Arial", 8); } return cached; } } class Program { static void Main(string[] args) { Theme item = ThemeCache.Get("Theme12"); item = ThemeCache.Get("Theme12"); } } This cache does create font objects only once since on first retrieve of the Theme object the font is added to the Theme object. When we let the application run it should print “Creating new font” only once. Right? Wrong! The vigilant readers have spotted the issue already. The creator of this cache class wanted to get maximum performance. So he decided that the Theme object should be a value type (struct) to not put too much pressure on the garbage collector. The code Theme cached = _Cache[theme]; if (cached.Font == null) { Console.WriteLine("Creating new font"); cached.Font = new Font("Arial", 8); } does work with a copy of the value stored in the dictionary. This means we do mutate a copy of the Theme object and return it to our caller. But the original Theme object in the dictionary will have always null for the Font field! The solution is to change the declaration of struct Theme to class Theme or to update the theme object in the dictionary. Our cache as it is currently is actually a non caching cache. The funny thing was that I found out with a profiler by looking at which objects where finalized. I found way too many font objects to be finalized. After a bit debugging I found the allocation source for Font objects was this cache. Since this cache was there for years it means that the cache was never needed since I found no perf issue due to the creation of font objects. the cache was never profiled if it did bring any performance gain. to make the cache beneficial it needs to be accessed much more often. That was the story of the non caching cache. Next time I will write something something about measuring.

    Read the article

  • Binomial test in Python

    - by Morlock
    I need to do a binomial test in Python that allows calculation for 'n' numbers of the order of 10000. I have implemented a quick binomial_test function using scipy.misc.comb, however, it is pretty much limited around n = 1000, I guess because it reaches the biggest representable number while computing factorials or the combinatorial itself. Here is my function: from scipy.misc import comb def binomial_test(n, k): """Calculate binomial probability """ p = comb(n, k) * 0.5**k * 0.5**(n-k) return p How could I use a native python (or numpy, scipy...) function in order to calculate that binomial probability? If possible, I need scipy 0.7.2 compatible code. Many thanks!

    Read the article

< Previous Page | 13 14 15 16 17 18 19 20 21 22 23 24  | Next Page >