probability theory - Page 27

Can I temporarily leave other roles active when converting a 2008 R2 server to a hyper-v host?

- by Eric

I currently have a Windows 2008 R2 box that I want to convert to a hyper-v host. Currently I have the following running on it: File Services, Web Server (IIS), and Sql Server 2008 R2 with a few different DBs. In the long term, I'd like to move all of those over to the VMs that will be created. However, in the short term I'd like to leave them running on the server. Are there any complications/problems with leaving them running on the hyper-v server for a day or two? How long will those services typically go down during the addition of the hyper-v role? Right now my plan is to back up everything, enable the hyper-v role, set up VMs, and migrate stuff off of the host. However, if there is a good probability that something may cease to function I might need to migrate everything over to a temporary host before doing the conversion.

Does sending e-mail in the name of customers increase the risk of being marked as spammer?

- by Adrian Grigore

Hi, We are developing a SaaS website application that lets users send invoices to their clients. Ideally, these e-mails should appear to originate from our customers, so the sender e-mail address domain will not match the reverse DNS entry for our server. In effect we would be forging their e-mail address, but of course with their consent. Will that result in a higher probability of being marked as a spammer / their e-mails being marked as spam? If yes, how bad is the penalty? And what about people who have an e-mail address originating from an SPF-enabled domain? I guess that covers the majority of the big e-mail providers.

How to backup a remote VPS machine?

- by morpheous

I am considering opting for a VPS solution, with the server running Ubuntu Server. I am pretty new to this, and I need to come up with a backup policy for my server data. Initial data is likely to be about 80 MB, and I expect the data to grow by approximately 5 MB to 10 MB a day. Can anyone recommend: (1) a backup/restore policy (best practices for a small startup), and (2) which tools to use for backup? Another thing that is not clear to me is where the files are normally backed up to in the case of remote servers. If the files are backed up to the same machine (or even to another machine with the same host), there is potentially a single point of failure. How do people normally back up their server data, and is the probability of machine meltdown or the hosting company's server farm "catching fire" so remote as not to be worth worrying about, especially for a small (read: one-man) startup like me?

What ports tend to be unfiltered by boneheaded firewalls?

- by Reid

Hi all, I like to be able to ssh into my server (shocking, I know). The problem comes when I'm traveling, where I face a variety of firewalls in hotels and other institutions, having a variety of configurations, sometimes quite boneheaded. I'd like to set up an sshd listening on a port that has a high probability of getting through this mess. Any suggestions? The sshd currently listens on a nonstandard (but < 1024) port to avoid script kiddies knocking on the door. This port is frequently blocked, as is the other nonstandard port where my IMAP server lives. I have services running on ports 25 and 80 but anything else is fair game. I was thinking 443 perhaps. Much appreciated! Reid

Segmentation Fault with mod_include

- by Benedikt Eger

Hi, I'm using a rather complex structure with multiple ssi-includes, set- and echo-commands. The first document writes a lot of set-commands, includes another document which then again includes a third document. On the last included document the variable values are printed using the echo-command. I noticed that with an increasing number of variables the probability for a segmentation fault to happen rises. Did anyone experience something similar? How do I go about debugging such a problem? I'm using IBM_HTTP_Server/2.0.47.1-PK65782 Apache/2.0.47

Run Wave Trusted Drive Manager from a bootable CD, recover crashed encrypted SSD?

- by TigerInCanada

Is there a way to run Wave Trusted Drive Manager from a live CD to access a non-bootable SSD with full disk encryption? http://www.wave.com/products/tdm.asp The crashed disk is a Samsung SSD PB22-JS3, 128 GB. It has bad blocks at 128-block intervals. If the SSD password could be unset, is sending the unit for disaster recovery possible? What might cause a nearly new SSD to crash in this way, and what is the probability of it happening again? We have other units in service and I can do without every laptop disk in the company crashing...

Utilize two gateways on the same network same interface with load balancing

- by RushPL

My setup is two ISPs on a single interface and a single network. I can set my default gateway to either 192.168.0.1 or 192.168.1.250 and either works. My desire is to utilize both of them with some load balancing. I have tried to follow the advice given here: http://serverfault.com/a/96586

#!/bin/sh
ip route show table main | grep -Ev '^default' \
  | while read ROUTE ; do
    ip route add table ISP1 $ROUTE
done
ip route add default via 192.168.1.250 table ISP1
ip route add default via 192.168.0.1 table ISP2
iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
iptables -t mangle -A PREROUTING -m mark ! --mark 0 -j ACCEPT
iptables -t mangle -A PREROUTING -j MARK --set-mark 10
iptables -t mangle -A PREROUTING -m statistic --mode random --probability 0.5 -j MARK --set-mark 20
iptables -t mangle -A PREROUTING -j CONNMARK --save-mark

Now when I do "traceroute somehost" repeatedly, I only ever get a route through my default route, which is 192.168.1.250. Shouldn't the packets change routes in a random manner? How can I debug it?

How to analyse logs after the site was hacked

- by Vasiliy Toporov

One of our web projects was hacked. The attacker changed some template files in the project and 1 core file of the web framework (it's one of the well-known PHP frameworks). We found all the corrupted files using git and reverted them. So now I need to find the weak point. We can say with high probability that it was not FTP or SSH password theft. The hosting provider's support specialist said (after analysing the logs) that it was a security hole in our code. My questions: 1) What tools should I use to review the Apache access and error logs? (Our server distro is Debian.) 2) Can you give tips on detecting suspicious lines in logs? Maybe tutorials or primers on useful regexps or techniques? 3) How do I separate "normal user behavior" from suspicious behavior in logs? 4) Is there any way to prevent attacks in Apache? Thanks for your help.

Likeliness of obtaining same IP address after restarting a router

- by ?affael

My actual objective is to simulate logged IPs of web-site users who are all assumed to use dynamically assigned IPs. There will be two kinds of users: good users, who only change IP when the ISP assigns a new one, and bad users, who will restart their router to obtain a new IP. So what I would like to understand is what assignment mechanics are usually at work here, deciding from what pool of IPs one is chosen, and whether the probability is uniformly distributed. I know there is no definite and global answer, as this process can be adjusted by the ISP, but maybe there is something like a technological frame and common process that allows some plausible assumptions. UPDATE: A bad user will restart the router as often as necessary. So here the central question is how many IP changes on average are necessary to end up with a previously used IP.
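For the simulation, one possible baseline is to treat every reassignment as an independent, uniform draw from a fixed pool. That is purely a modelling assumption (a real ISP may well hand the same address back or favour recently freed ones), but under it the question becomes the classic birthday problem. The sketch below computes the expected number of assignments until an address repeats; the function name and the pool sizes are hypothetical.

```python
def expected_draws_until_repeat(pool_size: int) -> float:
    """Expected number of uniform draws from `pool_size` addresses
    until one address is seen for the second time.

    Uses E[T] = sum over k >= 0 of P(first k draws are all distinct).
    Assumption (not from the question): draws are independent and uniform.
    """
    total = 0.0
    p_all_distinct = 1.0  # probability that the first k draws are distinct (k = 0)
    for k in range(pool_size + 1):
        total += p_all_distinct
        # Probability that draw k+1 also avoids the k addresses already seen.
        p_all_distinct *= (pool_size - k) / pool_size
    return total


if __name__ == "__main__":
    for size in (256, 65536):  # hypothetical pool sizes
        draws = expected_draws_until_repeat(size)
        # The first draw is the initial assignment, so IP *changes* = draws - 1.
        print(size, round(draws - 1, 1))
```

The expectation grows roughly with the square root of the pool size, so even a large pool is revisited after surprisingly few restarts if the uniform assumption holds.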

How to check the OS is running on bare metal and not in virtualized environment created by BIOS?

- by Arkadi Shishlov

Is there any software available, as a Linux, *BSD, or Windows program or a boot image, to check (or guess with good probability) whether the environment an operating system is loaded into is genuine bare metal and not already virtualized? Given recent information from various sources, including what are purported to be E. Snowden leaks, I'm curious about the security of my PCs, even those that don't have an on-board BMC. How could it be possible, and why? See for example Blue Pill and a number of papers. With a little assistance from network card firmware, which is also loadable on popular card models, such a hypervisor could easily spy on me, rendering PGP, Tor, etc. exercises futile.

How will people upgrade from 12.10 to 14.04 after 13.04 is EOL?

- by Dave Jones

Looking at https://wiki.ubuntu.com/Releases, 13.04 will reach EOL in January 2014, while 12.10 will reach EOL in April 2014. Therefore, if a 12.10 user hasn't upgraded to 13.04 and subsequently to 13.10, there will be a 3-month period in which a 12.10 user has a supported version of Ubuntu but will be unable to upgrade. I asked this question a number of months ago, and the suggestion was that there would hopefully be an upgrade path from 12.10 to 14.04. Could somebody confirm whether this is still the case or, if not, what the plans are for 12.10 users after 13.04 becomes EOL? Edited for clarification: The particular issue I was concerned about is that once 13.04 goes EOL, a 12.10 user would in theory lose the ability to upgrade once the 13.04 repos are removed from the normal release repository. Using the old-releases method would be a way around the issue, but it would make things more complicated for a less experienced user. An alternative could be for the 13.04 repos to be left available for the 3-month interim period so that a 12.10 installation could still be upgraded to 13.04 and subsequently to 13.10; however, that doesn't seem an optimal solution, in that users may take it to mean that support for 13.04 was being continued. If a direct upgrade from 12.10 to 14.04 were to be made available, it would only be available once 14.04 was released, which still leaves the issue of the 3 months between January and April 2014, where there may be some confusion. I suspect that it's not going to affect a significant number of users: if somebody has upgraded from 12.04 LTS to 12.10, in all probability they'll have continued to upgrade to 13.04 and upwards, because they'd made the choice to use current rather than LTS releases. It would just be useful to have some clarification of the situation that people can be referred to in advance of 13.04 going EOL, rather than hitting the cut-off point when it is too late for users to make the decision and they are left in limbo.

book about psychology of decision and psychology of human

- by boos

I'm a Unix developer and I want to move into project/people management as a first career step. I think it is sometimes better to have good communication skills, and people skills in general, to advance a career faster. In Italy at least, a lot of people have advanced faster thanks to their people skills rather than their technical skills. Has anyone read books about psychology to better understand how people and personalities work, and to make the most of decision-making situations? I have found some interesting books about personality and the psychology of decision-making, but I am in doubt about the usefulness of reading such books. Does anyone have experience with this path? Has anyone found it useful to read books like these about how people work, in order to advance a career faster and to handle people and decisions more effectively? I have already read Peopleware. The table of contents of one of these books includes: 1 - Judgment and decision 2 - Heuristics and systematic errors 3 - Estimating probability and predicting frequency 4 - Risk and decision 5 - Representation and decision 6 - Memory, attention and decision. Etc. What do you think?

apt sources.list disabled on upgrade to 12.04

- by user101089

After a do-release-upgrade, I'm now running Ubuntu 12.04 LTS, as indicated below:

> lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04 LTS
Release: 12.04
Codename: precise

However, I find that all the entries in my /etc/apt/sources.list were commented out except for one. QUESTION: Is it safe for me to edit these, replacing the old 'lucid' with 'precise' in what is shown below?

## unixteam source list
# deb http://debian.yorku.ca/ubuntu/ precise main main/debian-installer restricted restricted/debian-installer # disabled on upgrade to precise
# deb-src http://debian.yorku.ca/ubuntu/ precise main restricted # disabled on upgrade to precise
# deb http://debian.yorku.ca/ubuntu/ lucid-updates main restricted # disabled on upgrade to precise
# deb-src http://debian.yorku.ca/ubuntu/ lucid-updates main restricted # disabled on upgrade to precise
# deb http://debian.yorku.ca/ubuntu/ precise universe # disabled on upgrade to precise
# deb-src http://debian.yorku.ca/ubuntu/ precise universe # disabled on upgrade to precise
# deb http://debian.yorku.ca/ubuntu/ precise multiverse # disabled on upgrade to precise
# deb-src http://debian.yorku.ca/ubuntu/ precise multiverse # disabled on upgrade to precise
# deb http://debian.yorku.ca/ubuntu lucid-security main restricted # disabled on upgrade to precise
# deb-src http://debian.yorku.ca/ubuntu lucid-security main restricted # disabled on upgrade to precise
# deb http://debian.yorku.ca/ubuntu lucid-security universe # disabled on upgrade to precise
# deb-src http://debian.yorku.ca/ubuntu lucid-security universe # disabled on upgrade to precise
# deb http://debian.yorku.ca/ubuntu lucid-security multiverse # disabled on upgrade to precise
# deb-src http://debian.yorku.ca/ubuntu lucid-security multiverse # disabled on upgrade to precise
# R sources
# see http://cran.us.r-project.org/bin/linux/ubuntu/ for details
# deb http://probability.ca/cran/bin/linux/ubuntu lucid/ # disabled on upgrade to precise
deb http://archive.ubuntu.com/ubuntu precise main multiverse universe

Planning for the Recovery

- by john.orourke(at)oracle.com

As we plan for 2011, there are many positive signs in the global economy, but also some lingering issues. Planning is no longer about extrapolating past performance and adjusting for growth. It is now about constantly testing the temperature of the water, formulating scenarios, assessing risk and assigning probabilities. So how does one plan for recovery and improve forecast accuracy in such a volatile environment? Here are some suggestions from a recent article I wrote, which was published in the December Financial Planning & Analysis (FP&A) newsletter from the AFP (Association for Financial Professionals):
- Increase the frequency of forecasting
- Get more line managers involved in the planning and forecasting process
- Re-consider what's being measured - i.e. key financial and operational metrics
- Incorporate risk and probability into forecasts
- Reduce reliance on spreadsheets - leverage packaged EPM applications
To learn more about these best practices, check out the FP&A section of the AFP website and register to receive the FP&A newsletter. AFP recently launched a new topic area focused on the FP&A function and items of interest to this group of finance professionals. In addition to the FP&A quarterly newsletter, AFP will be publishing articles, running webinars and will have an FP&A track in their annual conference, which is in Boston next November. Brian Kalish, AFP's Finance Lead, is hoping this initiative creates a valuable networking and information-sharing resource for FP&A professionals. Here's a link to the FP&A page on the AFP web site: http://www.afponline.org/pub/res/topics/topics_fpa.html If you register on the site you can access and subscribe to the FP&A newsletter and other resources. Best of luck in your planning for 2011 and beyond!

Final Man vs. Machine Round of Jeopardy Unfolds; Watson Dominates

- by ETC

The final round of IBM’s Watson against Ken Jennings and Brad Rutter ended last night with Watson coming out with a strong lead over its two human opponents. Read on to catch a video of the match and see just how quick Watson is on the draw. Watson tore through many of the answers, the little probability bar at the bottom of the screen denoting it was often 95%+ confident in its answers. Some of the more interesting stumbles were, like in the last matches, based on nuance. By far the biggest “What?” moment of the night, however, was when it answered the Daily Double question of “The New Yorker’s 1959 review of this said in its brevity and clarity, it is ‘unlike most such manuals, a book as well as a tool’”. Watson, inexplicably, answered “Dorothy Parker”. You can’t win them all, eh? Check out the video below to see Watson in action on its final day.

How to implement an offline reader writer lock

- by Peter Morris

Some context for the question

All objects in this question are persistent. All requests will be from a Silverlight client talking to an app server via a binary protocol (Hessian) and not WCF. Each user will have a session key (not an ASP.NET session) which will be a string, integer, or GUID (undecided so far).

Some objects might take a long time to edit (30 or more minutes) so we have decided to use pessimistic offline locking. Pessimistic because having to reconcile conflicts would be far too annoying for users, offline because the client is not permanently connected to the server.

Rather than storing session/object locking information in the object itself I have decided that any aggregate root that may have its instances locked should implement an interface ILockable:

public interface ILockable
{
    Guid LockID { get; }
}

This LockID will be the identity of a "Lock" object which holds the information of which session is locking it.

Now, if this were simple pessimistic locking I'd be able to achieve this very simply (using an incrementing version number on Lock to identify update conflicts), but what I actually need is ReaderWriter pessimistic offline locking. The reason is that some parts of the application will perform actions that read these complex structures. These include things like:
- Reading a single structure to clone it.
- Reading multiple structures in order to create a binary file to "publish" the data to an external source.

Read locks will be held for a very short period of time, typically less than a second, although in some circumstances they could be held for about 5 seconds at a guess. Write locks will mostly be held for a long time as they are mostly held by humans. There is a high probability of two users trying to edit the same aggregate at the same time, and a high probability of many users needing to temporarily read-lock at the same time too. I'm looking for suggestions as to how I might implement this.
One additional point to make is that if I want to place a write lock and there are some read locks, I would like to "queue" the write lock so that no new read locks are placed. If the read locks are removed within X seconds then the write lock is obtained; if not, the write lock backs off. No new read locks would be placed while a write lock is queued.

So far I have this idea. The Lock object will have:
- A version number (int) so I can detect multi-update conflicts, reload, and try again.
- A string[] for read locks.
- A string to hold the session ID that has a write lock.
- A string to hold the queued write lock.
- Possibly a recursion counter to allow the same session to lock multiple times (for both read and write locks), but I'm not sure about this yet.

Rules:
- Can't place a read lock if there is a write lock or queued write lock.
- Can't place a write lock if there is a write lock or queued write lock.
- If there are no locks at all then a write lock may be placed.
- If there are read locks then a write lock will be queued instead of a full write lock being placed. (If after X time the read locks are not gone the lock backs off; otherwise it is upgraded.)
- Can't queue a write lock for a session that has a read lock.

Can anyone see any problems? Suggest alternatives? Anything? I'd appreciate feedback before deciding on what approach to take.
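To make the proposed rules easier to poke holes in, here is a minimal in-memory sketch of them in Python. It is only an illustration of the state machine described above, not the asker's implementation: the class and method names are hypothetical, persistence and the optimistic version check on save are left out, the recursion counter is omitted, and the "X seconds" timeout is assumed to be tracked by the caller.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Lock:
    """Hypothetical Lock record following the rules listed in the question."""
    version: int = 0                      # bumped on every change, for conflict detection
    readers: Set[str] = field(default_factory=set)
    writer: Optional[str] = None          # session holding the write lock
    queued_writer: Optional[str] = None   # session waiting for readers to drain

    def try_read(self, session: str) -> bool:
        # Rule: no read lock while a write lock is held or queued.
        if self.writer is not None or self.queued_writer is not None:
            return False
        self.readers.add(session)
        self.version += 1
        return True

    def release_read(self, session: str) -> None:
        self.readers.discard(session)
        self.version += 1

    def try_write(self, session: str) -> str:
        """Return 'granted', 'queued' or 'denied'."""
        # Rule: no write lock while a write lock is held or queued.
        if self.writer is not None or self.queued_writer is not None:
            return "denied"
        # Rule: a session holding a read lock cannot queue a write lock.
        if session in self.readers:
            return "denied"
        self.version += 1
        if self.readers:
            self.queued_writer = session  # blocks new readers from now on
            return "queued"
        self.writer = session
        return "granted"

    def resolve_queue(self, timed_out: bool) -> str:
        """Upgrade the queued writer once readers are gone, or back off on timeout."""
        if self.queued_writer is None:
            return "none"
        if not self.readers:
            self.writer, self.queued_writer = self.queued_writer, None
            self.version += 1
            return "granted"
        if timed_out:
            self.queued_writer = None
            self.version += 1
            return "backed_off"
        return "queued"
```

Writing it out this way surfaces one thing the rules leave open: a read lock whose session disappears without releasing it would stall every queued writer until the timeout, so some expiry on read locks may be worth considering.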

Fraud Detection with the SQL Server Suite Part 1

- by Dejan Sarka

While working on different fraud detection projects, I developed my own approach to the solution for this problem. In my PASS Summit 2013 session I am introducing this approach. I also wrote a whitepaper on the same topic, which was generously reviewed by my friend Matija Lah. In order to spread this knowledge faster, I am starting a series of blog posts which will at the end make the whole whitepaper.

Abstract

With the massive usage of credit cards and web applications for banking and payment processing, the number of fraudulent transactions is growing rapidly and on a global scale. Several fraud detection algorithms are available within a variety of different products. In this paper, we focus on using the Microsoft SQL Server suite for this purpose. In addition, we will explain our original approach to solving the problem by introducing a continuous learning procedure. Our preferred type of service is mentoring; it allows us to perform the work and consulting together with transferring the knowledge onto the customer, thus making it possible for a customer to continue to learn independently. This paper is based on practical experience with different projects covering online banking and credit card usage.

Introduction

A fraud is a criminal or deceptive activity with the intention of achieving financial or some other gain. Fraud can appear in multiple business areas. You can find a detailed overview of the business domains where fraud can take place in Sahin Y., & Duman E. (2011), Detecting Credit Card Fraud by Decision Trees and Support Vector Machines, Proceedings of the International MultiConference of Engineers and Computer Scientists 2011 Vol 1. Hong Kong: IMECS. Dealing with frauds includes fraud prevention and fraud detection. Fraud prevention is a proactive mechanism, which tries to disable frauds by using previous knowledge.
Fraud detection is a reactive mechanism with the goal of detecting suspicious behavior when a fraudster surpasses the fraud prevention mechanism. A fraud detection mechanism checks every transaction and assigns a weight in terms of probability between 0 and 1 that represents a score for evaluating whether a transaction is fraudulent or not. A fraud detection mechanism cannot detect frauds with a probability of 100%; therefore, manual transaction checking must also be available. With fraud detection, this manual part can focus on the most suspicious transactions. This way, an unchanged number of supervisors can detect significantly more frauds than could be achieved with traditional methods of selecting which transactions to check, for example with random sampling. There are two principal data mining techniques available both in general data mining as well as in specific fraud detection techniques: supervised or directed and unsupervised or undirected. Supervised techniques or data mining models use previous knowledge. Typically, existing transactions are marked with a flag denoting whether a particular transaction is fraudulent or not. Customers at some point in time do report frauds, and the transactional system should be capable of accepting such a flag. Supervised data mining algorithms try to explain the value of this flag by using different input variables. When the patterns and rules that lead to frauds are learned through the model training process, they can be used for prediction of the fraud flag on new incoming transactions. Unsupervised techniques analyze data without prior knowledge, without the fraud flag; they try to find transactions which do not resemble other transactions, i.e. outliers. In both cases, there should be more frauds in the data set selected for checking by using the data mining knowledge compared to selecting the data set with simpler methods; this is known as the lift of a model. Typically, we compare the lift with random sampling. 
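As a rough illustration of the lift idea just described, the following Python sketch compares the fraud rate among the highest-scored transactions with the overall fraud rate, which is what random sampling would find on average. It is not taken from the whitepaper: the function name, the score list and the checked fraction are assumptions made for the example.

```python
from typing import Sequence


def lift(scores: Sequence[float], is_fraud: Sequence[int], fraction: float) -> float:
    """Lift of a scoring model over random sampling.

    scores   -- model output per transaction (higher = more suspicious)
    is_fraud -- 1 if the transaction is known to be fraudulent, else 0
    fraction -- share of transactions the supervisors can check manually
    """
    n = len(scores)
    base_rate = sum(is_fraud) / n  # expected hit rate of random sampling
    if base_rate == 0:
        raise ValueError("no known frauds in the data set; lift is undefined")
    k = max(1, int(n * fraction))  # number of transactions selected for checking
    ranked = sorted(zip(scores, is_fraud), key=lambda pair: pair[0], reverse=True)
    hit_rate_top = sum(flag for _, flag in ranked[:k]) / k
    return hit_rate_top / base_rate


if __name__ == "__main__":
    # Hypothetical toy data: one fraud among four transactions.
    print(lift([0.9, 0.8, 0.2, 0.1], [1, 0, 0, 0], fraction=0.25))
```

A lift of 1 means the model selects frauds no better than random sampling; the same function can be applied to a supervised and an unsupervised model on the same data to track the gap between them over time.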
The supervised methods typically give a much better lift than the unsupervised ones. However, we must use the unsupervised ones when we do not have any previous knowledge. Furthermore, unsupervised methods are useful for checking whether the supervised models are still efficient. Accuracy of the predictions drops over time. Patterns of credit card usage, for example, change over time. In addition, fraudsters continuously learn as well. Therefore, it is important to check the efficiency of the predictive models with the undirected ones. When the difference between the lift of the supervised models and the lift of the unsupervised models drops, it is time to refine the supervised models. However, the unsupervised models can become obsolete as well. It is also important to measure the overall efficiency of both supervised and unsupervised models over time. We can compare the number of predicted frauds with the total number of frauds that include predicted and reported occurrences. For measuring behavior across time, specific analytical databases called data warehouses (DW) and on-line analytical processing (OLAP) systems can be employed. By checking the supervised models with unsupervised ones and by using an OLAP system or DW reports to monitor both, a continuous learning infrastructure can be established. There are many difficulties in developing a fraud detection system. As has already been mentioned, fraudsters continuously learn, and the patterns change. The exchange of experiences and ideas can be very limited due to privacy concerns. In addition, both data sets and results might be censored, as the companies generally do not want to publicly expose actual fraudulent behaviors. Therefore, it can be quite difficult, if not impossible, to cross-evaluate the models using data from different companies and different business areas. This fact stresses the importance of continuous learning even more.
Finally, the number of frauds in the total number of transactions is small; typically, far fewer than 1% of transactions are fraudulent. Some predictive data mining algorithms do not give good results when the target state is represented with a very low frequency. Data preparation techniques like oversampling and undersampling can help overcome the shortcomings of many algorithms. The SQL Server suite includes all of the software required to create, deploy and maintain a fraud detection infrastructure. The Database Engine is the relational database management system (RDBMS), which supports all activity needed for data preparation and for data warehouses. SQL Server Analysis Services (SSAS) supports OLAP and data mining (in version 2012, you need to install SSAS in multidimensional and data mining mode; this was the only mode in previous versions of SSAS, while SSAS 2012 also supports the tabular mode, which does not include data mining). Additional products from the suite can be useful as well. SQL Server Integration Services (SSIS) is a tool for developing extract-transform-load (ETL) applications. SSIS is typically used for loading a DW, and in addition, it can use SSAS data mining models for building intelligent data flows. SQL Server Reporting Services (SSRS) is useful for presenting the results in a variety of reports. Data Quality Services (DQS) mitigate the occasional data cleansing process by maintaining a knowledge base. Master Data Services (MDS) is an application that helps companies maintain a central, authoritative source of their master data, i.e. the most important data to any organization. For an overview of the SQL Server business intelligence (BI) part of the suite that includes Database Engine, SSAS and SSRS, please refer to Veerman E., Lachev T., & Sarka D. (2009). MCTS Self-Paced Training Kit (Exam 70-448): Microsoft® SQL Server® 2008 Business Intelligence Development and Maintenance. MS Press.
For an overview of the enterprise information management (EIM) part that includes SSIS, DQS and MDS, please refer to Sarka D., Lah M., & Jerkic G. (2012). Training Kit (Exam 70-463): Implementing a Data Warehouse with Microsoft® SQL Server® 2012. O'Reilly. For details about SSAS data mining, please refer to MacLennan J., Tang Z., & Crivat B. (2009). Data Mining with Microsoft SQL Server 2008. Wiley. SQL Server Data Mining Add-ins for Office, a free download for Office versions 2007, 2010 and 2013, bring the power of data mining to Excel, enabling advanced analytics in Excel. Together with PowerPivot for Excel, which is also freely downloadable for Excel 2010 and is already included in Excel 2013, they bring OLAP functionality directly into Excel, making it possible for an advanced analyst to build a complete learning infrastructure using a familiar tool. This way, many more people, including employees in subsidiaries, can contribute to the learning process by examining local transactions and quickly identifying new patterns.

How advanced are author-recognition methods?

- by Nick Rtz

If a computer program analyses a text written by an author, how much can it tell today about the author, given texts long enough to be statistically significant? Can the program even tell with "certainty" whether a man or a woman wrote the text, based solely on its contents and not on an investigation of things such as IP addresses? I'm interested to know whether there are algorithms in use, for instance, to automatically determine whether an author was male or female, or similar characteristics of an author that a program can infer from analysing the written text. It could be useful to know, before you read a message, what a computer analysis says about the author, do you agree? If, for instance, I get a longer message from my wife saying that she has had an accident in Nigeria, and the program says that with 99% probability the message was written by a male author in his sixties of non-Caucasian origin or the like, or by somebody who is not my wife, then the program could help me investigate why that message differs in characteristics. There can also be other uses, for instance just detecting outliers in a geographically or demographically bounded larger data set. Scam detection is the obvious use I'm thinking of, but there could be others. Are there already programs that analyse a written text to tell something about the author based on word choice, use of pronouns, unusual language usage, or the like?

SEM & AdWords: How many clicks without a sale before I should pause a keyword?

- by Thomas Jönsson

I wonder how many clicks I should optimally let through for every new keyword I try in AdWords before I conclude that it's not making a profit and should be paused. It's actually four questions.
1: At which likelihood percentile should I pause a word?
2: How many clicks should I let through before I pause a word that does not generate any leads?
3: How many clicks should I let through after one sale before considering the word unprofitable?
4: Does the likelihood of the word becoming profitable affect the above?
Conditions:
- The clicks are normally distributed. (correct?)
- A CR of 1% is break-even, everything above is profit (1 sale/100 clicks = break-even)
- Cost per click (CPC) = $4
- Margin (profit per sale) = $400
- Payback time = 1 year
- Average clicks per word = 0.333 per day (121 + 2/3 per year)
Example: After 1 click and no sale the keyword still has a high probability of being profitable. After 500 clicks and no sale it has almost no likelihood of being profitable and should probably be paused. Thanks in advance!
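One way to put numbers on the example at the end of the question is to assume that each click converts independently with the same probability, i.e. a binomial model for sales rather than the normal distribution guessed at in the conditions. Under that assumption, the chance of seeing no sale at all in n clicks at the break-even rate is (1 - CR)^n. The sketch below uses hypothetical function names and an arbitrary 5% cut-off chosen only for illustration.

```python
import math


def prob_no_sale(clicks: int, conversion_rate: float) -> float:
    """Probability of zero sales in `clicks` independent clicks."""
    return (1.0 - conversion_rate) ** clicks


def clicks_until_unlikely(conversion_rate: float, threshold: float) -> int:
    """Smallest click count at which zero sales has probability <= threshold,
    if the keyword truly converted at `conversion_rate`."""
    return math.ceil(math.log(threshold) / math.log(1.0 - conversion_rate))


if __name__ == "__main__":
    break_even_cr = 0.01  # from the question: 1 sale per 100 clicks
    print(prob_no_sale(1, break_even_cr))              # one click tells almost nothing
    print(prob_no_sale(500, break_even_cr))            # 500 clicks without a sale
    print(clicks_until_unlikely(break_even_cr, 0.05))  # 5% cut-off is an assumption
```

With these inputs, a keyword that really converted at the break-even rate would go roughly 300 clicks without a sale only about one time in twenty, which gives a concrete candidate for the pause point in question 2; the cut-off itself remains a judgment call.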

Read the article

Issue in understanding how to compare performance of classifier using ROC

- by user1214586

I am trying to demystify pattern recognition techniques and have understood a few of them. I am trying to design a classifier M. A gesture is classified based on the Hamming distance between the sample time series y and the training time series x. The results of the classifier are probabilistic values. There are 3 classes/categories with labels A, B, C which classify hand gestures, with 100 samples for each class to be classified (single feature and data length = 100). The data are different time series (x coordinate vs. time). The training set is used to assign probabilities indicating how many times each gesture has occurred. So, if gesture A appeared 6 times out of 10 training samples, then the probability that a gesture falls under category A is P(A) = 0.6; similarly P(B) = 0.3 and P(C) = 0.1. Now, I am trying to compare the performance of this classifier with a Bayes classifier, k-NN, principal component analysis (PCA) and a neural network. On what basis, parameters and method should I do this if I consider ROC or cross-validation? The features for my classifier's ROC plot are the probabilistic values, so what should the features be for k-NN, Bayes classification and PCA? Is there code for this that would be useful? What should the value of k be if there are 3 classes of gestures? Please help. I am in a fix.
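One common way to put classifiers on the same footing is to compute a one-vs-rest area under the ROC curve (AUC) per class from whatever score each classifier outputs for that class. The sketch below uses the rank interpretation of AUC and only the standard library. The scores and labels are made-up illustration data, not results from the post, and `auc_one_vs_rest` is a hypothetical helper name.

```python
# Sketch: one-vs-rest AUC for a single class, computed from per-sample scores.
# AUC equals the probability that a randomly chosen positive sample receives a
# higher score than a randomly chosen negative one (ties count as one half).

def auc_one_vs_rest(scores, labels, positive):
    """Return the AUC for class `positive` against all other classes."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: true gesture labels and a classifier's score for class A.
true_labels = ["A", "A", "B", "C", "A", "B"]
score_for_a = [0.9, 0.6, 0.4, 0.2, 0.3, 0.7]

print(auc_one_vs_rest(score_for_a, true_labels, "A"))
```

Running the same function on the class scores produced by each classifier, ideally on held-out folds, gives one comparable number per class and classifier. The score can be a posterior probability, a vote fraction from k-NN, or a network output.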

Read the article

Evidence-Based-Scheduling - are estimations only as accurate as the work-plan they're based on?

- by Assaf Lavie

I've been using FogBugz's Evidence Based Scheduling (for the uninitiated, Joel explains) for a while now and there's an inherent problem I can't seem to work around. The system is good at telling me the probability that a given project will be delivered by some date, given the detailed list of tasks that comprise the project. However, it does not take into account the fact that during development additional tasks always pop up. Now, there's the garbage-can approach of creating a generic task/scheduled item for "last minute hacks" or "integration tasks", or what have you, but that clearly goes against the idea of aggregating the estimates of many small cases. It's often the case that during the development stage of a project you realize that there's a whole area your planning didn't cover, because, well, that's the nature of developing stuff that hasn't been developed before. So now your ~3 month project may very well turn into a 6 month project, but not because your estimates were off (you could be the best estimator in the world for the tasks that comprised your initial work plan); rather, because you ended up adding a whole bunch of new tasks that weren't there to begin with. EBS doesn't help you with that. It could, theoretically (I guess). It could, perhaps, measure the amount of work you add to a project over time and take that into consideration when estimating the time remaining on a given project. Just a thought. In other words, EBS works on a task basis, but not on a project/release basis, and the latter is what's important. It's what your boss typically cares about: the delivery date, not the time it takes to finish each task along the way, and not the time it would have taken if your planning had been perfect. So the question is (yes, there's a question here, don't close it): What's your methodology when it comes to using EBS in FogBugz, and how do you solve the problem above, which seems to be a main cause of schedule delays and mispredictions?
Edit: Some more thoughts after reading a few answers: If it comes down to having to choose which delivery date you're comfortable presenting to your higher-ups by squinting at the delivery-probability graph and choosing 80%, or 95%, or 60% (based on what, exactly?), then we've resorted to plain old buffering/factoring of our estimates. In which case, couldn't we have skipped the meticulous case-by-case, hour-sized estimation step? By forcing ourselves to break down tasks that take more than a day into smaller chunks of work, haven't we just deluded ourselves into thinking our planning is as tight and thorough as it could be? People may be consistently bad estimators who do not even learn from their past mistakes. In that respect, having an EBS system is certainly better than not having one. But what can we do about the fact that we're not that good at planning either? I'm not sure it's a problem that can be solved by a similar system. Our estimates are wrong because of tendencies to be overly optimistic/pessimistic about certain tasks, and because we neglect to account for systematic delays (e.g. sick days, a major bug crisis), and usually not because we lack knowledge about the work that needs to be done. Our planning, on the other hand, is often incomplete because we simply don't have enough knowledge at this early stage, and I don't see how an EBS-like system could fill that gap. So we're back to methodology. We need to find a way to accommodate bad or incomplete work plans that's better than voodoo multiplication.
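The "measure the amount of work you add over time" idea can be sketched as a small Monte Carlo simulation. This is not how FogBugz implements EBS; it is a rough illustration in the same spirit, where each estimate is divided by a randomly drawn historical velocity (estimate/actual). The scope-growth step, the function names and all input numbers are hypothetical.

```python
# Sketch: EBS-style Monte Carlo with an added scope-growth term.
# Assumptions (not from FogBugz): scope growth is modelled as a ratio of
# unplanned work to planned work, drawn at random from past projects.
import random


def simulate_total_hours(estimates, velocities, growth_ratios, runs=10000, seed=0):
    """Return a sorted list of simulated total project hours."""
    rng = random.Random(seed)
    totals = []
    for _ in range(runs):
        # Task level: scale each estimate by a randomly drawn past velocity.
        planned = sum(est / rng.choice(velocities) for est in estimates)
        # Project level: inflate by a randomly drawn past scope-growth ratio.
        growth = rng.choice(growth_ratios)
        totals.append(planned * (1.0 + growth))
    totals.sort()
    return totals


def percentile(sorted_values, q):
    """Value below which roughly a fraction `q` of the simulated outcomes fall."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]


# Hypothetical history: task estimates in hours, past velocities, and the
# fraction of unplanned work that past projects ended up adding.
totals = simulate_total_hours(
    estimates=[4, 8, 16, 6],
    velocities=[0.6, 0.8, 1.0, 1.1],
    growth_ratios=[0.2, 0.5, 1.0],
)
print(percentile(totals, 0.5), percentile(totals, 0.95))
```

The delivery-probability curve then reflects both estimation error and incomplete planning, provided past projects recorded how much work was added after the initial plan.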

Read the article

Find points whose pairwise distances approximate a given distance matrix

- by Stephan Kolassa

Problem. I have a symmetric distance matrix with entries between zero and one, like this one:

D = ( 0.0 0.4 0.0 0.5 )
    ( 0.4 0.0 0.2 1.0 )
    ( 0.0 0.2 0.0 0.7 )
    ( 0.5 1.0 0.7 0.0 )

I would like to find points in the plane that have (approximately) the pairwise distances given in D. I understand that this will usually not be possible with strictly correct distances, so I would be happy with a "good" approximation. My matrices are smallish, no more than 10x10, so performance is not an issue.

Question. Does anyone know of an algorithm to do this?

Background. I have sets of probability densities between which I calculate Hellinger distances, which I would like to visualize as above. Each set contains no more than 10 densities (see above), but I have a couple of hundred sets.

What I did so far. I did consider posting at math.SE, but looking at what gets tagged as "geometry" there, it seems like this kind of computational geometry question would be more on-topic here. If the community thinks this should be migrated, please go ahead. This looks like a straightforward problem in computational geometry, and I would assume that anyone involved in clustering might be interested in such a visualization, but I haven't been able to google anything. One simple approach would be to randomly plonk down points and perturb them until the distance matrix is close to D, e.g., using Simulated Annealing, or run a Genetic Algorithm. I have to admit that I haven't tried that yet, hoping for a smarter way. One specific operationalization of a "good" approximation in the sense above is Problem 4 in the Open Problems section here, with k=2. Now, while finding an algorithm that is guaranteed to find the minimum l1-distance between D and the resulting distance matrix may be an open question, it still seems possible that there at least is some approximation to this optimal solution.
If I don't get an answer here, I'll mail the gentleman who posed that problem and ask whether he knows of any approximation algorithm (and post any answer I get to that here).
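The problem described above is usually known as multidimensional scaling (MDS). A minimal sketch of one standard approach, stress majorization (the SMACOF iteration with unit weights), is shown below using only the standard library. Note that it minimizes the sum of squared differences between target and embedded distances, not the l1 criterion mentioned in the question, and it may stop in a local minimum. The function names and the iteration count are illustrative assumptions.

```python
# Sketch: metric MDS in the plane via stress majorization (SMACOF, unit weights).
# Minimizes squared-error "raw stress", not the l1 distance from the question.
import math
import random

D = [
    [0.0, 0.4, 0.0, 0.5],
    [0.4, 0.0, 0.2, 1.0],
    [0.0, 0.2, 0.0, 0.7],
    [0.5, 1.0, 0.7, 0.0],
]


def stress(points, target):
    """Sum of squared differences between embedded and target distances."""
    n = len(points)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
            total += (dist - target[i][j]) ** 2
    return total


def guttman_step(points, target):
    """One Guttman transform; never increases the stress."""
    n = len(points)
    new_points = []
    for i in range(n):
        sx = sy = 0.0
        for j in range(n):
            if i == j:
                continue
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            dist = math.hypot(dx, dy)
            if dist == 0.0:
                continue  # coincident points contribute nothing to this step
            ratio = target[i][j] / dist
            sx += ratio * dx
            sy += ratio * dy
        new_points.append((sx / n, sy / n))
    return new_points


def random_start(n, seed=0):
    """Random initial configuration in the unit square."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def smacof(target, start, iterations=300):
    """Iterate the Guttman transform from `start` and return the final points."""
    points = list(start)
    for _ in range(iterations):
        points = guttman_step(points, target)
    return points


start = random_start(len(D))
embedding = smacof(D, start)
print(stress(start, D), stress(embedding, D))
```

For matrices of at most 10x10, running this from a handful of random starts and keeping the lowest-stress result is cheap, which is a more systematic version of the "plonk down points and perturb them" idea.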

Read the article

Pre game loading time vs. in game loading time

- by Keeper

I'm developing a game that includes a random maze. There are some AI creatures lurking in the maze, and I want them to follow paths that fit the maze's shape. There are two ways for me to implement that. The first (which I used) is to calculate several lurking paths once the maze is created. The second is to calculate a path only when it is needed, when a creature starts lurking along it. My main concern is loading times. If I calculate many paths when the maze is created, the pre-loading time is a bit long, so I thought about calculating them when needed. At the moment the game is not 'heavy', so calculating paths mid-game is not noticeable, but I'm afraid it will be once the game gets more complicated. Any suggestions, comments or opinions will be of help. Edit: As of now, with p the number of pre-calculated paths, a creature has a probability of 1/p of taking a new path (which means a path calculation) instead of an existing one. A creature does not start its patrol until the path is fully calculated, of course, so there is no need to worry about it getting killed in the process.
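The rule in the edit can be sketched as a small path pool that mixes both approaches: paths are computed lazily and cached, so the cost is spread over the game instead of being paid at load time. The class name, the `compute_path` callback and the choice to add each new path to the pool are assumptions made for illustration, not details from the post.

```python
# Sketch: lazy path pool. With p cached paths, a creature gets a freshly
# computed path with probability 1/p, otherwise it reuses a cached one.
# `compute_path` stands in for the game's own (hypothetical) pathfinder.
import random


class PathPool:
    def __init__(self, compute_path, rng=None):
        self._compute_path = compute_path
        self._paths = []
        self._rng = rng or random.Random()

    def path_count(self):
        """Number of paths computed and cached so far."""
        return len(self._paths)

    def next_path(self):
        """Return a path for a creature that is about to start patrolling."""
        p = len(self._paths)
        # An empty pool always computes; otherwise compute with probability 1/p.
        if p == 0 or self._rng.random() < 1.0 / p:
            path = self._compute_path()
            self._paths.append(path)  # assumption: new paths join the pool
            return path
        return self._rng.choice(self._paths)


# Usage with a stand-in pathfinder that just returns a dummy list of cells.
pool = PathPool(lambda: ["start", "corridor", "end"], rng=random.Random(1))
for _ in range(20):
    pool.next_path()
print(pool.path_count())
```

Because the chance of a new calculation shrinks as the pool grows, the number of mid-game path calculations tapers off on its own. Seeding the pool with a few paths at load time would bound the worst-case cost early in the game.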

Read the article

How to Get a Smartphone-Style Word Suggestion on Windows

- by Zainul Franciscus

Have you ever wished that you could type faster and better in Windows? Then you’re in luck, because today we’ll show you how to get smartphone-style word suggestions in Windows. To accomplish that, you need to install AI Type, software that suggests words as you write in Windows. AI Type not only satisfies our wish for smartphone-style word suggestions in Windows, it also improves our writing by suggesting words according to their context. It will also try to match words according to the probability with which other users may have used them. Installing AI Type is a breeze: just download the installer from the AI Type website, run the executable, fill in a registration form, and you’re all set to use AI Type for your daily writing. Once you’re done with the installation, AI Type appears in your system tray.

Read the article

Rule of thumb for cost vs. savings for code re-use

- by Styler

Is it a good rule of thumb to always write code with the intent of re-using it somewhere down the road? Or, depending on the size of the component you are writing, is it better practice to design it for re-use only when that makes sense with regard to the time spent on it? What is a good rule of thumb for spending extra time on analysis and design for project components that have "some probability" of being needed later down the road for other things that may or may not need this part? For example, say project X needs to do things A and B. A definitely needs to be written for re-use because it just makes sense to do so. B is very project-specific at the moment, and I can hack it all together in a couple of days to finish the project on time and give everyone kudos for being a great team, etc. Or we say, let's spend a whole friggin' 2 weeks figuring out what project Y/Z might need this thing for and spend a load of extra time on part B, because someday we might need to use it on project Y/Z (where the savings will be realized). I'd imagine a perfect-world situation would be a nicely crafted combination of project-specific and re-use-architected components for the given project. However, some code shops might feel it would be a great idea to write everything with the intention of using it at some point down the road.
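One way to make "some probability" concrete is a simple expected-value check: designing for re-use pays off on average only if the probability of re-use times the later savings exceeds the extra cost paid now. The sketch below uses the post's rough numbers for the extra cost (two weeks, taken as 10 working days, instead of a couple of days); the savings figure is purely hypothetical, and the function names are made up for illustration.

```python
# Sketch: expected-value rule of thumb for designing a component for re-use.
# All inputs are in working days; the savings figure is a made-up example.

def expected_net_benefit(p_reuse, extra_cost_days, savings_if_reused_days):
    """Expected days saved by building for re-use (negative means a net loss)."""
    return p_reuse * savings_if_reused_days - extra_cost_days


def break_even_probability(extra_cost_days, savings_if_reused_days):
    """Re-use probability at which the extra design effort exactly pays for itself."""
    if savings_if_reused_days <= 0:
        raise ValueError("savings must be positive")
    return extra_cost_days / savings_if_reused_days


extra_cost = 10 - 2        # two weeks of careful design instead of a 2-day hack
savings_if_reused = 10     # hypothetical: days saved on project Y/Z if B is re-used

print(break_even_probability(extra_cost, savings_if_reused))
print(expected_net_benefit(0.5, extra_cost, savings_if_reused))
```

With these illustrative numbers, part B would need a very high chance of being re-used before the two-week version is worth it. This simple model ignores maintenance costs and the option of refactoring the hack later.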

Search Results

Search found 1745 results on 70 pages for 'probability theory'.
