Search Results

Search found 1631 results on 66 pages for 'statistics'.

Page 11/66

  • Information on 50% drop in spam starting Christmas 2010

    - by George Bailey
    Has anybody who administers email servers or spam filtering noticed that the spam volume has dropped significantly in the last couple of weeks? Is there a chart provided by one of the major spam-filtering companies? Edit: Based on our internal stats, although it varies, in the two weeks starting the day after Christmas (a Sunday), spam seems to be coming in at about half the volume it did before Christmas.

    Read the article

  • How often do netbook parts break?

    - by kurresmack
    I need a list of what percentage of each component fails within a year. For example, what percent of all netbook RAM is expected to fail within a year? This is a lot to ask, I know, but I really do need some facts on what to expect to break when you manage a lot of netbooks. I'd be glad if someone had hard figures that could be backed up with sources. Only netbooks are considered.

    Read the article

  • Lightweight alternative to R for RHEL?

    - by Eric Rath
    I want to use R for some statistical analysis of logfile information, but found that even the "limited" R-core RPM has a lot of dependencies not already installed. I don't want to install so many packages for a peripheral need. Are there lightweight alternatives for simple statistical analysis on RHEL 6? I have an R script that accepts on stdin a large set of values -- one value per line -- and prints out the min, max, mean, median, 95th percentile, and standard deviation. For more context, I'm using grep and awk to find GET requests for a particular path in our webserver log files, get the response times, and calculate the metrics listed above in order to measure the impact on performance of changes to a web application. I don't need any graphing capabilities, just simple computation. Is there something I've overlooked?

    Read the article
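
    One lightweight way to do this without installing R is a short Python script (Python is part of a base RHEL 6 install). The following is only a rough sketch, assuming one numeric value per line on stdin; the percentile uses a simple nearest-rank rule rather than R's interpolated quantiles.

        # Sketch: summary statistics for one value per line on stdin
        import sys
        import math

        values = sorted(float(line) for line in sys.stdin if line.strip())
        n = len(values)
        mean = sum(values) / n

        def percentile(sorted_vals, p):
            # Nearest-rank style percentile; good enough for a quick log-file summary
            k = int(round(p * (len(sorted_vals) - 1)))
            return sorted_vals[k]

        median = percentile(values, 0.50)
        p95 = percentile(values, 0.95)
        stddev = math.sqrt(sum((v - mean) ** 2 for v in values) / n)

        print("min=%g max=%g mean=%g median=%g p95=%g stddev=%g"
              % (values[0], values[-1], mean, median, p95, stddev))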

  • Statistical approach to chess?

    - by Chinmay Kanchi
    Reading about how Google solves the translation problem got me thinking. Would it be possible to build a strong chess engine by analysing several million games and determining the best possible move based largely (completely?) on statistics? There are several such chess databases (one of them has 4.5 million games), and one could potentially weight moves in identical (or mirrored or reflected) positions using factors such as the ratings of the players involved, how old the game is (to factor in improvements in chess theory), etc. Are there any reasons why this wouldn't be a feasible approach to building a chess engine?

    Read the article

  • Better algorithm for estimating download time

    - by Scott Smith
    We've all seen the download time estimate that initially says something like "7 days", but keeps dropping wildly (e.g. "23 hours", "45 minutes", "1 min. 50 sec", etc.) with each successive estimate as the chunks are downloaded. To avoid these alarming initial estimates, there are techniques one could try, like suppressing display of the first n estimates, or waiting for the delta between estimates to drop below some threshold before you start displaying them, but none of these seems like a general, robust solution. There are corner cases involving too few samples, or samples that actually are wildly varying... I think I recall a general solution for this kind of thing in mathematics (statistics?) that reduces or eliminates these wild values. Does anyone know?

    Read the article
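
    One common fix for the wild early estimates described above is to smooth the measured transfer rate with an exponential moving average (a simple low-pass filter) and derive the remaining time from the smoothed rate rather than from the last chunk's instantaneous rate. The sketch below is illustrative only: the function names and the alpha value are assumptions, and a real downloader would typically also hide the estimate until a few samples have arrived.

        # Sketch of an exponentially smoothed download-time estimate (Python 3)
        def make_estimator(alpha=0.1):
            smoothed_rate = None  # bytes per second

            def update(bytes_this_tick, tick_seconds, bytes_remaining):
                nonlocal smoothed_rate
                instant_rate = bytes_this_tick / tick_seconds
                if smoothed_rate is None:
                    smoothed_rate = instant_rate
                else:
                    # Low-pass filter: small alpha = heavy smoothing, slower reaction
                    smoothed_rate = alpha * instant_rate + (1 - alpha) * smoothed_rate
                return bytes_remaining / smoothed_rate if smoothed_rate > 0 else float("inf")

            return update

        estimate = make_estimator(alpha=0.1)
        # On each progress tick: seconds_left = estimate(chunk_bytes, elapsed_seconds, bytes_left)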

  • Are there any open-source / free site analytics solutions that are intranet deployable?

    - by Richard Nichols
    There are plenty of statistics/analytics providers for Internet-deployed software (e.g. Google Analytics), but I'm looking for an analytics tool I can integrate into a LAN/intranet-based web application. I'm aware of AWStats, but I'd prefer something with a design similar to Google Analytics, where a JavaScript callback can be embedded into the app and calls back to an analytics server; that approach doesn't require any extra application-server configuration or access to run. I'm thinking there's nothing available that isn't proprietary / pay-for, but I'd love to be told I'm wrong!

    Read the article

  • If I take a larger datatype, will it affect performance in SQL Server?

    - by Shantanu Gupta
    If I use a larger datatype than is actually needed for the values I will insert into a table, will it affect performance in SQL Server, in terms of speed or in any other way? E.g. IsActive can only ever be 0, 1, 2 or 3, so I know I should use tinyint, but for reasons beyond my control (consider it a compulsion) I am declaring every numeric field as bigint and every character field as nvarchar(max). Please give statistics if possible, to help me argue against that compulsion. I need some solid analysis that would really make someone rethink before choosing a datatype.

    Read the article

  • How do I add values in an array when there is a null entry?

    - by Angela
    I want to create a real time-series array. Currently, I am using the statistics gem to pull out values for each 'day': define_statistic :sent_count, :count => :all, :group => 'DATE(date_sent)', :filter_on => {:email_id => 'email_id >= ?'}, :order => 'DATE(date_sent) ASC' What this does is create an array with a value for each date that has data, for example [["12-20-2010",1], ["12-24-2010",3]]. But I need it to fill in the missing days with zeroes, so it looks more like: [["12-20-2010",1], ["12-21-2010",0], ["12-22-2010",0], ["12-23-2010",0], ["12-24-2010",3]]. Notice how the second example has "0" values for the days that were missing from the first array.

    Read the article
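
    The code above is Ruby on Rails (the statistics gem), but the gap-filling step itself is just a walk from the first date to the last, defaulting missing days to zero. The sketch below shows that logic in Python, assuming the dates arrive as "MM-DD-YYYY" strings as in the example arrays; the same loop translates directly to Ruby.

        # Sketch: fill missing days in a [date, count] series with zeroes
        from datetime import datetime, timedelta

        def fill_gaps(series, fmt="%m-%d-%Y"):
            counts = {datetime.strptime(d, fmt).date(): n for d, n in series}
            day, last = min(counts), max(counts)
            filled = []
            while day <= last:
                filled.append([day.strftime(fmt), counts.get(day, 0)])
                day += timedelta(days=1)
            return filled

        print(fill_gaps([["12-20-2010", 1], ["12-24-2010", 3]]))
        # [['12-20-2010', 1], ['12-21-2010', 0], ['12-22-2010', 0], ['12-23-2010', 0], ['12-24-2010', 3]]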

  • Proper Translation of equation to C#

    - by Shykin
    I am trying to replicate this equation in C#: Slope(b) = (N·ΣXY − (ΣX)(ΣY)) / (N·ΣX² − (ΣX)²). The issue: if I take X = 1, 2, 3, 4, 5 (and its average) and Y = 5, 4, 3, 2, 1 (and its average), my code gives me a positive slope even though the data is clearly counting down. If I place the same numbers into this calculator: http://www.easycalculation.com/statistics/regression.php it gives me a negative slope for the same data. I'm trying to narrow down the reasons, so: is the following a proper translation of the equation into C#? Slope (m) = ((x * avgX * avgY) - (avgX * avgY)) / ((x * Math.Pow(avgX, 2)) - Math.Pow(avgX, 2));

    Read the article
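
    For reference, the textbook least-squares slope uses sums over the data points, not averages: b = (N·ΣXY − ΣX·ΣY) / (N·ΣX² − (ΣX)²), where ΣX² is the sum of the squared X values and (ΣX)² is the square of the sum. The C# line in the question substitutes avgX and avgY where the sums belong, and uses the same quantity for both terms of the denominator, which is why the sign comes out wrong. A minimal sketch of the sum-based version (written in Python for brevity; the structure ports directly to C#):

        # Least-squares slope from the raw sums, b = (N*Sxy - Sx*Sy) / (N*Sxx - Sx^2)
        def slope(xs, ys):
            n = len(xs)
            sum_x = sum(xs)
            sum_y = sum(ys)
            sum_xy = sum(x * y for x, y in zip(xs, ys))
            sum_x2 = sum(x * x for x in xs)
            return float(n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

        print(slope([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0: a falling line, as expected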

  • HTG Explains: Why Do So Many Apps Want to Send Usage Statistics, and Should I Let Them?

    - by Chris Hoffman
    Many programs want to send usage statistics, error logs, and crash reports — data about how you use the application and what problems occurred — to their servers. Some people disable these options, but should you? We’ll look at the exact types of data applications want to send, what developers do with it, whether any sensitive personal data is being passed along, and the advantages and disadvantages to enabling these options.    

    Read the article

  • What statistics app should I use for my website?

    - by Camran
    I have my own server (with root access). I need statistics on the users who visit my website, etc... I have looked at an app called Webalizer... Is this a good choice? I run apache2 on an Ubuntu 9 system... If you know of any good statistics apps for servers, please let me know. And a follow-up question: all the statistics are saved in log files, right? So how large would these log files become? Being able to split them would be good; I don't know whether that is possible with Webalizer though...

    Read the article

  • Exponential regression : p-value and F significance

    - by Saravanan K
    I am new to statistics. I have a set of independent and dependent data (X, Y) on which I would like to do an exponential regression to obtain its p-value and Significance F (I have already obtained R² and the coefficients through mathematical calculation). What is the natural path from the (X, Y) data to mathematically calculating those values? I've spent a week studying this on the internet but have been unable to find the right answer. Often exponential data, y = b·e^(mx), is first converted to linear data, ln y = mx + ln b, and a linear regression is then done on the converted data, obtaining its p-value etc. If we use a statistical tool such as Excel's Analysis ToolPak (Data Analysis: Regression), it will produce output like the screenshot in the original post; I believe the p-value and Significance F there represent the converted linear data and not the original exponential data. Questions: 1) What approach/steps does Excel use to get the p-value and Significance F for the converted linear data shown in that output? It is not clear in their help pages or website. 2) Could the p-value and Significance F be mathematically calculated for an exponential regression without using a statistical tool? 3) Can you point me to the right link if this has been answered before?

    Read the article
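
    As the question suspects, Excel's Regression tool reports the p-value and Significance F for the linearized model (ln y regressed on x), not for the original exponential form. Both can be reproduced by hand: the slope's p-value comes from a t-test of the slope estimate against zero, and with a single predictor the overall F statistic is simply that t statistic squared, so Significance F equals the slope's p-value. A rough sketch, assuming SciPy is available and using made-up data:

        # Sketch: linearized exponential fit y = b*exp(m*x) via regression of ln(y) on x
        import numpy as np
        from scipy import stats

        x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
        y = np.array([2.3, 4.1, 8.2, 15.9, 33.0])   # illustrative, roughly exponential data

        m, ln_b, r, p_value, stderr = stats.linregress(x, np.log(y))
        b = np.exp(ln_b)

        t = m / stderr     # t statistic for the slope
        F = t ** 2         # with one predictor, F is just t squared
        print("y = %.3f * exp(%.3f x), R2 = %.4f, p = %.3g, F = %.2f" % (b, m, r ** 2, p_value, F))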

  • How can I intelligently group rows of integers for a faceted search?

    - by Alastair
    I'm not even quite sure what terms I should be using for what I want, so any advice on what I'm even asking for would be very welcome. Basically, my web site lists user-generated accommodations. Each has a rent price, which users will be able to query in our new faceted search box. Users search by city, and within each city I'd like to present a different rent grouping. That is to say that in City #1, if we have listings ranging from $200 - $1000, I'd like to present checkboxes for: less than $300; $301 - $500; $501 - $700; more than $700. However, if City #2 has values that range from $500 - $1500, I want the ranges above to change accordingly. So, if I say that I want 5 or 6 range options in each city, I think I have two options: 1) Take the min and max values and just split the difference. I don't like this idea because one listing with a rent of $10,000 will throw the whole scale off. 2) Intelligently calculate the ranges using means, medians etc. Number 2 is what I need help with. I'm a web developer that gets logic, but was never strong on math and statistics at school. Can anyone point me towards a guide that'll help me figure this out?

    Read the article
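
    Option 2 above usually comes down to cutting on quantiles (percentiles) of the city's rent values instead of the min-max span, so a single $10,000 listing only stretches the top bucket. A rough sketch of that idea; the bucket count and the $50 rounding are arbitrary illustrative choices:

        # Sketch: derive facet range boundaries from quantiles so outliers only affect the top bucket
        def quantile(sorted_vals, q):
            # Simple linear-interpolation quantile, adequate for building facet buckets
            pos = q * (len(sorted_vals) - 1)
            lo, hi = int(pos), min(int(pos) + 1, len(sorted_vals) - 1)
            return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

        def facet_edges(rents, buckets=5, rounding=50):
            vals = sorted(rents)
            edges = [quantile(vals, i / buckets) for i in range(1, buckets)]
            # Round to "nice" numbers for display, e.g. the nearest $50
            return [int(round(e / rounding) * rounding) for e in edges]

        # One expensive outlier does not distort the lower buckets:
        print(facet_edges([200, 350, 400, 450, 500, 600, 650, 700, 800, 10000]))  # [400, 500, 600, 700]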

  • Survey statistic diagram ideas

    - by Nort
    Hey everyone, I've got some homework tasks on the topic of surveys and diagrams. The first task is to normalize the input of a survey, because the structure of the data changes from time to time. There are three types of survey fields: static fields, where text is stored; dynamic fields, where the user can select one option; and multi-select fields, where the user can select multiple options. I'm not really a statistics guy, so I have no real idea what I can do with the incoming data. The data is stored in an XML file, from which I can easily get how many times a survey was filled in and how many times each field was filled in, so I can (for example, on a pie chart) show the ratio of filled to not filled. The second idea is to show the relationship between the contents of a multi-option element using a bar chart or so. For the multi-option elements I also had the idea of showing the data conditioned on one option, but the question is: what should be shown? The other problem is the static elements (text fields and so on): what data could be represented from a single free-text field? The data in the XML file was collected from 2001 to 2005, so maybe I can work with the dates of the surveys, but as I said, I don't really know how to process the data to extract as much information as possible.

    Read the article

  • Modeling distribution of performance measurements

    - by peterchen
    How would you mathematically model the distribution of repeated real-life performance measurements? "Real life" meaning you are not just looping over the code in question, but it is a short snippet within a large application running in a typical user scenario. My experience shows that you usually have a peak around the average execution time that can be modeled adequately with a Gaussian distribution. In addition, there's a "long tail" containing outliers, often at a multiple of the average time. (The behavior is understandable considering the factors contributing to a first-execution penalty.)

    My goal is to model aggregate values that reasonably reflect this and that can be calculated from aggregate values (as for the Gaussian, where mu and sigma are calculated from N, the sum of values and the sum of squares). In other terms, the number of repetitions is unlimited, but memory and calculation requirements should be minimized. A normal Gaussian distribution can't model the long tail appropriately, and its average will be biased strongly by even a very small percentage of outliers. I am looking for ideas, especially if this has been attempted/analysed before. I've checked various distribution models, and I think I could work out something, but my statistics knowledge is rusty and I might end up with an overblown solution. Oh, a complete shrink-wrapped solution would be fine, too ;)

    Other aspects / ideas: Sometimes you get "two humps" distributions, which would be acceptable in my scenario with a single mu/sigma covering both, but ideally would be identified separately. Extrapolating this, another approach would be a "floating probability density calculation" that uses only a limited buffer and adjusts automatically to the range (due to the long tail, bins may not be spaced evenly); I haven't found anything like that, but with some assumptions about the distribution it should be possible in principle.

    Why (since it was asked): for a complex process we need to make guarantees such as "only 0.1% of runs exceed a limit of 3 seconds, and the average processing time is 2.8 seconds". The performance of an isolated piece of code can be very different from that in a normal run-time environment involving varying levels of disk and network access, background services, scheduled events that occur within a day, etc. This can be solved trivially by accumulating all the data; however, to accumulate this data in production, the data produced needs to be limited. For analysis of isolated pieces of code, a Gaussian deviation plus a first-run penalty is OK. That doesn't work anymore for the distributions found above. [edit] I've already got very good answers (and finally, maybe, some time to work on this). I'm starting a bounty to look for more input / ideas.

    Read the article
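
    This doesn't settle the long-tail modelling question itself, but the "calculated from aggregate values" requirement above can be met with a constant-memory accumulator: track N, Σx and Σx² for mu and sigma, plus, for guarantees like "only 0.1% of runs exceed 3 seconds", a counter of samples beyond a fixed threshold. A minimal sketch with illustrative names and an illustrative threshold:

        # Constant-memory accumulator: mean, standard deviation and tail fraction
        import math

        class PerfAccumulator:
            def __init__(self, tail_threshold=3.0):
                self.n = 0
                self.total = 0.0
                self.total_sq = 0.0
                self.tail_threshold = tail_threshold
                self.tail_count = 0

            def add(self, seconds):
                self.n += 1
                self.total += seconds
                self.total_sq += seconds * seconds
                if seconds > self.tail_threshold:
                    self.tail_count += 1

            def summary(self):
                mu = self.total / self.n
                sigma = math.sqrt(max(0.0, self.total_sq / self.n - mu * mu))
                return mu, sigma, float(self.tail_count) / self.n

        acc = PerfAccumulator()
        for t in (2.7, 2.8, 2.9, 2.8, 9.5):   # one "long tail" outlier
            acc.add(t)
        print(acc.summary())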

  • Do I need to recreate statistics if I had to drop them to add a foreign key

    - by Adam J.R. Erickson
    I have a database which had all its foreign-key relationships dropped at some unknown time in the past (don't ask). I have an old copy of the database which isn't good to restore from, but its schema still has the relationships, so I'm working from that to create a script that restores the keys. In updating the tables, I've had to drop statistics from several tables. Do I need to manually recreate those, or can I just run the statistics update procedure once all the tables are updated?

    Read the article

  • How reliable is the battery data in the Ubuntu power statistics?

    - by nbubis
    Right now the power statistics show: Energy when full: 25.5 Wh; Energy (design): 93.2 Wh. And indeed the battery doesn't seem to be lasting as long as it used to. My question: Is this data reliable? Does it really indicate that I should replace the battery, or could it be the charger, the laptop, or the OS that is stopping the battery from fully charging? Is there any way of validating that the battery is indeed to blame? I'd like to be sure before shelling out $90 for a new battery. (If it helps, the battery is a 3-year-old Dell 9-cell rated at 90 Wh.)

    Read the article

  • Are there statistics or time series of open bugs in Ubuntu?

    - by aroque
    I would like to know how the number of bugs in Ubuntu (open, closed, critical, etc.) has evolved over time. It's partly scientific curiosity, but it would also give me a feeling for how the community has changed over time, how it has coped with its challenges (I'm thinking of Unity in particular) and what its status is now. Has anyone collected these data over the years? If so, are they publicly available? I know this information can be gathered from Launchpad itself, and I actually found a website that had data from mid-2008 to early 2009. I found Ubuntu live stats, which shows live messages related to Ubuntu but does not aggregate bug statistics. Finally, there are some stats in the Ubuntu Weekly Newsletter, but they only show diffs of bugs closed during the last week.

    Read the article

  • Do you want to find out when statistics have been updated?

    - by simonsabin
    If you need to find out when statistics were last updated, run the following:

        select OBJECT_NAME(s.object_id) object
             , s.name
             , STATS_DATE(s.object_id, s.stats_id) StatsDate
             , s.auto_created
             , s.filter_definition
             , s.has_filter
             , s.no_recompute
             , s.user_created
             , stuff((select ',' + col.name
                      from sys.stats_columns sc
                      join sys...(read more)

    Read the article

  • Graphing perpendicular offsets in a least squares regression plot in R

    - by D W
    I'm interested in making a plot with a least squares regression line and line segments connecting the data points to the regression line, as illustrated in the graphic called "perpendicular offsets" here: http://mathworld.wolfram.com/LeastSquaresFitting.html I have the plot and regression line done here:

        ## Dataset from http://www.apsnet.org/education/advancedplantpath/topics/RModules/doc1/04_Linear_regression.html
        ## Disease severity as a function of temperature

        # Response variable, disease severity
        diseasesev <- c(1.9, 3.1, 3.3, 4.8, 5.3, 6.1, 6.4, 7.6, 9.8, 12.4)
        # Predictor variable, (Centigrade) temperature
        temperature <- c(2, 1, 5, 5, 20, 20, 23, 10, 30, 25)
        # Combine into a data frame so that data=severity below is defined
        severity <- data.frame(diseasesev, temperature)

        ## Fit a linear model for the data and summarize the output from function lm()
        severity.lm <- lm(diseasesev ~ temperature, data = severity)

        # Take a look at the data
        plot(diseasesev ~ temperature, data = severity,
             xlab = "Temperature", ylab = "% Disease Severity", pch = 16)
        abline(severity.lm, lty = 1)
        title(main = "Graph of % Disease Severity vs Temperature")

    Should I use some kind of for loop and segments (http://www.iiap.res.in/astrostat/School07/R/html/graphics/html/segments.html) to draw the perpendicular offsets? Is there a more efficient way? Please provide an example if possible.

    Read the article

  • Reordering matrix elements to reflect column and row clustering in naive Python

    - by bgbg
    Hello, I'm looking for a way to perform clustering separately on matrix rows and then on its columns, reorder the data in the matrix to reflect the clustering, and put it all together. The clustering problem is easily solvable, and so is the dendrogram creation (for example in this blog or in "Programming Collective Intelligence"). However, how to reorder the data remains unclear to me. Eventually, I'm looking for a way of creating graphs similar to the one below using naive Python (with any "standard" library such as numpy, matplotlib etc., but without using R or other external tools).

    Read the article
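
    Assuming numpy, scipy and matplotlib count as "standard" libraries here, the reordering step can be done by clustering the rows and the columns separately and then indexing the matrix with the leaf order of each dendrogram. A rough sketch (the average-linkage choice and the random placeholder data are assumptions):

        # Sketch: reorder a matrix by the leaf order of row and column dendrograms
        import numpy as np
        from scipy.cluster.hierarchy import linkage, leaves_list
        import matplotlib.pyplot as plt

        data = np.random.rand(20, 12)                               # placeholder matrix

        row_order = leaves_list(linkage(data, method="average"))    # cluster the rows
        col_order = leaves_list(linkage(data.T, method="average"))  # cluster the columns
        reordered = data[np.ix_(row_order, col_order)]              # apply both orderings

        plt.imshow(reordered, aspect="auto", interpolation="nearest")
        plt.show()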

  • How do I determine a best-fit distribution in java?

    - by Eadwacer
    I have a bunch of sets of data (between 50 and 500 points, each of which can take a positive integral value) and need to determine which distribution best describes each of them. I have done this manually for several of them, but need to automate it going forward. Some of the sets are completely modal (every datum has the value 15), some are strongly modal or bimodal, some are bell curves (often skewed and with differing degrees of kurtosis/pointiness), some are roughly flat, and there are any number of other possible distributions (Poisson, power-law, etc.). I need a way to determine which distribution best describes the data and, ideally, one that also provides me with a fitness metric so that I know how confident I am in the analysis. Existing open-source libraries would be ideal, followed by well-documented algorithms that I can implement myself.

    Read the article
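
    The question asks for Java, but the usual approach is language-independent: fit each candidate distribution by maximum likelihood, then score each fit with a goodness-of-fit test (e.g. Kolmogorov-Smirnov) and use the test statistic or p-value as the confidence metric. The Python/SciPy sketch below only illustrates that flow; the candidate list and placeholder data are arbitrary, and a KS test on integer-valued data is an approximation.

        # Sketch: pick the best-fitting distribution by maximum-likelihood fit + KS test
        import numpy as np
        from scipy import stats

        data = 1 + np.random.poisson(14, size=200)     # placeholder positive integer data

        candidates = {
            "normal": stats.norm,
            "gamma": stats.gamma,
            "lognormal": stats.lognorm,
            "exponential": stats.expon,
        }

        results = {}
        for name, dist in candidates.items():
            params = dist.fit(data)                              # maximum-likelihood fit
            ks_stat, p_value = stats.kstest(data, dist.cdf, args=params)
            results[name] = (ks_stat, p_value)

        # Smaller KS statistic (or larger p-value) means a better fit
        for name, (ks_stat, p_value) in sorted(results.items(), key=lambda kv: kv[1][0]):
            print("%-12s KS=%.3f  p=%.3g" % (name, ks_stat, p_value))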

  • naive bayesian spam filter question

    - by Microkernel
    Hi guys, I am planning to implement a spam filter using the Naive Bayes classification model. I see a lot of info online about Naive Bayes classification, but the problem is that it's mostly mathematical material rather than a clear statement of how it is done. And the problem is that I am more of a programmer than a mathematician (yes, I learnt probability and Bayes' theorem back in school, but I've been out of touch for a long, long time, and I don't have the luxury of learning it now; I have nearly 3 weeks to come up with a working prototype). So if someone can explain it, or point me to a place where it's explained for programmers rather than mathematicians, it would be a great help. PS: By the way, I have to implement it in C, if you want to know. :( Regards, Microkernel

    Read the article
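
    A programmer-oriented view of the model above: training is just counting how often each word appears in spam and in ham; classification adds up the log probabilities of the message's words under each class (with Laplace "+1" smoothing so unseen words don't zero everything out) and picks the larger total. The sketch below is in Python purely for brevity; the counts, sums and logarithms port directly to plain C with a hash table or a sorted word array.

        # Minimal naive Bayes sketch: word counts + smoothed log-probabilities (Python 3)
        import math
        from collections import Counter

        spam_counts, ham_counts = Counter(), Counter()
        spam_msgs, ham_msgs = 0, 0

        def train(words, is_spam):
            global spam_msgs, ham_msgs
            if is_spam:
                spam_counts.update(words); spam_msgs += 1
            else:
                ham_counts.update(words); ham_msgs += 1

        def classify(words):
            vocab = set(spam_counts) | set(ham_counts)
            total_spam = sum(spam_counts.values()) + len(vocab)   # +len(vocab) is Laplace smoothing
            total_ham = sum(ham_counts.values()) + len(vocab)
            # Start from the class priors, then add log P(word | class) for each word
            log_spam = math.log(spam_msgs / (spam_msgs + ham_msgs))
            log_ham = math.log(ham_msgs / (spam_msgs + ham_msgs))
            for w in words:
                log_spam += math.log((spam_counts[w] + 1) / total_spam)
                log_ham += math.log((ham_counts[w] + 1) / total_ham)
            return "spam" if log_spam > log_ham else "ham"

        train(["cheap", "viagra", "offer"], True)
        train(["meeting", "tomorrow", "offer"], False)
        print(classify(["cheap", "offer"]))   # spam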
