Search Results

Search found 1692 results on 68 pages for 'trending and statistics'.

  • redirect user, then log his visit using php and mysql

    - by Bart van Heukelom
    I have a PHP redirect page to track clicks on links. Basically it does the following:

    - get the URL from $_GET
    - connect to the database
    - create a row for the URL, or add 1 hit to it if it already exists
    - redirect the browser to the URL using a Location: header

    I was wondering if it's possible to send the redirect to the client first, so it can get on with its job, and then log the click. Would simply switching things around work?

  • How is the iPad going to be classified - as a mobile platform or a desktop platform?

    - by Tony Eichelberger
    I sometimes use the following site to look at browser and OS trends: http://gs.statcounter.com/. It got me thinking about how the iPad is going to be classified: as a mobile platform, as a desktop platform, or is it going to spark a new category? Since it runs iPhone OS it could be considered a mobile device, but I have a hard time with that because of the screen size. What should the iPad be classified as: Mobile, Desktop, or Other (try to come up with a good name for Other)?

  • Multiple outliers for two variable linear regression

    - by Dave Jarvis
    Problem

    Building on my previous question, the "extreme" outliers in the following graph are somewhat obvious:

    Question

    Given:

    - T - Set of all temperatures
    - Y - Set of all years
    - ST - Sum of temperatures.
    - SY - Sum of years.
    - N - Number of elements
    - T(n) - Temperature of the nth element in the temperature set

    How would you implement an efficient MySQL stored procedure or user-defined function (UDF) to determine if T(n) is an outlier? (If such an implementation already exists, that would be good to know as well.)

    Related Sites

    I am slowly working through these sites to get a better understanding of the problem:

    - Multiple Outliers Detection Procedures in Linear Regression
    - M-estimator
    - Measure of Surprise for Outlier Detection
    - Ordinary Least Squares Linear Regression

    Many thanks!
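
    One simple way to approach this directly in MySQL (a rough sketch, not from the question: the table and column names temps(yr, celsius) are hypothetical, and the 2-standard-deviation cutoff is a convention rather than a formal multiple-outlier test) is to fit the least-squares line from the aggregate sums and flag points that sit far from it; the statements below could be wrapped in a stored procedure:

        -- Hypothetical table: temps(yr INT, celsius DECIMAL(5,2))
        -- Fit y = b0 + b1*x by ordinary least squares using the aggregate sums.
        SELECT COUNT(*), SUM(yr), SUM(celsius), SUM(yr * celsius), SUM(yr * yr)
          INTO @n, @sx, @sy, @sxy, @sxx
        FROM temps;

        SET @b1 = (@n * @sxy - @sx * @sy) / (@n * @sxx - @sx * @sx);
        SET @b0 = (@sy - @b1 * @sx) / @n;

        -- Spread of the residuals around the fitted line.
        SELECT STDDEV_SAMP(celsius - (@b0 + @b1 * yr)) INTO @sd FROM temps;

        -- Flag any point more than 2 residual standard deviations off the line.
        SELECT yr, celsius,
               ABS(celsius - (@b0 + @b1 * yr)) > 2 * @sd AS is_outlier
        FROM temps;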

  • Beginner SQL question: arithmetic with multiple COUNT(*) results

    - by polygenelubricants
    Continuing with the spirit of using the Stack Exchange Data Explorer to learn SQL (see: Can we become our own “Northwind” for teaching SQL / databases?), I've decided to try to write a query to answer a simple question (on meta): What % of stackoverflow users have over 10,000 rep? Here's what I've done:

    Query#1

        SELECT COUNT(*) FROM Users WHERE Users.Reputation >= 10000

    Result: 556

    Query#2

        SELECT COUNT(*) FROM USERS

    Result: 227691

    Now, how do I put them together into one query? What is this query idiom called? What do I need to write so I can get, say, a one-row three-column result like this:

        556    227691    0,00244190592
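
    This idiom is usually called conditional aggregation (a CASE expression inside an aggregate); derived tables or scalar subqueries are the other common route. A minimal sketch in T-SQL, which is what the Data Explorer runs; the 1.0 factor forces decimal rather than integer division:

        SELECT SUM(CASE WHEN Reputation >= 10000 THEN 1 ELSE 0 END) AS Over10k,
               COUNT(*)                                             AS Total,
               1.0 * SUM(CASE WHEN Reputation >= 10000 THEN 1 ELSE 0 END)
                   / COUNT(*)                                       AS Fraction
        FROM Users;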

  • Probability distribution for sms answer delays

    - by Thomas Ahle
    I'm writing an app that uses SMS for communication. I have chosen to subscribe to an SMS gateway, which provides me with an API for doing so. The API has functions for sending as well as pulling new messages. It does, however, not have any kind of push functionality. In order to do my queries most efficiently, I'm looking for data on how long people wait before they answer a text message - as a probability distribution. Extra info: The application is interactive (as much as it can be), so I suppose the times will be pretty similar to real-life human-to-human communication. I don't believe differences in personal style will have a big impact on the right times and frequencies to query, so average data should be fine.

  • MySQL Volleyball Standings

    - by Torez
    I have a database table full of game-by-game results and want to know if I can calculate the following:

    - GP (games played)
    - Wins
    - Losses
    - Points (2 points for each win, 1 point for each loss)

    Here is my table structure:

        CREATE TABLE `results` (
          `id` int(10) unsigned NOT NULL auto_increment,
          `home_team_id` int(10) unsigned NOT NULL,
          `home_score` int(3) unsigned NOT NULL,
          `visit_team_id` int(10) unsigned NOT NULL,
          `visit_score` int(3) unsigned NOT NULL,
          PRIMARY KEY (`id`)
        ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=7;

    And a few testing results:

        INSERT INTO `results` VALUES(1, 1, 21, 2, 25);
        INSERT INTO `results` VALUES(2, 3, 21, 4, 17);
        INSERT INTO `results` VALUES(3, 1, 25, 3, 9);
        INSERT INTO `results` VALUES(4, 2, 7, 4, 22);
        INSERT INTO `results` VALUES(5, 1, 19, 4, 20);
        INSERT INTO `results` VALUES(6, 2, 24, 3, 26);

    Here is what a final table would look like:

        +----------------+----+------+--------+--------+
        | Team Name      | GP | Wins | Losses | Points |
        +----------------+----+------+--------+--------+
        | Spikers        |  4 |    4 |      0 |      8 |
        | Leapers        |  4 |    2 |      2 |      6 |
        | Ground Control |  4 |    1 |      3 |      5 |
        | Touch Guys     |  4 |    0 |      4 |      4 |
        +----------------+----+------+--------+--------+
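
    A rough sketch of one way to derive that table (assuming a teams table with id and name columns, which is not shown in the question): unpivot each game into one row per team with UNION ALL, then aggregate. In MySQL a comparison such as home_score > visit_score evaluates to 1 or 0, which makes the win/loss sums straightforward:

        SELECT t.name                                     AS team_name,
               COUNT(*)                                   AS gp,
               SUM(g.won)                                 AS wins,
               SUM(1 - g.won)                             AS losses,
               SUM(CASE WHEN g.won = 1 THEN 2 ELSE 1 END) AS points
        FROM (
            SELECT home_team_id  AS team_id, home_score  > visit_score AS won FROM results
            UNION ALL
            SELECT visit_team_id AS team_id, visit_score > home_score  AS won FROM results
        ) AS g
        JOIN teams t ON t.id = g.team_id
        GROUP BY t.name
        ORDER BY points DESC, wins DESC;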

  • Regressing panel data in SAS.

    - by John
    Hey guys, thanks to your help I successfully managed all my databases! I am now looking at a panel data set on which I have to regress. Since I only started my PhD this semester, together with the econometrics courses, I am still new to many statistical applications and regression methods. I want to do a simple regression as in Y = x1 x2 x3 etc. I have already browsed through some literature and found that for panel data it's common to do a fixed-effects regression. Also, my Y variable only has positive values, so I was thinking in the direction of a Tobit model. I'm doing some research concerning the coverage of analysts in the financial business. My dependent variable is the coverage of analysts on a certain firm, so per observation I have one analyst and one firm, together with different characteristics (market cap, betas, etc.) of the firm. All this data is monthly. As coverage cannot become negative (only 0), a Tobit model again seems plausible. Do you have any ideas what would be a good regression method? Or some good sources of information (e-books or written books; through university I have access to almost anything concerning my field of work), since I do have to learn these things for future research? Thanks!

  • How can I loop through variables in SPSS? I want to avoid code duplication.

    - by chucknelson
    Is there a "native" SPSS way to loop through some variable names? All I want to do is take a list of variables (that I define) and run the same procedure for them: pseudo-code - not really a good example, but gets the point across... for i in varlist['a','b','c'] do FREQUENCIES VARIABLES=varlist[i] / ORDER=ANALYSIS. end I've noticed that people seem to just use R or Python SPSS plugins to achieve this basic array functionality, but I don't know how soon I can get those configured (if ever) on my installation of SPSS. SPSS has to have some native way to do this...right?

  • What is the difference between Multiple R-squared and Adjusted R-squared in a single-variate least squares regression?

    - by fmark
    Could someone explain to the statistically naive what the difference between Multiple R-squared and Adjusted R-squared is? I am doing a single-variate regression analysis as follows:

        v.lm <- lm(epm ~ n_days, data=v)
        print(summary(v.lm))

    Results:

        Call:
        lm(formula = epm ~ n_days, data = v)

        Residuals:
            Min      1Q  Median      3Q     Max
        -693.59 -325.79   53.34  302.46  964.95

        Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
        (Intercept)  2550.39      92.15  27.677   <2e-16 ***
        n_days        -13.12       5.39  -2.433   0.0216 *
        ---
        Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

        Residual standard error: 410.1 on 28 degrees of freedom
        Multiple R-squared: 0.1746,     Adjusted R-squared: 0.1451
        F-statistic: 5.921 on 1 and 28 DF,  p-value: 0.0216

    Apologies for the newbiness of this question.
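
    For context, the standard definition (not taken from the output above) is that the adjusted value penalizes R-squared for the number of predictors p relative to the number of observations n:

        R^2_adj = 1 - (1 - R^2) * (n - 1) / (n - p - 1)

    Here n = 30 (28 residual degrees of freedom plus two estimated coefficients) and p = 1, so 1 - 0.8254 * 29/28 ≈ 0.1451, which matches the Adjusted R-squared reported above.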

  • Histogram matching - image processing - c/c++

    - by Raj
    Hello, I have two histograms:

        int Hist1[10] = {1,4,3,5,2,5,4,6,3,2};
        int Hist2[10] = {1,4,3,15,12,15,4,6,3,2};

    Hist1's distribution is multi-modal; Hist2's distribution is uni-modal with a single prominent peak. My questions are:

    1. Is there any way that I could determine the type of distribution programmatically?
    2. How can I quantify whether these two histograms are similar/dissimilar?

    Thanks
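
    On the second question, two widely used bin-by-bin measures (standard definitions, not from the question) are histogram intersection and the chi-square distance; for histograms h and g with bins indexed by i:

        intersection(h, g) = sum_i min(h_i, g_i)                 (larger means more similar)
        chi2(h, g)         = sum_i (h_i - g_i)^2 / (h_i + g_i)   (larger means more different)

    Normalizing both histograms to sum to 1 first makes the scores comparable across histograms with different total counts.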

  • Classifying captured data in unknown format?

    - by monch1962
    I've got a large set of captured data (potentially hundreds of thousands of records), and I need to be able to break it down so I can both classify it and also produce "typical" data myself. Let me explain further...

    If I have the following strings of data:

        132T339G1P112S
        164T897F5A498S
        144T989B9B223T
        155T928X9Z554T

    ... you might start to infer the following:

    - possibly all strings are 14 characters long
    - the 4th, 8th, 10th and 14th characters may always be alphas, while the rest are numeric
    - the first character may always be a '1'
    - the 4th character may always be the letter 'T'
    - the 14th character may be limited to only being 'S' or 'T'

    and so on... As you get more and more samples of real data, some of these "rules" might disappear; if you see a 15 character long string, then you have evidence that the 1st "rule" is incorrect. However, given a sufficiently large sample of strings that are exactly 14 characters long, you can start to assume that "all strings are 14 characters long" and assign a numeric figure to your degree of confidence (with an appropriate set of assumptions around the fact that you're seeing a suitably random set of all possible captured data).

    As you can probably tell, a human can do a lot of this classification by eye, but I'm not aware of libraries or algorithms that would allow a computer to do it. Given a set of captured data (significantly more complex than the above...), are there libraries that I can apply in my code to do this sort of classification for me, that will identify "rules" with a given degree of confidence?

    As a next step, I need to be able to take those rules, and use them to create my own data that conforms to these rules. I assume this is a significantly easier step than the classification, but I've never had to perform a task like this before so I'm really not sure how complex it is.

    At a guess, Python or Java (or possibly Perl or R) are possibly the "common" languages most likely to have these sorts of libraries, and maybe some of the bioinformatics libraries do this sort of thing. I really don't care which language I have to use; I need to solve the problem in whatever way I can. Any sort of pointer to information would be very useful. As you can probably tell, I'm struggling to describe this problem clearly, and there may be a set of appropriate keywords I can plug into Google that will point me towards the solution. Thanks in advance

  • Get percentiles of data-set with group by month

    - by Cylindric
    Hello, I have a SQL table with a whole load of records that look like this:

        | Date       | Score |
        +------------+-------+
        | 01/01/2010 |     4 |
        | 02/01/2010 |     6 |
        | 03/01/2010 |    10 |
        ...
        | 16/03/2010 |     2 |

    I'm plotting this on a chart, so I get a nice line across the graph indicating score-over-time. Lovely. Now, what I need to do is include the average score on the chart, so we can see how that changes over time, so I can simply add this to the mix:

        SELECT YEAR(SCOREDATE) 'Year', MONTH(SCOREDATE) 'Month',
               MIN(SCORE) MinScore,
               AVG(SCORE) AverageScore,
               MAX(SCORE) MaxScore
        FROM SCORES
        GROUP BY YEAR(SCOREDATE), MONTH(SCOREDATE)
        ORDER BY YEAR(SCOREDATE), MONTH(SCOREDATE)

    That's no problem so far. The problem is, how can I easily calculate the percentiles at each time-period? I'm not sure that's the correct phrase. What I need in total is:

    - A line on the chart for the score (easy)
    - A line on the chart for the average (easy)
    - A line on the chart showing the band that 95% of the scores occupy (stumped)

    It's the third one that I don't get. I need to calculate the 5% percentile figures, which I can do singly:

        SELECT MAX(SubQ.SCORE) FROM
            (SELECT TOP 45 PERCENT SCORE
             FROM SCORES
             WHERE YEAR(SCOREDATE) = 2010 AND MONTH(SCOREDATE) = 1
             ORDER BY SCORE ASC) AS SubQ

        SELECT MIN(SubQ.SCORE) FROM
            (SELECT TOP 45 PERCENT SCORE
             FROM SCORES
             WHERE YEAR(SCOREDATE) = 2010 AND MONTH(SCOREDATE) = 1
             ORDER BY SCORE DESC) AS SubQ

    But I can't work out how to get a table of all the months.

        | Date       | Average | 45% | 55% |
        +------------+---------+-----+-----+
        | 01/01/2010 |      13 |  11 |  15 |
        | 02/01/2010 |      10 |   8 |  12 |
        | 03/01/2010 |       5 |   4 |  10 |
        ...
        | 16/03/2010 |       7 |   7 |   9 |

    At the moment I'm going to have to load this lot up into my app, and calculate the figures myself. Or run a larger number of individual queries and collate the results.
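
    One way to avoid looping over months from the application: on SQL Server 2012 or later (newer than what the TOP ... PERCENT syntax above suggests was in use), PERCENTILE_CONT computes the per-month band in a single statement. A rough sketch:

        SELECT DISTINCT
            YEAR(SCOREDATE)  AS [Year],
            MONTH(SCOREDATE) AS [Month],
            AVG(1.0 * SCORE) OVER (PARTITION BY YEAR(SCOREDATE), MONTH(SCOREDATE)) AS AverageScore,
            PERCENTILE_CONT(0.45) WITHIN GROUP (ORDER BY SCORE)
                OVER (PARTITION BY YEAR(SCOREDATE), MONTH(SCOREDATE)) AS P45,
            PERCENTILE_CONT(0.55) WITHIN GROUP (ORDER BY SCORE)
                OVER (PARTITION BY YEAR(SCOREDATE), MONTH(SCOREDATE)) AS P55
        FROM SCORES
        ORDER BY [Year], [Month];

    For a band covering 95% of the scores, 0.025 and 0.975 would replace 0.45 and 0.55.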

  • Using ARIMA to model and forecast stock prices using user-friendly stats program

    - by Brian
    Hi people, can anyone please offer some insight into this for me? I'm coming from a functional magnetic resonance imaging research background where I analyzed a lot of time series data, and I'd like to analyze the time series of stock prices (or returns) by:

    1. modeling a successful stock in a particular market sector and then cross-correlating the time series of this historically successful stock with that of other, newer stocks to look for significant relationships;
    2. modeling a stock's price time series and using forecasting (e.g., exponential smoothing) to predict future values of it.

    I'd like to use time-series modeling methods (ARIMA and ARCH) to do this. Several questions:

    - How often do ARIMA and ARCH modeling methods (given that the individual who implements them does so accurately) actually fit the stock time series data they target, and what is the optimal fit I can expect?
    - Is the extent to which this model fits the data commensurate with the extent to which it predicts this stock time series' future values?
    - Rather than randomly selecting stocks to compare or model, if profit is my goal, what is an efficient approach, if any, to selecting the stocks I'm going to analyze?
    - Which stats program is the most user-friendly for this?

    Any thoughts on this would be great and would go a long way for me. Thanks, Brian

  • R: How to remove outliers from a smoother in ggplot2?

    - by John
    I have the following data set that I am trying to plot with ggplot2. It is a time series of three experiments A1, B1 and C1, and each experiment had three replicates. I am trying to add a stat which detects and removes outliers before returning a smoother (mean and variance?). I have written my own outlier function (not shown) but I expect there is already a function to do this, I just have not found it. I've looked at stat_sum_df("median_hilow", geom = "smooth") from some examples in the ggplot2 book, but I didn't understand the help doc from Hmisc well enough to see if it removes outliers or not. Is there a function to remove outliers like this in ggplot, or where would I amend my code below to add my own function?

        library(ggplot2)

        data = data.frame(
          day = c(1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7,1,3,5,7),
          od  = c(0.1,1.0,0.5,0.7, 0.13,0.33,0.54,0.76, 0.1,0.35,0.54,0.73,
                  1.3,1.5,1.75,1.7, 1.3,1.3,1.0,1.6, 1.7,1.6,1.75,1.7,
                  2.1,2.3,2.5,2.7, 2.5,2.6,2.6,2.8, 2.3,2.5,2.8,3.8),
          series_id = c("A1","A1","A1","A1", "A1","A1","A1","A1", "A1","A1","A1","A1",
                        "B1","B1","B1","B1", "B1","B1","B1","B1", "B1","B1","B1","B1",
                        "C1","C1","C1","C1", "C1","C1","C1","C1", "C1","C1","C1","C1"),
          replicate = c("A1.1","A1.1","A1.1","A1.1", "A1.2","A1.2","A1.2","A1.2", "A1.3","A1.3","A1.3","A1.3",
                        "B1.1","B1.1","B1.1","B1.1", "B1.2","B1.2","B1.2","B1.2", "B1.3","B1.3","B1.3","B1.3",
                        "C1.1","C1.1","C1.1","C1.1", "C1.2","C1.2","C1.2","C1.2", "C1.3","C1.3","C1.3","C1.3"))

        > data
           day   od series_id replicate
        1    1 0.10        A1      A1.1
        2    3 1.00        A1      A1.1
        3    5 0.50        A1      A1.1
        4    7 0.70        A1      A1.1
        5    1 0.13        A1      A1.2
        6    3 0.33        A1      A1.2
        7    5 0.54        A1      A1.2
        8    7 0.76        A1      A1.2
        9    1 0.10        A1      A1.3
        10   3 0.35        A1      A1.3
        11   5 0.54        A1      A1.3
        12   7 0.73        A1      A1.3
        13   1 1.30        B1      B1.1

    This is what I have so far, and it is working nicely, but outliers are not removed:

        r <- ggplot(data = data, aes(x = day, y = od))
        r + geom_point(aes(group = replicate, color = series_id)) +  # add points
            geom_line(aes(group = replicate, color = series_id)) +   # add lines
            geom_smooth(aes(group = series_id))                      # add smoother, average of each replicate

  • best way to statistically detect anomalies in data

    - by reinier
    Hi, our webapp collects a huge amount of data about user actions, network business, database load, etc. All data is stored in warehouses and we have quite a lot of interesting views on this data. If something odd happens, chances are it shows up somewhere in the data. However, to manually detect if something out of the ordinary is going on, one has to continually look through this data and look for oddities.

    My question: what is the best way to detect changes in dynamic data which can be seen as 'out of the ordinary'? Are Bayesian filters (I've seen these mentioned when reading about spam detection) the way to go? Any pointers would be great!

    EDIT: To clarify, the data for example shows a daily curve of database load. This curve typically looks similar to the curve from yesterday. Over time this curve might change slowly. It would be nice if, when the day-to-day change in the curve goes beyond some bounds, a warning could go off.

    R
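
    Since the data already sits in a warehouse, one very crude baseline (a sketch only, not a substitute for the Bayesian or change-detection approaches asked about; the metrics(ts, value) table is hypothetical and the syntax is PostgreSQL-flavoured) is to compare each hour of today's curve against the mean and standard deviation of the same hour over the previous 30 days and warn on large deviations:

        SELECT today.ts, today.value
        FROM metrics AS today
        JOIN (
            SELECT EXTRACT(HOUR FROM ts) AS hr,
                   AVG(value)            AS mean_v,
                   STDDEV_SAMP(value)    AS sd_v
            FROM metrics
            WHERE ts >= CURRENT_DATE - INTERVAL '30 days'
              AND ts <  CURRENT_DATE
            GROUP BY EXTRACT(HOUR FROM ts)
        ) AS hist ON hist.hr = EXTRACT(HOUR FROM today.ts)
        WHERE today.ts >= CURRENT_DATE
          AND ABS(today.value - hist.mean_v) > 3 * hist.sd_v;   -- |z| > 3 counts as "out of the ordinary"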

  • postgresql weighted average?

    - by milovanderlinden
    Say I have a PostgreSQL table with the following values:

        id | value
        ---+------
         1 |     4
         2 |     8
         3 |   100
         4 |     5
         5 |     7

    If I use PostgreSQL to calculate the average, it gives me an average of 24.8, because the high value of 100 has a great impact on the calculation, while in fact I would like to find an average somewhere around 6 and eliminate the extreme(s). I am looking for a way to eliminate extremes and want to do this "statistically correct". The extremes cannot be fixed; I cannot say "if a value is over X, it has to be eliminated". I have been bending my head over the PostgreSQL aggregate functions but cannot put my finger on what is right for me to use. Any suggestions?
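
    A "statistically correct" way to phrase eliminating extremes is a trimmed mean: drop everything outside a percentile-based band, then average the rest. A sketch for PostgreSQL 9.4 or later (the percentile_cont ordered-set aggregate did not exist when the question was asked), using the common 1.5 x IQR rule and assuming the table is named t:

        WITH bounds AS (
            SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY value) AS q1,
                   percentile_cont(0.75) WITHIN GROUP (ORDER BY value) AS q3
            FROM t
        )
        SELECT avg(value) AS trimmed_avg
        FROM t, bounds
        WHERE value BETWEEN q1 - 1.5 * (q3 - q1)
                    AND     q3 + 1.5 * (q3 - q1);

    On the sample rows above this keeps 4, 5, 7 and 8, drops 100, and returns exactly the average of 6 the question is after.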

  • Workflow for statistical analysis and report writing

    - by ws
    Does anyone have any wisdom on workflows for data analysis related to custom report writing? The use-case is basically this:

    1. Client commissions a report that uses data analysis, e.g. a population estimate and related maps for a water district.
    2. The analyst downloads some data, munges the data and saves the result (e.g. adding a column for population per unit, or subsetting the data based on district boundaries).
    3. The analyst analyzes the data created in (2), gets close to her goal, but sees that she needs more data and so goes back to (1).
    4. Rinse and repeat until the tables and graphics meet QA/QC and satisfy the client.
    5. Write the report incorporating tables and graphics.
    6. Next year, the happy client comes back and wants an update. This should be as simple as updating the upstream data with a new download (e.g. get the building permits from the last year) and pressing a "RECALCULATE" button, unless specifications change.

    At the moment, I just start a directory and ad-hoc it the best I can. I would like a more systematic approach, so I am hoping someone has figured this out... I use a mix of spreadsheets, SQL, ArcGIS, R, and Unix tools. Thanks!

    PS: Below is a basic Makefile that checks for dependencies on various intermediate datasets (w/ ".RData" suffix) and scripts (".R" suffix). Make uses timestamps to check dependencies, so if you 'touch ss07por.csv', it will see that this file is newer than all the files / targets that depend on it, and execute the given scripts in order to update them accordingly. This is still a work in progress, including a step for putting into a SQL database, and a step for a templating language like Sweave. Note that Make relies on tabs in its syntax, so read the manual before cutting and pasting. Enjoy and give feedback!

    http://www.gnu.org/software/make/manual/html_node/index.html#Top

        R=/home/wsprague/R-2.9.2/bin/R

        persondata.RData : ImportData.R ../../DATA/ss07por.csv Functions.R
                $R --slave -f ImportData.R

        persondata.Munged.RData : MungeData.R persondata.RData Functions.R
                $R --slave -f MungeData.R

        report.txt: TabulateAndGraph.R persondata.Munged.RData Functions.R
                $R --slave -f TabulateAndGraph.R > report.txt

  • Smoothing Small Data Set With Second Order Quadratic Curve

    - by Rev316
    I'm doing some specific signal analysis, and I am in need of a method that would smooth out a given bell-shaped distribution curve. A running-average approach isn't producing the results I desire. I want to keep the min/max and the general shape of my fitted curve intact, but resolve the inconsistencies in sampling. In short: if given a set of data that models a simple quadratic curve, what statistical smoothing method would you recommend? If possible, please reference an implementation, library, or framework. Thanks SO!
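
    For a small data set, an ordinary least-squares quadratic fit is the simplest candidate (a standard textbook method, not something named in the question): fit y = a·x² + b·x + c by minimizing the sum of squared residuals, which reduces to solving the 3x3 normal equations

        a·Σx⁴ + b·Σx³ + c·Σx² = Σ(x²·y)
        a·Σx³ + b·Σx² + c·Σx  = Σ(x·y)
        a·Σx² + b·Σx  + c·n   = Σy

    Most numerical libraries expose this directly as degree-2 polynomial fitting; evaluating the fitted polynomial on a fine grid gives a smoothed curve that keeps the overall bell shape while ironing out sampling noise (though the fitted extremum will not pass exactly through the observed min/max).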

  • Where can I find a simple beta CDF implementation?

    - by Gacek
    I need to use the beta distribution and the inverse beta distribution in my project. There is a quite good but complicated implementation in GSL, but I don't want to use such a big library only to get one function. I would like to either implement it on my own or link some simple library. Do you know any sources that could help me? I'm looking for any books/articles about numerical approximation of the beta PDF, or libraries where it could be implemented. Any other suggestions would also be appreciated. Any programming language, but C++/C# preferred.
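
    For reference, the standard identity (not from the question): the beta CDF is the regularized incomplete beta function, which is what most compact implementations actually compute:

        F(x; a, b) = I_x(a, b) = B(x; a, b) / B(a, b),
        where B(x; a, b) = ∫₀ˣ t^(a-1) (1-t)^(b-1) dt  and  B(a, b) = B(1; a, b).

    A self-contained implementation therefore needs little more than a log-gamma routine plus the usual continued-fraction expansion of I_x(a, b); the inverse CDF can then be recovered with a few Newton or bisection steps on F(x; a, b) - p = 0.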
