Search Results

Search found 89214 results on 3569 pages for 'code statistics'.


  • Using ARIMA to model and forecast stock prices with a user-friendly stats program

    - by Brian
    Hi people, can anyone please offer some insight into this for me? I'm coming from a functional magnetic resonance imaging research background, where I analyzed a lot of time series data, and I'd like to analyze the time series of stock prices (or returns) by: 1) modeling a successful stock in a particular market sector and then cross-correlating the time series of this historically successful stock with that of other, newer stocks to look for significant relationships; 2) modeling a stock's price time series and using forecasting (e.g., exponential smoothing) to predict its future values. I'd like to use non-linear modeling methods (ARIMA and ARCH) to do this. Several questions:

    - How often do ARIMA and ARCH modeling methods (given that the individual who implements them does so accurately) actually fit the stock time series data they target, and what is the optimal fit I can expect?
    - Is the extent to which the model fits the data commensurate with the extent to which it predicts the series' future values?
    - Rather than randomly selecting stocks to compare or model, if profit is my goal, what is an efficient approach, if any, to selecting the stocks I'm going to analyze?
    - Which stats program is the most user-friendly for this?

    Any thoughts on this would be great and would go a long way for me. Thanks, Brian
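
    A minimal R sketch of the ARIMA piece (illustrative only, not from the question: the simulated series, the (1,1,1) order and the 20-step horizon are all assumptions):

        # Illustrative only: simulate a price-like series, fit an ARIMA model,
        # and forecast a few steps ahead. Replace `prices` with real closing prices.
        set.seed(1)
        prices <- cumsum(rnorm(500, mean = 0.05, sd = 1)) + 100  # stand-in for a price series

        fit <- arima(prices, order = c(1, 1, 1))   # ARIMA(1,1,1); the order is an assumption
        print(fit)

        fc <- predict(fit, n.ahead = 20)           # point forecasts and standard errors
        plot(prices, type = "l", xlim = c(1, 520))
        lines(501:520, fc$pred, col = "red")
        lines(501:520, fc$pred + 2 * fc$se, col = "red", lty = 2)
        lines(501:520, fc$pred - 2 * fc$se, col = "red", lty = 2)

    For the cross-correlation idea, base R's ccf() applied to differenced (or log-return) series gives the lagged cross-correlations between two stocks.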

    Read the article

  • postgresql weighted average?

    - by milovanderlinden
    Say I have a PostgreSQL table with the following values:

        id | value
        ----------
         1 |     4
         2 |     8
         3 |   100
         4 |     5
         5 |     7

    If I use PostgreSQL to calculate the average, it gives me an average of 24.8, because the high value of 100 has a great impact on the calculation, while in fact I would like to find an average somewhere around 6 and eliminate the extreme(s). I am looking for a way to eliminate extremes and want to do this in a "statistically correct" way. The extremes cannot be fixed; I cannot say "if a value is over X, it has to be eliminated". I have been bending my head over the PostgreSQL aggregate functions but cannot put my finger on what is right for me to use. Any suggestions?
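
    One statistically principled direction is a robust summary such as the median or a trimmed mean, which ignores or down-weights extreme values instead of requiring a fixed cut-off. A minimal sketch of the idea in R (illustrative only; how to express the trimming as a Postgres aggregate or subquery is left open):

        values <- c(4, 8, 100, 5, 7)

        mean(values)              # 24.8 -- pulled up by the outlier
        median(values)            # 7    -- robust to the extreme value
        mean(values, trim = 0.2)  # 6.67 -- drops the lowest/highest 20% before averaging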

    Read the article

  • best way to statistically detect anomalies in data

    - by reinier
    Hi, our webapp collects a huge amount of data about user actions, network busyness, database load, and so on. All data is stored in warehouses and we have quite a lot of interesting views on this data. If something odd happens, chances are it shows up somewhere in the data. However, to manually detect if something out of the ordinary is going on, one has to continually look through this data and watch for oddities.

    My question: what is the best way to detect changes in dynamic data which can be seen as 'out of the ordinary'? Are Bayesian filters (I've seen these mentioned when reading about spam detection) the way to go? Any pointers would be great!

    EDIT: To clarify, the data for example shows a daily curve of database load. This curve typically looks similar to the curve from yesterday. Over time this curve might change slowly. It would be nice if a warning could go off when the curve changes from day to day beyond certain bounds.

    R
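
    A minimal sketch of one simple approach in R (illustrative; the data layout and the 3-sigma threshold are assumptions, not from the question): compare each point of today's load curve with the mean and standard deviation of the same time slot over previous days, and flag slots that deviate too far.

        # hist_load: one row per past day, one column per time slot (simulated here)
        set.seed(42)
        hist_load <- matrix(rnorm(30 * 24, mean = 100, sd = 10), nrow = 30, ncol = 24)
        today <- rnorm(24, mean = 100, sd = 10)
        today[15] <- 180                          # inject an anomaly for the example

        slot_mean <- colMeans(hist_load)
        slot_sd   <- apply(hist_load, 2, sd)

        z <- (today - slot_mean) / slot_sd        # per-slot z-scores against history
        which(abs(z) > 3)                         # slots more than 3 SDs from normal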

    Read the article

  • How can I graph the Lines of Code history for git repo?

    - by dbr
    Basically I want to get the number of lines of code in the repository after each commit. The only (really crappy) ways I have found are to use git filter-branch to run "wc -l *", or a script that runs git reset --hard on each commit and then runs wc -l.

    To make it a bit clearer: when the tool is run, it would output the lines of code of the very first commit, then the second, and so on. This is what I want the tool to output (as an example):

        me@something:~/$ gitsloc --branch master
        10
        48
        153
        450
        1734
        1542

    I've played around with the ruby 'git' library, but the closest I found was using the .lines() method on a diff, which seems like it should give the added lines (but does not... it returns 0 when you delete lines, for example):

        require 'rubygems'
        require 'git'

        total = 0
        g = Git.open(working_dir = '/Users/dbr/Desktop/code_projects/tvdb_api')

        last = nil
        g.log.each do |cur|
          diff = g.diff(last, cur)
          total = total + diff.lines
          puts total
          last = cur
        end

    Read the article

  • Workflow for statistical analysis and report writing

    - by ws
    Does anyone have any wisdom on workflows for data analysis related to custom report writing? The use case is basically this:

    1. Client commissions a report that uses data analysis, e.g. a population estimate and related maps for a water district.
    2. The analyst downloads some data, munges the data and saves the result (e.g. adding a column for population per unit, or subsetting the data based on district boundaries).
    3. The analyst analyzes the data created in (2), gets close to her goal, but sees that she needs more data and so goes back to (1).
    4. Rinse and repeat until the tables and graphics meet QA/QC and satisfy the client.
    5. Write the report incorporating tables and graphics.
    6. Next year, the happy client comes back and wants an update. This should be as simple as updating the upstream data with a new download (e.g. get the building permits from the last year) and pressing a "RECALCULATE" button, unless specifications change.

    At the moment, I just start a directory and ad-hoc it the best I can. I would like a more systematic approach, so I am hoping someone has figured this out... I use a mix of spreadsheets, SQL, ArcGIS, R, and Unix tools. Thanks!

    PS: Below is a basic Makefile that checks for dependencies on various intermediate datasets (".RData" suffix) and scripts (".R" suffix). Make uses timestamps to check dependencies, so if you 'touch ss07por.csv', it will see that this file is newer than all the files/targets that depend on it, and execute the given scripts in order to update them accordingly. This is still a work in progress, including a step for loading into an SQL database, and a step for a templating language like Sweave. Note that Make relies on tabs in its syntax, so read the manual before cutting and pasting. Enjoy and give feedback!

    http://www.gnu.org/software/make/manual/html_node/index.html#Top

        R=/home/wsprague/R-2.9.2/bin/R

        persondata.RData : ImportData.R ../../DATA/ss07por.csv Functions.R
                $R --slave -f ImportData.R

        persondata.Munged.RData : MungeData.R persondata.RData Functions.R
                $R --slave -f MungeData.R

        report.txt : TabulateAndGraph.R persondata.Munged.RData Functions.R
                $R --slave -f TabulateAndGraph.R > report.txt

    Read the article

  • Smoothing Small Data Set With Second Order Quadratic Curve

    - by Rev316
    I'm doing some specific signal analysis, and I am in need of a method that would smooth out a given bell-shaped distribution curve. A running average approach isn't producing the results I desire. I want to keep the min/max, and general shape of my fitted curve intact, but resolve the inconsistencies in sampling. In short: if given a set of data that models a simple quadratic curve, what statistical smoothing method would you recommend? If possible, please reference an implementation, library, or framework. Thanks SO!
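
    A minimal R sketch of two common options (illustrative; the data are simulated and the degree/span choices are assumptions): a global second-order polynomial fit via lm() and a local regression via loess().

        # Simulated noisy, roughly quadratic data as a stand-in for the real samples
        set.seed(7)
        x <- seq(-3, 3, length.out = 60)
        y <- -x^2 + 9 + rnorm(length(x), sd = 0.8)

        quad_fit  <- lm(y ~ poly(x, 2))           # global second-order polynomial
        loess_fit <- loess(y ~ x, span = 0.75)    # local regression; span controls smoothness

        plot(x, y, pch = 16, col = "grey")
        lines(x, predict(quad_fit),  col = "red",  lwd = 2)   # quadratic fit
        lines(x, predict(loess_fit), col = "blue", lwd = 2)   # loess smooth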

    Read the article

  • Where can I find a simple beta CDF implementation?

    - by Gacek
    I need to use the beta distribution and the inverse beta distribution in my project. There is a quite good but complicated implementation in GSL, but I don't want to use such a big library only to get one function. I would like to either implement it on my own or link to some simple library. Do you know any sources that could help me? I'm looking for any books/articles about numerical approximation of the beta PDF, or libraries where it is implemented. Any other suggestions would also be appreciated. Any programming language, but C++/C# preferred.
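
    As a cross-check for a hand-rolled implementation, a minimal sketch in R (illustrative only; the asker prefers C++/C#, so this is just for validating results): the beta CDF is the regularized incomplete beta function, and for shape parameters above 1 it can be approximated by numerically integrating the PDF.

        a <- 2; b <- 5; x <- 0.3   # example shape parameters and evaluation point

        pbeta(x, a, b)             # built-in beta CDF, used here as the reference value
        qbeta(0.5, a, b)           # built-in inverse beta CDF (the median in this case)

        # Naive do-it-yourself check: integrate the beta PDF from 0 to x
        beta_cdf <- function(x, a, b) {
          integrate(function(t) t^(a - 1) * (1 - t)^(b - 1), 0, x)$value / beta(a, b)
        }
        beta_cdf(x, a, b)          # should closely match pbeta(x, a, b)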

    Read the article

  • ASP.NET MVC reminds me of old Classic ASP spaghetti code...

    - by EdenMachine
    I just went through some MVC tutorials after checking this site out for a while. Is it just me, or do MVC view pages bring back HORRIBLE flashbacks of Classic ASP spaghetti code, with all the jumping in and out of HTML and ASP.NET with yellow delimiters everywhere making it impossible to read? Whatever happened to the importance of code/design separation? I was really sold on the new technology until the tutorials hit the view page development section. Or am I missing something? (And don't say you can use the template to help, because that's just moving the spaghetti to another location - sweeping it under the rug - it doesn't fix the problem.)

    Read the article

  • Using recode in R

    - by celenius
    I'm trying to use 'recode' in R (from the 'car' package) and it is not working. I read in data from a .csv file into a data frame called 'results'. Then, I replace the values in the column 'Built_year' according to the following logic:

        recode(results$Built_year, "2='1950s'; 3='1960s'; 4='1970s'; 5='1980s'; 6='1990s'; 7='2000 or later'")

    When I check results$Built_year after doing this step, it appears to have worked. However, it does not store this value, and returns to its previous value. I don't understand why. Thanks. (At the moment something is going wrong and I can't see any of the icons for formatting.)
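
    A likely explanation, hedged since it depends on the actual session: car::recode returns a new vector rather than modifying its argument in place, so the result has to be assigned back. A minimal sketch (assuming 'results' is the data frame from the question):

        library(car)  # recode() lives in the car package

        # recode() does not change results$Built_year in place; assign the return value back
        results$Built_year <- recode(results$Built_year,
          "2='1950s'; 3='1960s'; 4='1970s'; 5='1980s'; 6='1990s'; 7='2000 or later'")

        table(results$Built_year)  # the recoded labels now persist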

    Read the article

  • Generating lognormally distributed random numbers from mean and coefficient of variation

    - by Richie Cotton
    Most functions for generating lognormally distributed random numbers take the mean and standard deviation of the associated normal distribution as parameters. My problem is that I only know the mean and the coefficient of variation of the lognormal distribution. It is reasonably straightforward to derive the parameters I need for the standard functions from what I have. If mu and sigma are the mean and standard deviation of the associated normal distribution, we know that

        coeffOfVar^2 = variance / mean^2
                     = (exp(sigma^2) - 1) * exp(2*mu + sigma^2) / exp(mu + sigma^2/2)^2
                     = exp(sigma^2) - 1

    We can rearrange this to

        sigma = sqrt(log(coeffOfVar^2 + 1))

    We also know that

        mean = exp(mu + sigma^2/2)

    This rearranges to

        mu = log(mean) - sigma^2/2

    Here's my R implementation:

        rlnorm0 <- function(mean, coeffOfVar, n = 1e6)
        {
          sigma <- sqrt(log(coeffOfVar^2 + 1))
          mu <- log(mean) - sigma^2 / 2
          rlnorm(n, mu, sigma)
        }

    It works okay for small coefficients of variation:

        r1 <- rlnorm0(2, 0.5)
        mean(r1)            # 2.000095
        sd(r1) / mean(r1)   # 0.4998437

    But not for larger values:

        r2 <- rlnorm0(2, 50)
        mean(r2)            # 2.048509
        sd(r2) / mean(r2)   # 68.55871

    To check that it wasn't an R-specific issue, I reimplemented it in MATLAB (uses the Statistics Toolbox):

        function y = lognrnd0(mean, coeffOfVar, sizeOut)
        if nargin < 3 || isempty(sizeOut)
            sizeOut = [1e6 1];
        end
        sigma = sqrt(log(coeffOfVar.^2 + 1));
        mu = log(mean) - sigma.^2 ./ 2;
        y = lognrnd(mu, sigma, sizeOut);
        end

        r1 = lognrnd0(2, 0.5);
        mean(r1)            % 2.0013
        std(r1) ./ mean(r1) % 0.5008

        r2 = lognrnd0(2, 50);
        mean(r2)            % 1.9611
        std(r2) ./ mean(r2) % 22.61

    Same problem. The question is: why is this happening? Is it just that the standard deviation is not robust when the variation is that wide? Or have I screwed up somewhere?

    Read the article

  • Mathematica, PDF Curves and Shading

    - by Venerable Garbage Collector
    I need to plot a normal distribution and then shade some specific region of it. Right now I'm doing this by creating a plot of the distribution and overlaying it with a RegionPlot. This is pretty convoluted and I'm certain there must be a more elegant way of doing it. I Googled, looked at the docs, found nothing. Help me SO! I guess Mathematica counts as programming? :D

    Read the article

  • Need a more complex index!

    - by silversky
    I'm sorry if this is not an appropriate question for this site, and if necessary I'll close it. But maybe someone could give me an idea: I'm trying to find a more complex index to build a hierarchy. For example:

        5 votes out of 6     = 83%
        500 votes out of 600 = 83%
        10 votes out of 600  = 1.66%

    If I rank purely by percentage, the first two will be in the same place, but I think that 83% out of 600 is more valuable than the first one. I could compare the raw counts (5, 10, 500), but again that's not fair, because then the third case (10 votes) would rank ahead of the first case (5 votes), which isn't right because the third case has only 1.66%. Maybe someone could give me an idea of how to give more weight to the second case, but at the same time let new entries have a fair chance.
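
    One common direction (a hedged sketch, not taken from the question) is a Bayesian-style smoothed score: pretend every item starts with a few phantom votes at a prior rate, so small samples get pulled toward the prior while large samples keep roughly their observed percentage. A sketch in R, with the prior settings chosen arbitrarily:

        # Smoothed score: (up + prior_rate * prior_weight) / (total + prior_weight)
        # prior_rate = 0.5 and prior_weight = 20 are arbitrary values for illustration.
        smoothed_score <- function(up, total, prior_rate = 0.5, prior_weight = 20) {
          (up + prior_rate * prior_weight) / (total + prior_weight)
        }

        smoothed_score(5,   6)    # ~0.58 -- 83% from very few votes is discounted
        smoothed_score(500, 600)  # ~0.82 -- 83% from many votes ranks higher
        smoothed_score(10,  600)  # ~0.03 -- a low percentage stays low

    A lower confidence bound on the vote proportion (e.g. the Wilson score interval) is another standard way to get the same effect.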

    Read the article

  • Exporting Stata results

    - by Max M.
    I'm sure this is an issue anyone who uses Stata for publications or reports has run into: how do you conveniently export your output to something that can be parsed by a scripting language or Excel? There are a few ado-files that do this for specific commands (try findit tabout or findit outreg2). But what about exporting the output of the table command? Or the results of an anova? I'd love to hear how Stata users address this problem, either for specific commands or in general.

    Read the article

  • Is it possible to do A/B testing by page rather than by individual?

    - by mojones
    Let's say I have a simple ecommerce site that sells 100 different t-shirt designs. I want to do some A/B testing to optimise my sales. Let's say I want to test two different "buy" buttons. Normally, I would use A/B testing to randomly assign each visitor to see button A or button B (and try to ensure that the user experience is consistent by storing that assignment in session, cookies, etc.).

    Would it be possible to take a different approach and instead randomly assign each of my 100 designs to use button A or B, and measure the conversion rate as (number of sales of design n) / (pageviews of design n)?

    This approach would seem to have some advantages: I would not have to worry about keeping the user experience consistent - a given page (e.g. www.example.com/viewdesign?id=6) would always return the same html. If I were to test different prices, it would be far less distressing to the user to see different prices for different designs than different prices for the same design on different computers. I also wonder whether it might be better for SEO - my suspicion is that Google would "prefer" that it always sees the same html when crawling a page.

    Obviously this approach would only be suitable for a limited number of sites; I was just wondering if anyone has tried it?
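
    A minimal sketch in R of how the comparison might be evaluated (illustrative; the counts are invented, and it glosses over the fact that designs, not visitors, are the randomisation unit, so design-level variation really belongs in the analysis too):

        # Aggregate sales and pageviews for the designs assigned to each button
        sales_A <- 240; views_A <- 12000   # designs showing button A (made-up numbers)
        sales_B <- 300; views_B <- 12500   # designs showing button B (made-up numbers)

        prop.test(x = c(sales_A, sales_B), n = c(views_A, views_B))  # equal conversion rates?

        # Comparing per-design conversion rates respects the design-level assignment better
        conv_A <- rbinom(50, size = 200, prob = 0.020) / 200   # 50 designs under A (simulated)
        conv_B <- rbinom(50, size = 200, prob = 0.025) / 200   # 50 designs under B (simulated)
        wilcox.test(conv_A, conv_B)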

    Read the article

  • What is the ratio of Java programmers to C#.net programmers?

    - by Vaccano
    How many Java programmers are there for every C# programmer? I have a coworker who says it was 3:1 (3 Java to 1 C#), but it is now more like 2:1 (2 Java to 1 C#). Is this valid? Is there somewhere I could go for this info?

    Edit: This question needs to be a bit more limited in scope. I am referring to US programmers, and to those who would consider their career to be more focused on one side than the other. (If you are evenly balanced then you would cancel out.)

    Read the article

  • mysql/algorithm: Weighting an average to accentuate differences from the mean

    - by Sai Emrys
    This is for a new feature on http://cssfingerprint.com (see /about for general info). The feature looks up the sites you've visited in a database of site demographics, and tries to guess what your demographic stats are based on that. All my demographics are in 0..1 probability format, not ratios or absolute numbers or the like.

    Essentially, you have a large number of data points that each tend you towards their own demographics. However, just taking the average is poor, because it means that by adding in a lot of generic data, the number goes down. For example, suppose you've visited sites S0..S50. All except S0 are 48% female; S0 is 100% male. If I'm guessing your gender, I want to have a value close to 100%, not just the 49% that a straight average would give.

    Also, consider that most demographics (i.e. everything other than gender) do not have an average of 50%. For example, the average probability of having kids 0-17 is ~37%. The more a given site's demographics differ from this average (e.g. maybe it's a site for parents, or for child-free people), the more it should count in my guess of your status.

    What's the best way to calculate this? For extra credit: what's the best way to calculate this that is also cheap & easy to do in MySQL?
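
    One direction (a hedged sketch, not necessarily what the site ended up using) is to pool evidence on the log-odds scale relative to the demographic's baseline, so sites near the baseline contribute almost nothing while strongly skewed sites dominate. Probabilities are clamped away from 0 and 1 to keep the log-odds finite. A sketch in R:

        # Naive-Bayes-style pooling: sum each site's log-odds shift away from the baseline.
        # baseline = population-wide probability for this demographic (e.g. 0.5 for gender).
        pool_demographic <- function(p_sites, baseline = 0.5, eps = 0.01) {
          p <- pmin(pmax(p_sites, eps), 1 - eps)          # clamp extreme probabilities
          logit <- function(q) log(q / (1 - q))
          shift <- sum(logit(p) - logit(baseline))        # total evidence vs. the baseline
          plogis(logit(baseline) + shift)                 # back to a probability
        }

        # The question's example: 50 sites at 48% and one site at (clamped) 100%
        pool_demographic(c(rep(0.48, 50), 1.00))          # well above the straight average

    The same sum of log-odds differences can be computed as an aggregate in SQL, which keeps it cheap to do in MySQL.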

    Read the article

  • What is the best Java numerical method package?

    - by Bob Cross
    I am looking for a Java-based numerical method package that provides functionality including:

    - Solving systems of equations using different numerical analysis algorithms.
    - Matrix methods (e.g., inversion).
    - Spline approximations.
    - Probability distributions and statistical methods.

    In this case, "best" is defined as a package with a mature and usable API, solid performance and numerical accuracy.

    Edit: derick van brought up a good point in that cost is a factor. I am heavily biased in favor of free packages, but others may have a different emphasis.

    Read the article

  • How to notice unusual news activity

    - by ??iu
    Suppose you were able to keep track of the news mentions of different entities, like say "Steve Jobs" and "Steve Ballmer". What are ways you could tell whether the number of mentions per entity in a given time period was unusual relative to their normal frequency of appearance?

    I imagine that for a more popular person like Steve Jobs an increase of 50% might be unusual (an increase from 1000 to 1500), while for a relatively unknown CEO an increase of 1000% for a given day could be possible (an increase from 2 to 200). If you didn't have a way of scaling, your unusualness index could be dominated by unheard-ofs getting their 15 minutes of fame.
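
    A minimal sketch of one scaling approach in R (illustrative; the window length and the 3-SD threshold are assumptions): work with log counts so proportional jumps are comparable across popular and obscure entities, and score each day against its trailing history.

        # Daily mention counts for one entity (simulated stand-in data)
        set.seed(3)
        mentions <- rpois(60, lambda = 40)
        mentions[60] <- 120                           # today's spike

        baseline <- log1p(mentions[1:59])             # trailing history on a log scale
        today    <- log1p(mentions[60])

        z <- (today - mean(baseline)) / sd(baseline)  # how unusual is today?
        z > 3                                         # flag if far above the normal range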

    Read the article

  • R Question. Numeric variable vs. Non-numeric and "names" function

    - by Michael
        > scores=cbind(UNCA.score, A.score, B.score, U.m.A, U.m.B)
        > names(scores)=c('UNCA.scores', 'A.scores', 'B.scores','UNCA.minus.A', 'UNCA.minus.B')
        > names(scores)
        [1] "UNCA.scores"  "A.scores"  "B.scores"  "UNCA.minus.A"  "UNCA.minus.B"
        > summary(UNCA.scores)
         X6.69230769230769
         Min.   : 4.154
         1st Qu.: 7.333
         Median : 8.308
         Mean   : 8.451
         3rd Qu.: 9.538
         Max.   :12.000
        > is.numeric(UNCA.scores)
        [1] FALSE
        > is.numeric(scores[,1])
        [1] TRUE

    My question is: what is the difference between UNCA.scores and scores[,1]? UNCA.scores is the first column in the data frame 'scores', but they are not the same thing, since one is numeric and the other isn't. If UNCA.scores is just a label here, how can I make it be equivalent to scores[,1]? Thanks!
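
    A possible source of the confusion, hedged since it depends on how UNCA.scores was created elsewhere in the session: cbind() on vectors produces a matrix, whose column labels are set with colnames() rather than names(), and extracting scores[, 1] from it stays numeric; a separately created object can easily end up non-numeric (e.g. read in with the wrong header settings). A tiny sketch of the matrix side:

        a <- c(4.2, 7.3, 8.3)
        b <- c(1.0, 2.0, 3.0)

        m <- cbind(a, b)          # cbind on vectors gives a matrix, not a data frame
        names(m)                  # NULL: a plain matrix has no names() in the list sense
        colnames(m)               # "a" "b": matrix columns are labelled via colnames()

        colnames(m) <- c("first.col", "second.col")
        is.numeric(m[, 1])        # TRUE: extracting a column keeps the numeric type

        df <- as.data.frame(m)    # a data frame allows df$first.col access as well
        is.numeric(df$first.col)  # TRUE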

    Read the article
