Search Results

Search found 1631 results on 66 pages for 'statistics'.

Page 12/66 | < Previous Page | 8 9 10 11 12 13 14 15 16 17 18 19  | Next Page >

  • Plot 2 graphs in same plot in R?

    - by Sandra Schlichting
    I would like to plot y1 and y2 in the same plot. x <- seq(-2, 2, 0.05) y1 <- pnorm(x) y2 <- pnorm(x,1,1) plot(x,y1,type="l",col="red") plot(x,y2,type="l",col="green") But when I do it like this, they are not plotted in the same plot together. In Matlab can one do hold on, but does anyone know how to do this in R?

    Read the article

  • Question with R. Element wise multiplication, addition, and division with 2 data.frames with varying

    - by Michael
    I have a various data.frames with columns of the same length where I am trying to multiple 2 rows together element-wise and then sum this up. For example, below are two vectors I would like to perform this operation with. > a.1[186,] q01_a q01_b q01_c q01_d q01_e q01_f q01_g q01_h q01_i q01_j q01_k q01_l q01_m 3 3 3 3 2 2 2 3 1 NA NA 2 2 and > u.1[186,] q04_avl_a q04_avl_b q04_avl_c q04_avl_d q04_avl_e q04_avl_f q04_avl_g q04_avl_h q04_avl_i q04_avl_j q04_avl_k q04_avl_l q04_avl_m 4 2 3 4 3 4 4 4 3 4 3 3 3` The issue is that various rows have varying numbers of NA's. What I would like to do is skip the multiplication with any missing values ( the 10th and 11th position from my above example), and then after the addition divide by the number of elements that were multiplied (11 from the above example). Most rows are complete and would just be multiplied by 13. Thank you!

    Read the article

  • simple Stata program

    - by Cyrus S
    I am trying to write a simple program to combine coefficient and standard error estimates from a set of regression fits. I run, say, 5 regressions, and store the coefficient(s) and standard error(s) of interest into vectors (Stata matrix objects, actually). Then, I need to do the following: Find the mean value of the coefficient estimates. Combine the standard error estimates according to the formula suggested for combining results from "multiple imputation". The formula is the square root of the formula for "T" on page 6 of the following document: http://bit.ly/b05WX3 I have written Stata code that does this once, but I want to write this as a function (or "program", in Stata speak) that takes as arguments the vector (or matrix, if possible, to combine multiple estimates at once) of regression coefficient estimates and the vector (or matrix) of corresponding standard error estimates, and then generates 1 and 2 above. Here is the code that I wrote: (breg is a 1x5 vector of the regression coefficient estimates, and sereg is a 1x5 vector of the associated standard error estimates) mat ones = (1,1,1,1,1) mat bregmean = (1/5)*(ones*breg’) scalar bregmean_s = bregmean[1,1] mat seregmean = (1/5)*(ones*sereg’) mat seregbtv = (1/4)*(breg - bregmean#ones)* (breg - bregmean#ones)’ mat varregmi = (1/5)*(sereg*sereg’) + (1+(1/5))* seregbtv scalar varregmi_s = varregmi[1,1] scalar seregmi = sqrt(varregmi_s) disp bregmean_s disp seregmi This gives the right answer for a single instance. Any pointers would be great! UPDATE: I completed the code for combining estimates in a kXm matrix of coefficients/parameters (k is the number of parameters, m the number of imputations). Code can be found here: http://bit.ly/cXJRw1 Thanks to Tristan and Gabi for the pointers.

    Read the article

  • Fitting Gaussian KDE in numpy/scipy in Python

    - by user248237
    I am fitting a Gaussian kernel density estimator to a variable that is the difference of two vectors, called "diff", as follows: gaussian_kde_covfact(diff, smoothing_param) -- where gaussian_kde_covfact is defined as: class gaussian_kde_covfact(stats.gaussian_kde): def __init__(self, dataset, covfact = 'scotts'): self.covfact = covfact scipy.stats.gaussian_kde.__init__(self, dataset) def _compute_covariance_(self): '''not used''' self.inv_cov = np.linalg.inv(self.covariance) self._norm_factor = sqrt(np.linalg.det(2*np.pi*self.covariance)) * self.n def covariance_factor(self): if self.covfact in ['sc', 'scotts']: return self.scotts_factor() if self.covfact in ['si', 'silverman']: return self.silverman_factor() elif self.covfact: return float(self.covfact) else: raise ValueError, \ 'covariance factor has to be scotts, silverman or a number' def reset_covfact(self, covfact): self.covfact = covfact self.covariance_factor() self._compute_covariance() This works, but there is an edge case where the diff is a vector of all 0s. In that case, I get the error: File "/srv/pkg/python/python-packages/python26/scipy/scipy-0.7.1/lib/python2.6/site-packages/scipy/stats/kde.py", line 334, in _compute_covariance self.inv_cov = linalg.inv(self.covariance) File "/srv/pkg/python/python-packages/python26/scipy/scipy-0.7.1/lib/python2.6/site-packages/scipy/linalg/basic.py", line 382, in inv if info>0: raise LinAlgError, "singular matrix" numpy.linalg.linalg.LinAlgError: singular matrix What's a way to get around this? In this case, I'd like it to return a density that's essentially peaked completely at a difference of 0, with no mass everywhere else. thanks.

    Read the article

  • Where can I find free and open data?

    - by kitsune
    Sooner or later, coders will feel the need to have access to "open data" in one of their projects, from knowing a city's zip to a more obscure information such as the axial tilt of Pluto. I know data.un.org which offers access to the UN's extensive array of databases that deal with human development and other socio-economic issues. The other usual suspects are NASA and the USGS for planetary data. There's an article at readwriteweb with more links. infochimps.org seems to stand out. Personally, I need to find historic commodity prices, stock values and other financial data. All these data sets seem to cost money however. Clarification To clarify, I'm interested in all kinds of open data, because sooner or later, I know I will be in a situation where I could need it. I will try to edit this answer and include the suggestions in a structured manners. A link for financial data was hidden in that readwriteweb article, doh! It's called opentick.com. Looks good so far! Update I stumbled over semantic data in another question of mine on here. There is opencyc ('the world's largest and most complete general knowledge base and commonsense reasoning engine'). A project called UMBEL provides a light-weight, distilled version of opencyc. Umbel has semantic data in rdf/owl/skos n3 syntax. The Worldbank also released a very nice API. It offers data from the last 50 years for about 200 countries

    Read the article

  • How to know if two stocks move togheter?

    - by Damiano
    Hello, I have two stocks with their prices, example: STOCK1: 10.56 11.23 12.32 8.90 STOCK2: 1.26 5.80 3.26 10.3 I only found Pearson correlation, but, is there another method to know if two stocks move togheter? (esample: co-integration??) Thank you so much!

    Read the article

  • How to get statistical distributions out of C++ Code?

    - by Bader
    I want some help in programming a random generator for different types of distribution using C++ language. for the following: Geometric distribution Hypergeometric distribution Weibull distribution Rayleigh distribution Erlang distribution Gamma distribution Poisson distribution Thanks.

    Read the article

  • Latent Dirichlet Allocation, pitfalls, tips and programs

    - by Gregg Lind
    I'm experimenting with Latent Dirichlet Allocation for topic disambiguation and assignment, and I'm looking for advice. Which program is the "best", where best is some combination of easiest to use, best prior estimation, fast How do I incorporate my intuitions about topicality. Let's say I think I know that some items in the corpus are really in the same category, like all articles by the same author. Can I add that into the analysis? Any unexpected pitfalls or tips I should know before embarking? I'd prefer is there are R or Python front ends for whatever program, but I expect (and accept) that I'll be dealing with C.

    Read the article

  • science.js’s loess() output is identical to input

    - by user3710111
    Rendered project available here. The line is supposed to be a trend line (as rendered with LOESS), but it merely follows each data point instead. I am no stats wonk, so maybe it makes sense that a LOESS function’s output would match the input as seen in the above example, but it strikes me as being wrong. Here is the relevant bit of code: var loess = science.stats.loess().bandwidth(.2); var xVal = data.map(function(d) { return d.date; }); var yVal = data.map(function(d) { return d.A; }); var loessData = loess([xVal], [yVal])[0]; console.log(yVal); console.log(loessData);

    Read the article

  • Student's t distribution in JavaScript

    - by Sai Emrys
    Google Spreadsheets currently does not support the standard function TDIST - i.e. the Student's t-distribution. This function is critical for calculating p-values. It seems that this is related to the fact that no integral-using functions (AFAICT) are implemented either. However, Google Docs allows people to add and publish their own scripts, in JavaScript. So ideally we should have something like: function tdist(t_value, degrees_of_freedom, two_tailed [defaults true]) {...} Anyone know of either an extant implementation of this (my google-fu has not turned up one, but may be weaker than yours) or a good idea for how to do it? I'd like to publish this together with some other useful functions that are currently calculable but a bit of a pain (like Student's t-test itself). Thanks!

    Read the article

  • Is there a Pair-Wise PostHoc Comparisons for the Chi-Square Test in R?

    - by Tal Galili
    Hi all, I am wondering if there exists in R a package/function to perform the: "Post Hoc Pair-Wise Comparisons for the Chi-Square Test of Homogeneity of Proportions" (or an equivalent of it) Which is described here: http://epm.sagepub.com/cgi/content/abstract/53/4/951 My situation is of just making a chi test, on a 2 by X matrix. I found a difference, but I want to know which of the columns is "responsible" for the difference. Thanks, Tal

    Read the article

  • Non-linear regression models in PostgreSQL using R

    - by Dave Jarvis
    Background I have climate data (temperature, precipitation, snow depth) for all of Canada between 1900 and 2009. I have written a basic website and the simplest page allows users to choose category and city. They then get back a very simple report (without the parameters and calculations section): The primary purpose of the web application is to provide a simple user interface so that the general public can explore the data in meaningful ways. (A list of numbers is not meaningful to the general public, nor is a website that provides too many inputs.) The secondary purpose of the application is to provide climatologists and other scientists with deeper ways to view the data. (Using too many inputs, of course.) Tool Set The database is PostgreSQL with R (mostly) installed. The reports are written using iReport and generated using JasperReports. Poor Model Choice Currently, a linear regression model is applied against annual averages of daily data. The linear regression model is calculated within a PostgreSQL function as follows: SELECT regr_slope( amount, year_taken ), regr_intercept( amount, year_taken ), corr( amount, year_taken ) FROM temp_regression INTO STRICT slope, intercept, correlation; The results are returned to JasperReports using: SELECT year_taken, amount, year_taken * slope + intercept, slope, intercept, correlation, total_measurements INTO result; JasperReports calls into PostgreSQL using the following parameterized analysis function: SELECT year_taken, amount, measurements, regression_line, slope, intercept, correlation, total_measurements, execute_time FROM climate.analysis( $P{CityId}, $P{Elevation1}, $P{Elevation2}, $P{Radius}, $P{CategoryId}, $P{Year1}, $P{Year2} ) ORDER BY year_taken This is not an optimal solution because it gives the false impression that the climate is changing at a slow, but steady rate. Questions Using functions that take two parameters (e.g., year [X] and amount [Y]), such as PostgreSQL's regr_slope: What is a better regression model to apply? What CPAN-R packages provide such models? (Installable, ideally, using apt-get.) How can the R functions be called within a PostgreSQL function? If no such functions exist: What parameters should I try to obtain for functions that will produce the desired fit? How would you recommend showing the best fit curve? Keep in mind that this is a web app for use by the general public. If the only way to analyse the data is from an R shell, then the purpose has been defeated. (I know this is not the case for most R functions I have looked at so far.) Thank you!

    Read the article

  • probability and relative frequency

    - by Alexandru
    If I use relative frequency to estimate the probability of an event, how good is my estimate based on the number of experiments? Is standard deviation a good measure? A paper/link/online book would be perfect. http://en.wikipedia.org/wiki/Frequentist

    Read the article

  • redirect user, then log his visit using php and mysql

    - by Bart van Heukelom
    I have a PHP redirect page to track clicks on links. Basically it does: - get url from $_GET - connect to database - create row for url, or update with 1 hit if it exists - redirect browser to url using Location: header I was wondering if it's possible to send the redirect to the client first, so it can get on with it's job, and then log the click. Would simply switching things around work?

    Read the article

  • How is the iPad going to be classified - as a mobile platform or a desktop platform?

    - by Tony Eichelberger
    I sometimes use the following site to look at browser and OS trends http://gs.statcounter.com/. It got me thinking about how the iPad is going to be classified, as a mobile platform or a desktop platform, or is it going to spark a new category. Since it runs iPhone OS, it could be considered a mobile device, but I have a hard time with that because of the screen size. What should iPad be classified as: Mobile, Desktop, or Other (Try to come up with a good name for Other)?

    Read the article

  • Multiple outliers for two variable linear regression

    - by Dave Jarvis
    Problem Building on my previous question, the "extreme" outliers in the following graph are somewhat obvious: Question Given: T - Set of all temperatures Y - Set of all years ST - Sum of temperatures. SY - Sum of years. N - Number of elements T(n) - Temperature of the nth element in the temperature set How would you implement an efficient MySQL stored procedure or user-defined function (UDF) to determine if T(n) is an outlier? (If such an implementation already exists, that would be good to know as well.) Related Sites I am slowly working through these sites to get a better understanding of the problem: Multiple Outliers Detection Procedures in Linear Regression M-estimator Measure of Surprise for Outlier Detection Ordinary Least Squares Linear Regression Many thanks!

    Read the article

  • Beginner SQL question: arithmetic with multiple COUNT(*) results

    - by polygenelubricants
    Continuing with the spirit of using the Stack Exchange Data Explorer to learn SQL, (see: Can we become our own “Northwind” for teaching SQL / databases?), I've decided to try to write a query to answer a simple question (on meta): What % of stackoverflow users have over 10,000 rep?. Here's what I've done: Query#1 SELECT COUNT(*) FROM Users WHERE Users.Reputation >= 10000 Result: 556 Query#2 SELECT COUNT(*) FROM USERS Result: 227691 Now, how do I put them together into one query? What is this query idiom called? What do I need to write so I can get, say, a one-row three-column result like this: 556 227691 0,00244190592

    Read the article

  • MySQL Volleyball Standings

    - by Torez
    I have a database table full of game by game results and want to know if I can calculate the following: GP (games played) Wins Loses Points (2 points for each win, 1 point for each lose) Here is my table structure: CREATE TABLE `results` ( `id` int(10) unsigned NOT NULL auto_increment, `home_team_id` int(10) unsigned NOT NULL, `home_score` int(3) unsigned NOT NULL, `visit_team_id` int(10) unsigned NOT NULL, `visit_score` int(3) unsigned NOT NULL, PRIMARY KEY (`id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=7 ; And a few testing results: INSERT INTO `results` VALUES(1, 1, 21, 2, 25); INSERT INTO `results` VALUES(2, 3, 21, 4, 17); INSERT INTO `results` VALUES(3, 1, 25, 3, 9); INSERT INTO `results` VALUES(4, 2, 7, 4, 22); INSERT INTO `results` VALUES(5, 1, 19, 4, 20); INSERT INTO `results` VALUES(6, 2, 24, 3, 26); Here is what a final table would look like: +-------------------+----+------+-------+--------+ | Team Name | GP | Wins | Loses | Points | +-------------------+----+------+-------+--------+ | Spikers | 4 | 4 | 0 | 8 | | Leapers | 4 | 2 | 2 | 6 | | Ground Control | 4 | 1 | 3 | 5 | | Touch Guys | 4 | 0 | 4 | 4 | +-------------------+----+------+-------+--------+

    Read the article

  • Probability distribution for sms answer delays

    - by Thomas Ahle
    I'm writing an app using sms as communication. I have chosen to subscribe to an sms-gateway, which provides me with an API for doing so. The API has functions for sending as well as pulling new messages. It does however not have any kind of push functionality. In order to do my queries most efficient, I'm seeking data on how long time people wait before they answer a text message - as a probability function. Extra info: The application is interactive (as can be), so I suppose the times will be pretty similar to real life human-human communication. I don't believe differences in personal style will play a big impact on the right times and frequencies to query, so average data should be fine.

    Read the article

  • Regressing panel data in SAS.

    - by John
    Hey Guys, thanks to your help I succesfully managed all my databases! I am now looking at a panel data set on which I have to regress. Since I only started my Phd this semester together with the econometrics courses I am still new to many statistic applications and regression methods. I want to do a simple regression as in Y = x1 x2 x3 etc, now I already browsed through some literature and found that for panel data it's common to do a fixed effects regression. Also, my Y variable only has positive values so I was thinking in the direction of a Tobit model? I'm doing some research concerning the coverage of analysts in the financial business. My independent variable is the coverage of analysts on a certain firm, so per observation i have 1 analyst and 1 firm, together with different characteristics(market cap and betas etc) of the firm. All this data is monthly. As coverage cannot become negative (only 0) I was thinking of a Tobit model? Do you guys have any ideas what would be a good regression method? Or have some good sources (e books, written books, through university I have access to almost anything concerning my field of work) of information (cause I do have to learn these things for future research)? Thanks!

    Read the article

  • How can I loop through variables in SPSS? I want to avoid code duplication.

    - by chucknelson
    Is there a "native" SPSS way to loop through some variable names? All I want to do is take a list of variables (that I define) and run the same procedure for them: pseudo-code - not really a good example, but gets the point across... for i in varlist['a','b','c'] do FREQUENCIES VARIABLES=varlist[i] / ORDER=ANALYSIS. end I've noticed that people seem to just use R or Python SPSS plugins to achieve this basic array functionality, but I don't know how soon I can get those configured (if ever) on my installation of SPSS. SPSS has to have some native way to do this...right?

    Read the article

  • What is the difference between Multiple R-squared and Adjusted R-squared in a single-variate least s

    - by fmark
    Could someone explain to the statistically naive what the difference between Multiple R-squared and Adjusted R-squared is? I am doing a single-variate regression analysis as follows: v.lm <- lm(epm ~ n_days, data=v) print(summary(v.lm)) Results: Call: lm(formula = epm ~ n_days, data = v) Residuals: Min 1Q Median 3Q Max -693.59 -325.79 53.34 302.46 964.95 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 2550.39 92.15 27.677 <2e-16 *** n_days -13.12 5.39 -2.433 0.0216 * --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 410.1 on 28 degrees of freedom Multiple R-squared: 0.1746, Adjusted R-squared: 0.1451 F-statistic: 5.921 on 1 and 28 DF, p-value: 0.0216 Apologies for the newbiness of this question.

    Read the article

< Previous Page | 8 9 10 11 12 13 14 15 16 17 18 19  | Next Page >