code statistics - Page 73

Question with R. Element wise multiplication, addition, and division with 2 data.frames with varying

- by Michael

I have various data.frames with columns of the same length, and I am trying to multiply two rows together element-wise and then sum the result. For example, below are two vectors I would like to perform this operation with.

> a.1[186,]
q01_a q01_b q01_c q01_d q01_e q01_f q01_g q01_h q01_i q01_j q01_k q01_l q01_m
    3     3     3     3     2     2     2     3     1    NA    NA     2     2

and

> u.1[186,]
q04_avl_a q04_avl_b q04_avl_c q04_avl_d q04_avl_e q04_avl_f q04_avl_g q04_avl_h q04_avl_i q04_avl_j q04_avl_k q04_avl_l q04_avl_m
        4         2         3         4         3         4         4         4         3         4         3         3         3

The issue is that rows have varying numbers of NAs. What I would like to do is skip the multiplication for any missing values (the 10th and 11th positions in my example above), and then, after the addition, divide by the number of elements that were multiplied (11 in the example above). Most rows are complete and would just be divided by 13. Thank you!
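The calculation described above can be sketched in a few lines. This is a minimal illustration in Python rather than R, using None as a stand-in for NA; the function name mean_pairwise_product is made up for this example.

```python
def mean_pairwise_product(xs, ys):
    """Mean of x*y over the positions where neither value is missing (None)."""
    products = [x * y for x, y in zip(xs, ys) if x is not None and y is not None]
    if not products:
        return None  # every position had a missing value
    return sum(products) / len(products)


NA = None  # stand-in for R's NA
a_row = [3, 3, 3, 3, 2, 2, 2, 3, 1, NA, NA, 2, 2]
u_row = [4, 2, 3, 4, 3, 4, 4, 4, 3, 4, 3, 3, 3]

result = mean_pairwise_product(a_row, u_row)
print(result)  # 88 / 11 = 8.0
```

Positions 10 and 11 are skipped, so the sum of the 11 remaining products (88) is divided by 11.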

Read the article

VS2010 / Code Analysis: Turn off a rule for a project without custom ruleset....

- by TomTom

...any change? The scenario is this: for our company we are developing a standard for how code should look. This will be the full MS rule set as it looks now. For some specific projects we may want to turn off specific rules, simply because for that project the violation is a "known exception". Example? CA1026: while the rule is perfectly OK in most cases, there are 1-2 specific libraries we don't want to change to satisfy it. We also want to avoid having a custom rule set. OTOH, putting a suppress attribute on every occurrence gets pretty convoluted pretty fast. Is there any way to turn off a code analysis warning for a complete assembly without a custom rule set? We would rather have that in a specific file (GlobalSuppressions.cs) than in a rule set, for maintenance reasons and to be more explicit ;)

Read the article

Where can I find free and open data?

- by kitsune

Sooner or later, coders will feel the need to have access to "open data" in one of their projects, from knowing a city's zip code to more obscure information such as the axial tilt of Pluto. I know data.un.org, which offers access to the UN's extensive array of databases that deal with human development and other socio-economic issues. The other usual suspects are NASA and the USGS for planetary data. There's an article at readwriteweb with more links. infochimps.org seems to stand out. Personally, I need to find historic commodity prices, stock values and other financial data. All these data sets seem to cost money, however.

Clarification

To clarify, I'm interested in all kinds of open data, because sooner or later, I know I will be in a situation where I could need it. I will try to edit this answer and include the suggestions in a structured manner. A link for financial data was hidden in that readwriteweb article, doh! It's called opentick.com. Looks good so far!

Update

I stumbled over semantic data in another question of mine on here. There is opencyc ('the world's largest and most complete general knowledge base and commonsense reasoning engine'). A project called UMBEL provides a light-weight, distilled version of opencyc. Umbel has semantic data in rdf/owl/skos n3 syntax. The Worldbank also released a very nice API. It offers data from the last 50 years for about 200 countries.

Read the article

How to know if two stocks move together?

- by Damiano

Hello, I have two stocks with their prices, for example:

STOCK1: 10.56 11.23 12.32 8.90
STOCK2: 1.26 5.80 3.26 10.3

I only found Pearson correlation, but is there another method to know if two stocks move together (for example, co-integration)? Thank you so much!
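As a baseline, the Pearson correlation mentioned in the question can be computed directly from the two price series. The sketch below is only that baseline, written in Python with a hand-rolled pearson helper (a name chosen for this example); it is not a co-integration test, and in practice correlation is often computed on returns rather than raw prices.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


stock1 = [10.56, 11.23, 12.32, 8.90]
stock2 = [1.26, 5.80, 3.26, 10.3]

r = pearson(stock1, stock2)
print(round(r, 3))  # negative for these four points
```

With only four observations the coefficient says very little; it is shown here purely to make the calculation concrete.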

Read the article

Latent Dirichlet Allocation, pitfalls, tips and programs

- by Gregg Lind

I'm experimenting with Latent Dirichlet Allocation for topic disambiguation and assignment, and I'm looking for advice.

- Which program is the "best", where best is some combination of easiest to use, best prior estimation, and fast?
- How do I incorporate my intuitions about topicality? Let's say I think I know that some items in the corpus are really in the same category, like all articles by the same author. Can I add that into the analysis?
- Are there any unexpected pitfalls or tips I should know before embarking?

I'd prefer it if there are R or Python front ends for whatever program, but I expect (and accept) that I'll be dealing with C.

Read the article

Student's t distribution in JavaScript

- by Sai Emrys

Google Spreadsheets currently does not support the standard function TDIST - i.e. the Student's t-distribution. This function is critical for calculating p-values. It seems that this is related to the fact that no integral-using functions (AFAICT) are implemented either. However, Google Docs allows people to add and publish their own scripts, in JavaScript. So ideally we should have something like:

    function tdist(t_value, degrees_of_freedom, two_tailed [defaults true]) {...}

Anyone know of either an extant implementation of this (my google-fu has not turned up one, but may be weaker than yours) or a good idea for how to do it? I'd like to publish this together with some other useful functions that are currently calculable but a bit of a pain (like Student's t-test itself). Thanks!
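One possible approach, sketched here in Python rather than JavaScript, is to integrate the t density numerically with Simpson's rule and subtract the result from 0.5 to get the tail area. The signature follows the one proposed in the question; the helper t_pdf and the steps parameter are assumptions made for this illustration, and the same arithmetic should port to JavaScript given a log-gamma routine.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def tdist(t_value, degrees_of_freedom, two_tailed=True, steps=2000):
    """Tail probability beyond |t_value|, via Simpson's rule on [0, |t_value|]."""
    upper = abs(t_value)
    if upper == 0:
        area = 0.0
    else:
        h = upper / steps  # steps must be even for Simpson's rule
        total = t_pdf(0.0, degrees_of_freedom) + t_pdf(upper, degrees_of_freedom)
        for i in range(1, steps):
            weight = 4 if i % 2 else 2
            total += weight * t_pdf(i * h, degrees_of_freedom)
        area = total * h / 3
    tail = 0.5 - area  # area to the right of |t_value|
    return 2 * tail if two_tailed else tail


# With 1 degree of freedom the distribution is Cauchy, so the
# two-tailed probability beyond t = 1 is exactly 0.5.
print(round(tdist(1.0, 1), 6))
```

A fixed step count is a simplification: for very large |t| the integration range is long and more steps (or a series-based method) would be needed for good accuracy.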

Read the article

Non-linear regression models in PostgreSQL using R

- by Dave Jarvis

Background

I have climate data (temperature, precipitation, snow depth) for all of Canada between 1900 and 2009. I have written a basic website and the simplest page allows users to choose category and city. They then get back a very simple report (without the parameters and calculations section):

The primary purpose of the web application is to provide a simple user interface so that the general public can explore the data in meaningful ways. (A list of numbers is not meaningful to the general public, nor is a website that provides too many inputs.) The secondary purpose of the application is to provide climatologists and other scientists with deeper ways to view the data. (Using too many inputs, of course.)

Tool Set

The database is PostgreSQL with R (mostly) installed. The reports are written using iReport and generated using JasperReports.

Poor Model Choice

Currently, a linear regression model is applied against annual averages of daily data. The linear regression model is calculated within a PostgreSQL function as follows:

    SELECT
      regr_slope( amount, year_taken ),
      regr_intercept( amount, year_taken ),
      corr( amount, year_taken )
    FROM temp_regression
    INTO STRICT slope, intercept, correlation;

The results are returned to JasperReports using:

    SELECT
      year_taken,
      amount,
      year_taken * slope + intercept,
      slope,
      intercept,
      correlation,
      total_measurements
    INTO result;

JasperReports calls into PostgreSQL using the following parameterized analysis function:

    SELECT
      year_taken,
      amount,
      measurements,
      regression_line,
      slope,
      intercept,
      correlation,
      total_measurements,
      execute_time
    FROM climate.analysis(
      $P{CityId},
      $P{Elevation1},
      $P{Elevation2},
      $P{Radius},
      $P{CategoryId},
      $P{Year1},
      $P{Year2}
    )
    ORDER BY year_taken

This is not an optimal solution because it gives the false impression that the climate is changing at a slow, but steady rate.

Questions

Using functions that take two parameters (e.g., year [X] and amount [Y]), such as PostgreSQL's regr_slope:

- What is a better regression model to apply?
- What CPAN-R packages provide such models? (Installable, ideally, using apt-get.)
- How can the R functions be called within a PostgreSQL function?

If no such functions exist:

- What parameters should I try to obtain for functions that will produce the desired fit?
- How would you recommend showing the best fit curve?

Keep in mind that this is a web app for use by the general public. If the only way to analyse the data is from an R shell, then the purpose has been defeated. (I know this is not the case for most R functions I have looked at so far.) Thank you!
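For reference, the current linear model boils down to an ordinary least-squares fit of amount (Y) on year (X). The Python sketch below reproduces that calculation outside the database so it can be compared against any replacement model; the function name linear_fit and the three sample rows are made up for illustration.

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept of y on x, as regr_slope(Y, X)
    and regr_intercept(Y, X) would compute them."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical annual averages, not real climate data.
years = [1900, 1950, 2000]
amounts = [10.0, 11.0, 12.0]

slope, intercept = linear_fit(years, amounts)
# Same expression the report uses: year_taken * slope + intercept
regression_line = [year * slope + intercept for year in years]
print(slope, intercept, regression_line)
```

Any non-linear alternative would replace linear_fit while keeping the same two-column input, which is the constraint stated in the questions.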

Read the article

SQL Server - Schema/Code Analysis Rules - What would your rules include?

- by Randy Minder

We're using Visual Studio Database Edition (DBPro) to manage our schema. This is a great tool that, among the many things it can do, can analyse our schema and T-SQL code based on rules (much like what FxCop does with C# code), and flag certain things as warnings and errors. Some example rules might be that every table must have a primary key, no underscores in column names, every stored procedure must have comments, etc. The number of rules built into DBPro is fairly small, and a bit odd. Fortunately DBPro has an API that allows the developer to create their own. I'm curious as to the types of rules you and your DB team would create (both schema rules and T-SQL rules). Looking at some of your rules might help us decide what we should consider. Thanks - Randy

Read the article

Is there a Pair-Wise PostHoc Comparisons for the Chi-Square Test in R?

- by Tal Galili

Hi all, I am wondering if there exists in R a package/function to perform the "Post Hoc Pair-Wise Comparisons for the Chi-Square Test of Homogeneity of Proportions" (or an equivalent of it), which is described here: http://epm.sagepub.com/cgi/content/abstract/53/4/951 My situation is just a chi-square test on a 2 by X matrix. I found a difference, but I want to know which of the columns is "responsible" for the difference. Thanks, Tal

Read the article

redirect user, then log his visit using php and mysql

- by Bart van Heukelom

I have a PHP redirect page to track clicks on links. Basically it does:

- get url from $_GET
- connect to database
- create row for url, or update with 1 hit if it exists
- redirect browser to url using Location: header

I was wondering if it's possible to send the redirect to the client first, so it can get on with its job, and then log the click. Would simply switching things around work?

Read the article

probability and relative frequency

- by Alexandru

If I use relative frequency to estimate the probability of an event, how good is my estimate based on the number of experiments? Is standard deviation a good measure? A paper/link/online book would be perfect. http://en.wikipedia.org/wiki/Frequentist
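One common way to quantify this is the standard error of the estimated proportion, sqrt(p(1 - p) / n), which shrinks with the square root of the number of experiments. The Python sketch below illustrates that formula; the function name and the sample counts are made up, and it assumes independent trials with a fixed underlying probability.

```python
import math


def relative_frequency_estimate(successes, trials):
    """Return the relative frequency and its estimated standard error."""
    p_hat = successes / trials
    std_error = math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat, std_error


p_hat, std_error = relative_frequency_estimate(50, 100)
print(p_hat, std_error)  # 0.5 and about 0.05

# 100 times as many experiments gives a standard error 10 times smaller.
_, std_error_large = relative_frequency_estimate(5000, 10000)
print(std_error_large)  # about 0.005
```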

Read the article

Is there a source insight like code reader on Mac?

- by al_lea

Or what app do you use to read code on Mac?

Read the article

Regressing panel data in SAS.

- by John

Hey guys, thanks to your help I successfully managed all my databases! I am now looking at a panel data set on which I have to run a regression. Since I only started my PhD this semester, together with the econometrics courses, I am still new to many statistical applications and regression methods. I want to do a simple regression as in Y = x1 x2 x3 etc. I have already browsed through some literature and found that for panel data it's common to do a fixed effects regression. Also, my Y variable only has positive values, so I was thinking in the direction of a Tobit model? I'm doing some research concerning the coverage of analysts in the financial business. My independent variable is the coverage of analysts on a certain firm, so per observation I have 1 analyst and 1 firm, together with different characteristics (market cap, betas, etc.) of the firm. All this data is monthly. As coverage cannot become negative (only 0), I was thinking of a Tobit model? Do you guys have any ideas what would be a good regression method? Or do you have some good sources of information (e-books, written books; through university I have access to almost anything concerning my field of work), because I do have to learn these things for future research? Thanks!

Read the article

Multiple outliers for two variable linear regression

- by Dave Jarvis

Problem

Building on my previous question, the "extreme" outliers in the following graph are somewhat obvious:

Question

Given:

- T - Set of all temperatures
- Y - Set of all years
- ST - Sum of temperatures.
- SY - Sum of years.
- N - Number of elements
- T(n) - Temperature of the nth element in the temperature set

How would you implement an efficient MySQL stored procedure or user-defined function (UDF) to determine if T(n) is an outlier? (If such an implementation already exists, that would be good to know as well.)

Related Sites

I am slowly working through these sites to get a better understanding of the problem:

- Multiple Outliers Detection Procedures in Linear Regression
- M-estimator
- Measure of Surprise for Outlier Detection
- Ordinary Least Squares Linear Regression

Many thanks!

Read the article

Generation of an array with defined Min, Max, Mean and Stdev with given number of elements and error

- by Viet

I'd like to generate an array with defined Min, Max, Mean and Stdev with given number of elements and error level. Is there such a library in C, C++, PHP or Python to do so? Please kindly advise. Thanks!

Read the article

A CSS code to put text saved in other server into blogspot blog..???

- by Nok Imchen

I have a blog hosted on blogspot dot com. In that blog, I want to put some data, like a Google search string, automatically. I want it to be done in this way: just put in some code (server-side scripting) linking to a text file or PHP file, and the code will extract the text and output it in my blogspot blog. What I DON'T want is to use JavaScript, because if I use JavaScript then the output will be seen only on the user's screen. I want the output to be seen by Google Bot too. Thanking you in anticipation.

Read the article

How is the iPad going to be classified - as a mobile platform or a desktop platform?

- by Tony Eichelberger

I sometimes use the following site to look at browser and OS trends http://gs.statcounter.com/. It got me thinking about how the iPad is going to be classified, as a mobile platform or a desktop platform, or is it going to spark a new category. Since it runs iPhone OS, it could be considered a mobile device, but I have a hard time with that because of the screen size. What should iPad be classified as: Mobile, Desktop, or Other (Try to come up with a good name for Other)?

Read the article

What is the difference between Multiple R-squared and Adjusted R-squared in a single-variate least s

- by fmark

Could someone explain to the statistically naive what the difference between Multiple R-squared and Adjusted R-squared is? I am doing a single-variate regression analysis as follows:

    v.lm <- lm(epm ~ n_days, data=v)
    print(summary(v.lm))

Results:

    Call:
    lm(formula = epm ~ n_days, data = v)

    Residuals:
        Min      1Q  Median      3Q     Max
    -693.59 -325.79   53.34  302.46  964.95

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  2550.39      92.15  27.677   <2e-16 ***
    n_days        -13.12       5.39  -2.433   0.0216 *
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 410.1 on 28 degrees of freedom
    Multiple R-squared: 0.1746, Adjusted R-squared: 0.1451
    F-statistic: 5.921 on 1 and 28 DF, p-value: 0.0216

Apologies for the newbiness of this question.
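The two figures in the output are related by the standard adjustment formula, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The Python sketch below plugs in the numbers from the summary; n = 30 is inferred from the 28 residual degrees of freedom with one predictor plus an intercept, and the function name is made up for this example.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjust R-squared for the number of predictors in the model."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# 28 residual degrees of freedom = n_obs - n_predictors - 1, so n_obs = 30.
adj = adjusted_r_squared(0.1746, n_obs=30, n_predictors=1)
print(round(adj, 4))  # 0.1451
```

The adjustment penalises each extra predictor, so the adjusted value is always at or below the multiple R-squared and the gap grows as predictors are added to a small sample.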

Read the article

Get percentiles of data-set with group by month

- by Cylindric

Hello, I have a SQL table with a whole load of records that look like this:

    | Date       | Score |
    +------------+-------+
    | 01/01/2010 |     4 |
    | 02/01/2010 |     6 |
    | 03/01/2010 |    10 |
    ...
    | 16/03/2010 |     2 |

I'm plotting this on a chart, so I get a nice line across the graph indicating score-over-time. Lovely. Now, what I need to do is include the average score on the chart, so we can see how that changes over time, so I can simply add this to the mix:

    SELECT YEAR(SCOREDATE) 'Year', MONTH(SCOREDATE) 'Month',
           MIN(SCORE) MinScore, AVG(SCORE) AverageScore, MAX(SCORE) MaxScore
    FROM SCORES
    GROUP BY YEAR(SCOREDATE), MONTH(SCOREDATE)
    ORDER BY YEAR(SCOREDATE), MONTH(SCOREDATE)

That's no problem so far. The problem is, how can I easily calculate the percentiles at each time-period? I'm not sure that's the correct phrase. What I need in total is:

- A line on the chart for the score (easy)
- A line on the chart for the average (easy)
- A line on the chart showing the band that 95% of the scores occupy (stumped)

It's the third one that I don't get. I need to calculate the 5% percentile figures, which I can do singly:

    SELECT MAX(SubQ.SCORE)
    FROM (SELECT TOP 45 PERCENT SCORE
          FROM SCORES
          WHERE YEAR(SCOREDATE) = 2010 AND MONTH(SCOREDATE) = 1
          ORDER BY SCORE ASC) AS SubQ

    SELECT MIN(SubQ.SCORE)
    FROM (SELECT TOP 45 PERCENT SCORE
          FROM SCORES
          WHERE YEAR(SCOREDATE) = 2010 AND MONTH(SCOREDATE) = 1
          ORDER BY SCORE DESC) AS SubQ

But I can't work out how to get a table of all the months.

    | Date       | Average | 45% | 55% |
    +------------+---------+-----+-----+
    | 01/01/2010 |      13 |  11 |  15 |
    | 02/01/2010 |      10 |   8 |  12 |
    | 03/01/2010 |       5 |   4 |  10 |
    ...
    | 16/03/2010 |       7 |   7 |   9 |

At the moment I'm going to have to load this lot up into my app, and calculate the figures myself. Or run a larger number of individual queries and collate the results.
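The fallback mentioned at the end, loading the rows into the application and computing the figures there, can be sketched as follows. This is a Python illustration only: the function names are made up, the percentile uses the simple nearest-rank definition (one of several in use), and the four rows are the ones shown in the first table, read as day/month/year.

```python
import math
from collections import defaultdict
from datetime import date


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def monthly_bands(rows, low=5, high=95):
    """Map (year, month) to (average, low percentile, high percentile)."""
    by_month = defaultdict(list)
    for day, score in rows:
        by_month[(day.year, day.month)].append(score)
    bands = {}
    for month, scores in sorted(by_month.items()):
        scores.sort()
        average = sum(scores) / len(scores)
        bands[month] = (average, percentile(scores, low), percentile(scores, high))
    return bands


rows = [
    (date(2010, 1, 1), 4),
    (date(2010, 1, 2), 6),
    (date(2010, 1, 3), 10),
    (date(2010, 3, 16), 2),
]
bands = monthly_bands(rows)
for month, (average, lo, hi) in bands.items():
    print(month, round(average, 2), lo, hi)
```

The low and high arguments make the band width a parameter, so the same function covers a 5%/95% band or the 45%/55% figures used in the queries above.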

Read the article

best way to statistically detect anomalies in data

- by reinier

Hi, our webapp collects a huge amount of data about user actions, network business, database load, etc. All data is stored in warehouses and we have quite a lot of interesting views on this data. If something odd happens, chances are it shows up somewhere in the data. However, to manually detect if something out of the ordinary is going on, one has to continually look through this data and look for oddities. My question: what is the best way to detect changes in dynamic data which can be seen as 'out of the ordinary'? Are Bayesian filters (I've seen these mentioned when reading about spam detection) the way to go? Any pointers would be great!

EDIT: To clarify, the data for example shows a daily curve of database load. This curve typically looks similar to the curve from yesterday. In time this curve might change slowly. It would be nice if a warning could go off when the curve changes from one day to the next by more than some set parameters. R
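The day-over-day check described in the EDIT can be sketched very simply: compare today's curve with yesterday's, sample by sample, and warn when any point moves by more than a chosen tolerance. The Python below is only that naive baseline, not a Bayesian filter; the function name, the 25% tolerance and the sample load figures are all assumptions made for illustration.

```python
def curve_deviates(reference, current, tolerance=0.25):
    """Return True if any sample differs from the reference curve by more
    than the given relative tolerance."""
    if len(reference) != len(current):
        raise ValueError("curves must have the same number of samples")
    for ref, cur in zip(reference, current):
        allowed = tolerance * max(abs(ref), 1e-9)  # avoid a zero-width band
        if abs(cur - ref) > allowed:
            return True
    return False


# Hypothetical database load sampled at the same four times each day.
yesterday = [10, 20, 40, 30]
today_normal = [11, 19, 42, 31]
today_odd = [11, 19, 90, 31]

print(curve_deviates(yesterday, today_normal))  # False
print(curve_deviates(yesterday, today_odd))     # True
```

Because the reference is always the previous day, a curve that drifts slowly never trips the warning, which matches the behaviour asked for.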

Read the article

How do I get code coverage of Perl cgi script when executed by Selenium?

- by Kurt W. Leucht

I'm using Eclipse EPIC IDE to write some Perl cgi scripts which call some Perl modules that I have also written. The EPIC IDE lets me configure a Perl CGI "run configuration" which runs my CGI script. And then I've got Selenium set up and one of my unit test files runs some Selenium commands to run my cgi script through its paces. But the coverage report from Module::Build dispatch 'testcover' doesn't show that any of my module code has been executed. It's been executed by my cgi script, but I guess the CGI script was run manually and was not executed directly by my unit test file, so maybe that's why the coverage isn't being recognized. Is there a way to do this right so I can integrate Selenium and unit test files and code coverage all together somehow?

Read the article

Probability distribution for sms answer delays

- by Thomas Ahle

I'm writing an app using sms as communication. I have chosen to subscribe to an sms-gateway, which provides me with an API for doing so. The API has functions for sending as well as pulling new messages. It does not, however, have any kind of push functionality. In order to do my queries most efficiently, I'm seeking data on how long people wait before they answer a text message, as a probability function. Extra info: the application is interactive (as can be), so I suppose the times will be pretty similar to real-life human-human communication. I don't believe differences in personal style will have a big impact on the right times and frequencies to query, so average data should be fine.

Read the article

postgresql weighted average?

- by milovanderlinden

Say I have a postgresql table with the following values:

    id | value
    ----------
     1 |     4
     2 |     8
     3 |   100
     4 |     5
     5 |     7

If I use postgresql to calculate the average, it gives me an average of 24.8 because the high value of 100 has great impact on the calculation, while in fact I would like to find an average somewhere around 6 and eliminate the extreme(s). I am looking for a way to eliminate extremes and want to do this in a "statistically correct" way. The extremes cannot be fixed; I cannot say "if a value is over X, it has to be eliminated". I have been bending my head on the postgresql aggregate functions but cannot put my finger on what is right for me to use. Any suggestions?
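One way to drop extremes without a fixed cut-off value is to measure each value's distance from the median in units of the median absolute deviation (MAD) and average only the values that stay within a threshold. The Python sketch below illustrates that idea on the five values from the table; the function name robust_mean and the conventional 3.5 cutoff are assumptions of this example, and it is one technique among several rather than the definitive answer.

```python
import statistics


def robust_mean(values, cutoff=3.5):
    """Mean of the values whose modified z-score (based on the median and
    the median absolute deviation) does not exceed the cutoff."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return sum(values) / len(values)  # no spread to judge extremes by
    kept = [v for v in values if 0.6745 * abs(v - med) / mad <= cutoff]
    return sum(kept) / len(kept)


values = [4, 8, 100, 5, 7]
print(sum(values) / len(values))  # 24.8, the plain average
print(robust_mean(values))        # 6.0, with 100 excluded
```

For this data the median is 7 and the MAD is 2, so 100 scores far above the cutoff while the other four values stay well below it.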

Read the article

MySQL Volleyball Standings

- by Torez

I have a database table full of game-by-game results and want to know if I can calculate the following:

- GP (games played)
- Wins
- Losses
- Points (2 points for each win, 1 point for each loss)

Here is my table structure:

    CREATE TABLE `results` (
      `id` int(10) unsigned NOT NULL auto_increment,
      `home_team_id` int(10) unsigned NOT NULL,
      `home_score` int(3) unsigned NOT NULL,
      `visit_team_id` int(10) unsigned NOT NULL,
      `visit_score` int(3) unsigned NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=7 ;

And a few testing results:

    INSERT INTO `results` VALUES(1, 1, 21, 2, 25);
    INSERT INTO `results` VALUES(2, 3, 21, 4, 17);
    INSERT INTO `results` VALUES(3, 1, 25, 3, 9);
    INSERT INTO `results` VALUES(4, 2, 7, 4, 22);
    INSERT INTO `results` VALUES(5, 1, 19, 4, 20);
    INSERT INTO `results` VALUES(6, 2, 24, 3, 26);

Here is what a final table would look like:

    +-------------------+----+------+--------+--------+
    | Team Name         | GP | Wins | Losses | Points |
    +-------------------+----+------+--------+--------+
    | Spikers           |  4 |    4 |      0 |      8 |
    | Leapers           |  4 |    2 |      2 |      6 |
    | Ground Control    |  4 |    1 |      3 |      5 |
    | Touch Guys        |  4 |    0 |      4 |      4 |
    +-------------------+----+------+--------+--------+
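The aggregation itself is straightforward once each game is viewed from both teams' sides. The Python sketch below works through the six test rows to show the expected numbers; it is an illustration of the logic, not a MySQL query, and it keys the output by team id because the team names are not part of the results table. It assumes no game ends in a tie.

```python
from collections import defaultdict


def standings(results):
    """Aggregate games played, wins, losses and points per team id."""
    table = defaultdict(lambda: {"gp": 0, "wins": 0, "losses": 0, "points": 0})
    for _game_id, home_id, home_score, visit_id, visit_score in results:
        if home_score > visit_score:
            winner, loser = home_id, visit_id
        else:
            winner, loser = visit_id, home_id
        table[winner]["gp"] += 1
        table[winner]["wins"] += 1
        table[winner]["points"] += 2  # 2 points for each win
        table[loser]["gp"] += 1
        table[loser]["losses"] += 1
        table[loser]["points"] += 1   # 1 point for each loss
    return dict(table)


# Same rows as the INSERT statements: id, home team, home score, visitor, visitor score.
results = [
    (1, 1, 21, 2, 25),
    (2, 3, 21, 4, 17),
    (3, 1, 25, 3, 9),
    (4, 2, 7, 4, 22),
    (5, 1, 19, 4, 20),
    (6, 2, 24, 3, 26),
]

table = standings(results)
for team_id, row in sorted(table.items(), key=lambda item: -item[1]["points"]):
    print(team_id, row["gp"], row["wins"], row["losses"], row["points"])
```

With the test data each team has played 3 games: teams 3 and 4 have 2 wins and 5 points, teams 1 and 2 have 1 win and 4 points.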

Read the article

How can I graph the Lines of Code history for git repo?

- by dbr

Basically I want to get the number of lines-of-code in the repository after each commit. The only (really crappy) ways I have found are to use git filter-branch to run "wc -l *", and a script that runs git reset --hard on each commit, then runs wc -l.

To make it a bit clearer, when the tool is run, it would output the lines of code of the very first commit, then the second and so on. This is what I want the tool to output (as an example):

    me@something:~/$ gitsloc --branch master
    10
    48
    153
    450
    1734
    1542

I've played around with the ruby 'git' library, but the closest I found was using the .lines() method on a diff, which seems like it should give the added lines (but does not; it returns 0 when you delete lines, for example):

    require 'rubygems'
    require 'git'

    total = 0
    g = Git.open(working_dir = '/Users/dbr/Desktop/code_projects/tvdb_api')
    last = nil
    g.log.each do |cur|
      diff = g.diff(last, cur)
      total = total + diff.lines
      puts total
      last = cur
    end
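One way to avoid checking out every commit is to let git report added and deleted line counts per commit and keep a running total. The Python sketch below parses text in the shape produced by `git log --reverse --numstat --pretty=format:"commit %H"`; the function name is made up, the sample log (hashes, file names, counts) is invented to reproduce the first three figures above, and the total counts every text line git tracks, much like wc -l, rather than "real" code lines.

```python
def running_line_totals(numstat_log):
    """Return the cumulative line count after each commit, oldest first.

    Expects one "commit <hash>" line per commit followed by numstat lines
    of the form "<added>\t<deleted>\t<path>".
    """
    totals = []
    total = 0
    seen_commit = False
    for line in numstat_log.splitlines():
        if line.startswith("commit "):
            if seen_commit:
                totals.append(total)  # close the previous commit
            seen_commit = True
            continue
        parts = line.split("\t")
        # Binary files are reported as "-\t-\tpath" and are skipped.
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            total += int(parts[0]) - int(parts[1])
    if seen_commit:
        totals.append(total)
    return totals


# Invented sample in the expected format.
sample = (
    "commit aaa\n"
    "10\t0\tmain.py\n"
    "\n"
    "commit bbb\n"
    "40\t2\tmain.py\n"
    "commit ccc\n"
    "110\t5\tutil.py\n"
    "-\t-\tlogo.png\n"
)
totals = running_line_totals(sample)
print(totals)  # [10, 48, 153]
```

Merge commits show no diff in the default log output, so they repeat the previous total; whether that is acceptable depends on how the history should be read.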

Search Results

Search found 89214 results on 3569 pages for 'code statistics'.
