statistical analysis - Page 98

Please give the solution of the following programs in R Programming

- by NEETHU

Table below gives data concerning the performance of 28 national football league teams in 1976.It is suspected that the no. of yards gained rushing by opponents(x8) has an effect on the no. of games won by a team(y) (a)Fit a simple linear regression model relating games won by y to yards gained rushing by opponents x8. (b)Construct the analysis of variance table and test for significance of regression. (c)Find a 95% CI on the slope. (d)What percent of the total variability in y is explained by this model. (e)Find a 95% CI on the mean number of games won in opponents yards rushing is limited to 2000 yards. Team y x8 1 10 2205 2 11 2096 3 11 1847 4 13 1803 5 10 1457 6 11 1848 7 10 1564 8 11 1821 9 4 2577 10 2 2476 11 7 1984 12 10 1917 13 9 1761 14 9 1709 15 6 1901 16 5 2288 17 5 2072 18 5 2861 19 6 2411 20 4 2289 21 3 2203 22 3 2592 23 4 2053 24 10 1979 25 6 2048 26 8 1786 27 2 2876 28 0 2560 Suppose we would like to use the model developed in problem 1 to predict the no. of games a team will win if it can limit opponents yards rushing to 1800 yards. Find a point estimate of the no. of games won when x8=1800.Find a 905 prediction interval on the no. of games won. The purity of Oxygen produced by a fractionation process is thought to be percentage of Hydrocarbon in the main condenser of the processing unit .20 samples are shown below. Purity(%) Hydrocarbon(%) 86.91 1.02 89.85 1.11 90.28 1.43 86.34 1.11 92.58 1.01 87.33 0.95 86.29 1.11 91.86 0.87 95.61 1.43 89.86 1.02 96.73 1.46 99.42 1.55 98.66 1.55 96.07 1.55 93.65 1.4 87.31 1.15 95 1.01 96.85 0.99 85.2 0.95 90.56 0.98 (a)Fit a simple linear regression model to the data. (b)Test the hypothesis H0:ß=0 (c)Calculate R2 . (d)Find a 95% CI on the slope. (e)Find a 95% CI on the mean purity and the Hydrocarbon % is 1. Consider the Oxygen plant data in Problem3 and assume that purity and Hydrocarbon percentage are jointly normally distributed r.vs (a)What is the correlation between Oxygen purity and Hydrocarbon% (b)Test the hypothesis that ?=0. (c)Construct a 95% CI for ?.

Read the article

Objective-C wrapper API design methodology

- by Wade Williams

I know there's no one answer to this question, but I'd like to get people's thoughts on how they would approach the situation. I'm writing an Objective-C wrapper to a C library. My goals are: 1) The wrapper use Objective-C objects. For example, if the C API defines a parameter such as char *name, the Objective-C API should use name:(NSString *). 2) The client using the Objective-C wrapper should not have to have knowledge of the inner-workings of the C library. Speed is not really any issue. That's all easy with simple parameters. It's certainly no problem to take in an NSString and convert it to a C string to pass it to the C library. My indecision comes in when complex structures are involved. Let's say you have: struct flow { long direction; long speed; long disruption; long start; long stop; } flow_t; And then your C API call is: void setFlows(flow_t inFlows[4]); So, some of the choices are: 1) expose the flow_t structure to the client and have the Objective-C API take an array of those structures 2) build an NSArray of four NSDictionaries containing the properties and pass that as a parameter 3) create an NSArray of four "Flow" objects containing the structure's properties and pass that as a parameter My analysis of the approaches: Approach 1: Easiest. However, it doesn't meet the design goals Approach 2: For some reason, this seems to me to be the most "Objective-C" way of doing it. However, each element of the NSDictionary would have to be wrapped in an NSNumber. Now it seems like we're doing an awful lot just to pass the equivalent of a struct. Approach 3: Seems the cleanest to me from an object-oriented standpoint and the extra encapsulation could come in handy later. However, like #2, it now seems like we're doing an awful lot (creating an array, creating and initializing objects) just to pass a struct. So, the question is, how would you approach this situation? Are there other choices I'm not considering? Are there additional advantages or disadvantages to the approaches I've presented that I'm not considering?

Read the article

Setting up NCover for NUnit in FinalBuilder

- by Lasse V. Karlsen

I am attempting to set up NCover for usage in my FinalBuilder project, for a .NET 4.0 C# project, but my final coverage output file contains no coverage data. I am using: NCover 3.3.2 NUnit 2.5.4 FinalBuilder 6.3.0.2004 All tools are the latest official as of today. I've finally managed to coax FB into running my unit tests under NCover for the .NET 4.0 project, so I get Tests run: 184, ..., which is correct. However, the final Coverage.xml file output from NCover is almost empty, and looks like this: <?xml version="1.0" encoding="utf-8"?>  <coverage profilerVersion="3.3.2.6211" driverVersion="3.3.2" exportversion="3" viewdisplayname="" startTime="2010-04-22T08:55:33.7471316Z" measureTime="2010-04-22T08:55:35.3462915Z" projectName="" buildid="27c78ffa-c636-4002-a901-3211a0850b99" coveragenodeid="0" failed="false" satisfactorybranchthreshold="95" satisfactorycoveragethreshold="95" satisfactorycyclomaticcomplexitythreshold="20" satisfactoryfunctionthreshold="80" satisfactoryunvisitedsequencepoints="10" uiviewtype="TreeView" viewguid="C:\Dev\VS.NET\LVK.IoC\LVK.IoC.Tests\bin\Debug\Coverage.xml" viewfilterstyle="None" viewreportstyle="SequencePointCoveragePercentage" viewsortstyle="Name"> <rebasedpaths /> <filters /> <documents> <doc id="0" excluded="false" url="None" cs="" csa="00000000-0000-0000-0000-000000000000" om="0" nid="0" /> </documents> </coverage> The output in FB log is: ... ***************** End Program Output ***************** Execution Time: 1,5992 s Coverage Xml: C:\Dev\VS.NET\LVK.IoC\LVK.IoC.Tests\bin\Debug\Coverage.xml NCover Success My configuration of the FB step for NCover: NCover what?: NUnit test coverage Command: C:\Program Files (x86)\NUnit 2.5.4\bin\net-2.0\nunit-console.exe Command arguments: LVK.IoC.Tests.dll /noshadow /framework:4.0.30319 /process=single /nothread Note: I've tried with and without the /process and /nothread options Working directory: %FBPROJECTDIR%\LVK.IoC.Tests\bin\Debug List of assemblies to profile: %FBPROJECTDIR%\LVK.IoC.Tests\bin\Debug\LVK.IoC.dll Note: I've tried just listing the name of the assembly, both with and without the extension. The documentation for the FB step doesn't help, as it only lists minor sentences for each property, and fails to give examples or troubleshooting hints. Since I want to pull the coverage results into NDepend to run build-time analysis, I want that file to contain the information I need. I am also using TestDriven, and if I right-click the solution file and select "Test with NCover", NCover-explorer opens up with coverage data, and if I ask it to show me the folder with coverage files, in there is an .xml file with the same structure as the one above, just with all the data that should be there, so the tools I have is certainly capable of producing it. Has anyone an idea of what I've configured wrong here?

Read the article

.NET RegEx "Memory Leak" investigation

- by Kevin Pullin

I recently looked into some .NET "memory leaks" (i.e. unexpected, lingering GC rooted objects) in a WinForms app. After loading and then closing a huge report, the memory usage did not drop as expected even after a couple of gen2 collections. Assuming that the reporting control was being kept alive by a stray event handler I cracked open WinDbg to see what was happening... Using WinDbg, the !dumpheap -stat command reported a large amount of memory was consumed by string instances. Further refining this down with the !dumpheap -type System.String command I found the culprit, a 90MB string used for the report, at address 03be7930. The last step was to invoke !gcroot 03be7930 to see which object(s) were keeping it alive. My expectations were incorrect - it was not an unhooked event handler hanging onto the reporting control (and report string), but instead it was held on by a System.Text.RegularExpressions.RegexInterpreter instance, which itself is a descendant of a System.Text.RegularExpressions.CachedCodeEntry. Now, the caching of Regexs is (somewhat) common knowledge as this helps to reduce the overhead of having to recompile the Regex each time it is used. But what then does this have to do with keeping my string alive? Based on analysis using Reflector, it turns out that the input string is stored in the RegexInterpreter whenever a Regex method is called. The RegexInterpreter holds onto this string reference until a new string is fed into it by a subsequent Regex method invocation. I'd expect similar behaviour by hanging onto Regex.Match instances and perhaps others. The chain is something like this: Regex.Split, Regex.Match, Regex.Replace, etc Regex.Run RegexScanner.Scan (RegexScanner is the base class, RegexInterpreter is the subclass described above). The offending Regex is only used for reporting, rarely used, and therefore unlikely to be used again to clear out the existing report string. And even if the Regex was used at a later point, it would probably be processing another large report. This is a relatively significant problem and just plain feels dirty. All that said, I found a few options on how to resolve, or at least work around, this scenario. I'll let the community respond first and if no takers come forward I will fill in any gaps in a day or two.

Read the article

Modeling distribution of performance measurements

- by peterchen

How would you mathematically model the distribution of repeated real life performance measurements - "Real life" meaning you are not just looping over the code in question, but it is just a short snippet within a large application running in a typical user scenario? My experience shows that you usually have a peak around the average execution time that can be modeled adequately with a Gaussian distribution. In addition, there's a "long tail" containing outliers - often with a multiple of the average time. (The behavior is understandable considering the factors contributing to first execution penalty). My goal is to model aggregate values that reasonably reflect this, and can be calculated from aggregate values (like for the Gaussian, calculate mu and sigma from N, sum of values and sum of squares). In other terms, number of repetitions is unlimited, but memory and calculation requirements should be minimized. A normal Gaussian distribution can't model the long tail appropriately and will have the average biased strongly even by a very small percentage of outliers. I am looking for ideas, especially if this has been attempted/analysed before. I've checked various distributions models, and I think I could work out something, but my statistics is rusty and I might end up with an overblown solution. Oh, a complete shrink-wrapped solution would be fine, too ;) Other aspects / ideas: Sometimes you get "two humps" distributions, which would be acceptable in my scenario with a single mu/sigma covering both, but ideally would be identified separately. Extrapolating this, another approach would be a "floating probability density calculation" that uses only a limited buffer and adjusts automatically to the range (due to the long tail, bins may not be spaced evenly) - haven't found anything, but with some assumptions about the distribution it should be possible in principle. Why (since it was asked) - For a complex process we need to make guarantees such as "only 0.1% of runs exceed a limit of 3 seconds, and the average processing time is 2.8 seconds". The performance of an isolated piece of code can be very different from a normal run-time environment involving varying levels of disk and network access, background services, scheduled events that occur within a day, etc. This can be solved trivially by accumulating all data. However, to accumulate this data in production, the data produced needs to be limited. For analysis of isolated pieces of code, a gaussian deviation plus first run penalty is ok. That doesn't work anymore for the distributions found above. [edit] I've already got very good answers (and finally - maybe - some time to work on this). I'm starting a bounty to look for more input / ideas.

Read the article

Lucene: Wildcards are missing from index

- by Eleasar

Hi - i am building a search index that contains special names - containing ! and ? and & and + and ... I have to tread the following searches different: me & you me + you But whatever i do (did try with queryparser escaping before indexing, escaped it manually, tried different indexers...) - if i check the search index with Luke they do not show up (question marks and @-symbols and the like show up) The logic behind is that i am doing partial searches for a live suggestion (and the fields are not that large) so i split it up into "m" and "me" and "+" and "y" and "yo" and "you" and then index it (that way it is way faster than a wildcard query search (and the index size is not a big problem). So what i would need is to also have this special wildcard characters be inserted into the index. This is my code: using System; using System.Collections.Generic; using System.IO; using System.Linq; using System.Text; using Lucene.Net.Analysis; using Lucene.Net.Util; namespace AnalyzerSpike { public class CustomAnalyzer : Analyzer { public override TokenStream TokenStream(string fieldName, TextReader reader) { return new ASCIIFoldingFilter(new LowerCaseFilter(new CustomCharTokenizer(reader))); } } public class CustomCharTokenizer : CharTokenizer { public CustomCharTokenizer(TextReader input) : base(input) { } public CustomCharTokenizer(AttributeSource source, TextReader input) : base(source, input) { } public CustomCharTokenizer(AttributeFactory factory, TextReader input) : base(factory, input) { } protected override bool IsTokenChar(char c) { return c != ' '; } } } The code to create the index: private void InitIndex(string path, Analyzer analyzer) { var writer = new IndexWriter(path, analyzer, true); //some multiline textbox that contains one item per line: var all = new List<string>(txtAllAvailable.Text.Replace("\r","").Split('\n')); foreach (var item in all) { writer.AddDocument(GetDocument(item)); } writer.Optimize(); writer.Close(); } private static Document GetDocument(string name) { var doc = new Document(); doc.Add(new Field( "name", DeNormalizeName(name), Field.Store.YES, Field.Index.ANALYZED)); doc.Add(new Field( "raw_name", name, Field.Store.YES, Field.Index.NOT_ANALYZED)); return doc; } (Code is with Lucene.net in version 1.9.x (EDIT: sorry - was 2.9.x) but is compatible with Lucene from Java) Thx

Read the article

MemoryStream, XmlTextWriter and Warning 4 CA2202 : Microsoft.Usage

- by rasx

The Run Code Analysis command in Visual Studio 2010 Ultimate returns a warning when seeing a certain pattern with MemoryStream and XmlTextWriter. This is the warning: Warning 7 CA2202 : Microsoft.Usage : Object 'ms' can be disposed more than once in method 'KinteWritePages.GetXPathDocument(DbConnection)'. To avoid generating a System.ObjectDisposedException you should not call Dispose more than one time on an object.: Lines: 421 C:\Visual Studio 2010\Projects\Songhay.DataAccess.KinteWritePages\KinteWritePages.cs 421 Songhay.DataAccess.KinteWritePages This is the form: static XPathDocument GetXPathDocument(DbConnection connection) { XPathDocument xpDoc = null; var ms = new MemoryStream(); try { using(XmlTextWriter writer = new XmlTextWriter(ms, Encoding.UTF8)) { using(DbDataReader reader = CommonReader.GetReader(connection, Resources.KinteRssSql)) { writer.WriteStartDocument(); writer.WriteStartElement("data"); do { while(reader.Read()) { writer.WriteStartElement("item"); for(int i = 0; i < reader.FieldCount; i++) { writer.WriteRaw(String.Format("<{0}>{1}</{0}>", reader.GetName(i), reader[i].ToString())); } writer.WriteFullEndElement(); } } while(reader.NextResult()); writer.WriteFullEndElement(); writer.WriteEndDocument(); writer.Flush(); ms.Position = 0; xpDoc = new XPathDocument(ms); } } } finally { ms.Dispose(); } return xpDoc; } The same kind of warning is produced for this form: XPathDocument xpDoc = null; using(var ms = new MemoryStream()) { using(XmlTextWriter writer = new XmlTextWriter(ms, Encoding.UTF8)) { using(DbDataReader reader = CommonReader.GetReader(connection, Resources.KinteRssSql)) { //... } } } return xpDoc; By the way, the following form produces another warning: XPathDocument xpDoc = null; var ms = new MemoryStream(); using(XmlTextWriter writer = new XmlTextWriter(ms, Encoding.UTF8)) { using(DbDataReader reader = CommonReader.GetReader(connection, Resources.KinteRssSql)) { //... } } return xpDoc; The above produces the warning: Warning 7 CA2000 : Microsoft.Reliability : In method 'KinteWritePages.GetXPathDocument(DbConnection)', object 'ms' is not disposed along all exception paths. Call System.IDisposable.Dispose on object 'ms' before all references to it are out of scope. C:\Visual Studio 2010\Projects\Songhay.DataAccess.KinteWritePages\KinteWritePages.cs 383 Songhay.DataAccess.KinteWritePages In addition to the following, what are my options?: Supress warning CA2202. Supress warning CA2000 and hope that Microsoft is disposing of MemoryStream (because Reflector is not showing me the source code). Rewrite my legacy code to recognize the wonderful XDocument and LINQ to XML.

Read the article

Debugging site written mainly in JScript with AJAX code injection

- by blumidoo

Hello, I have a legacy code to maintain and while trying to understand the logic behind the code, I have run into lots of annoying issues. The application is written mainly in Java Script, with extensive usage of jQuery + different plugins, especially Accordion. It creates a wizard-like flow, where client code for the next step is downloaded in the background by injecting a result of a remote AJAX request. It also uses callbacks a lot and pretty complicated "by convention" programming style (lots of events handlers are created on the fly based on certain object names - e.g. current page name, current step name). Adding to that, the code is very messy and there is no obvious inner structure - the functions are scattered in the code, file names do not reflect the business role of the code, lots of functions and code snippets are most likely not used at all etc. PROBLEM: How to approach this code base, so that the inner flow of the code can be sort-of "reverse engineered" using a suite of smart debugging tools. Ideally, I would like to be able to attach to the running application and step through the code, breaking on each new function call. Also, it would be nice to be able to create a "diagram of calls" in the application (i.e. in order to run a particular page logic, this particular flow of function calls was executed in a particular order). Not to mention to be able to run a coverage analysis, identifying potentially orphaned code fragments. I would like to stress out once more, that it is impossible to understand the inner logic of the application just by looking at the code itself, unless you have LOTS of spare time and beer crates, which I unfortunately do not have :/ (shame...) An IDE of some sort that would aid in extending that code would be also great, but I am currently looking into possibility to use Visual Studio 2010 to do the job, as the site itself is a mix of Classic ASP and ASP.NET (I'd say - 70% Java Script with jQuery, 30% ASP). I have obviously tried FireBug, but I was unable to find a way to define a breakpoint or step into the code, which is "injected" into the client JS using AJAX calls (i.e. the application retrieves the code by invoking an URL and injects it to the client local code). Venkman debugger had similar issues. Any hints would be welcome. Feel free to ask additional questions.

Read the article

Count number of queries executed by NHibernate in a unit test

- by Bittercoder

In some unit/integration tests of the code we wish to check that correct usage of the second level cache is being employed by our code. Based on the code presented by Ayende here: http://ayende.com/Blog/archive/2006/09/07/MeasuringNHibernatesQueriesPerPage.aspx I wrote a simple class for doing just that: public class QueryCounter : IDisposable { CountToContextItemsAppender _appender; public int QueryCount { get { return _appender.Count; } } public void Dispose() { var logger = (Logger) LogManager.GetLogger("NHibernate.SQL").Logger; logger.RemoveAppender(_appender); } public static QueryCounter Start() { var logger = (Logger) LogManager.GetLogger("NHibernate.SQL").Logger; lock (logger) { foreach (IAppender existingAppender in logger.Appenders) { if (existingAppender is CountToContextItemsAppender) { var countAppender = (CountToContextItemsAppender) existingAppender; countAppender.Reset(); return new QueryCounter {_appender = (CountToContextItemsAppender) existingAppender}; } } var newAppender = new CountToContextItemsAppender(); logger.AddAppender(newAppender); logger.Level = Level.Debug; logger.Additivity = false; return new QueryCounter {_appender = newAppender}; } } public class CountToContextItemsAppender : IAppender { int _count; public int Count { get { return _count; } } public void Close() { } public void DoAppend(LoggingEvent loggingEvent) { if (string.Empty.Equals(loggingEvent.MessageObject)) return; _count++; } public string Name { get; set; } public void Reset() { _count = 0; } } } With intended usage: using (var counter = QueryCounter.Start()) { // ... do something Assert.Equal(1, counter.QueryCount); // check the query count matches our expectations } But it always returns 0 for Query count. No sql statements are being logged. However if I make use of Nhibernate Profiler and invoke this in my test case: NHibernateProfiler.Intialize() Where NHProf uses a similar approach to capture logging output from NHibernate for analysis via log4net etc. then my QueryCounter starts working. It looks like I'm missing something in my code to get log4net configured correctly for logging nhibernate sql ... does anyone have any pointers on what else I need to do to get sql logging output from Nhibernate?

Read the article

XmlDataProvider and XPath bindings don't allow default namespace of XML data?

- by Andy Dent

I am struggling to work out how to use default namespaces with XmlDataProvider and XPath bindings. There's an ugly answer using local-name <Binding XPath="*[local-name()='Name']" /> but that is not acceptable to the client who wants this XAML to be highly maintainable. The fallback is to force them to use non-default namespaces in the report XML but that is an undesirable solution. The XML report file looks like the following. It will only work if I remove xmlns="http://www.acme.com/xml/schemas/report so there is no default namespace. <?xml version="1.0" encoding="utf-8"?> <?xml-stylesheet type='text/xsl' href='PreviewReportImages.xsl'?> <Report xsl:schemaLocation="http://www.acme.com/xml/schemas/report BlahReport.xsd" xmlns:xsl="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.acme.com/xml/schemas/report"> <Service>Muncher</Service> <Analysis> <Date>27 Apr 2010</Date> <Time>0:09</Time> <Authoriser>Service Centre Manager</Authoriser> Which I am presenting in a window with XAML: <Window x:Class="AcmeTest.ReportPreview" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml" Title="ReportPreview" Height="300" Width="300" > <Window.Resources> <XmlDataProvider x:Key="Data"/> </Window.Resources> <StackPanel Orientation="Vertical" DataContext="{Binding Source={StaticResource Data}, XPath=Report}"> <TextBlock Text="{Binding XPath=Service}"/> </StackPanel> </Window> with code-behind used to load an XmlDocument into the XmlDataProvider (seems the only way to have loading from a file or object varying at runtime). public partial class ReportPreview : Window { private void InitXmlProvider(XmlDocument doc) { XmlDataProvider xd = (XmlDataProvider)Resources["Data"]; xd.Document = doc; } public ReportPreview(XmlDocument doc) { InitializeComponent(); InitXmlProvider(doc); } public ReportPreview(String reportPath) { InitializeComponent(); var doc = new XmlDocument(); doc.Load(reportPath); InitXmlProvider(doc); } }

Read the article

.NET JIT Code Cache leaking?

- by pitchfork

We have a server component written in .Net 3.5. It runs as service on a Windows Server 2008 Standard Edition. It works great but after some time (days) we notice massive slowdowns and an increased working set. We expected some kind of memory leak and used WinDBG/SOS to analyze dumps of the process. Unfortunately the GC Heap doesn’t show any leak but we noticed that the JIT code heap has grown from 8MB after the start to more than 1GB after a few days. We don’t use any dynamic code generation techniques by our own. We use Linq2SQL which is known for dynamic code generation but we don’t know if it can cause such a problem. The main question is if there is any technique to analyze the dump and check where all this Host Code Heap blocks that are shown in the WinDBG dumps come from? [Update] In the mean time we did some more analysis and had Linq2SQL as probable suspect, especially since we do not use precompiled queries. The following example program creates exactly the same behaviour where more and more Host Code Heap blocks are created over time. using System; using System.Linq; using System.Threading; namespace LinqStressTest { class Program { static void Main(string[] args) { for (int i = 0; i < 100; ++ i) ThreadPool.QueueUserWorkItem(Worker); while(runs < 1000000) { Thread.Sleep(5000); } } static void Worker(object state) { for (int i = 0; i < 50; ++i) { using (var ctx = new DataClasses1DataContext()) { long id = rnd.Next(); var x = ctx.AccountNucleusInfos.Where(an => an.Account.SimPlayers.First().Id == id).SingleOrDefault(); } } var localruns = Interlocked.Add(ref runs, 1); System.Console.WriteLine("Action: " + localruns); ThreadPool.QueueUserWorkItem(Worker); } static Random rnd = new Random(); static long runs = 0; } } When we replace the Linq query with a precompiled one, the problem seems to disappear.

Read the article

R glm standard error estimate differences to SAS PROC GENMOD

- by Michelle

I am converting a SAS PROC GENMOD example into R, using glm in R. The SAS code was: proc genmod data=data0 namelen=30; model boxcoxy=boxcoxxy ~ AGEGRP4 + AGEGRP5 + AGEGRP6 + AGEGRP7 + AGEGRP8 + RACE1 + RACE3 + WEEKEND + SEQ/dist=normal; FREQ REPLICATE_VAR; run; My R code is: parmsg2 <- glm(boxcoxxy ~ AGEGRP4 + AGEGRP5 + AGEGRP6 + AGEGRP7 + AGEGRP8 + RACE1 + RACE3 + WEEKEND + SEQ , data=data0, family=gaussian, weights = REPLICATE_VAR) When I use summary(parmsg2) I get the same coefficient estimates as in SAS, but my standard errors are wildly different. The summary output from SAS is: Name df Estimate StdErr LowerWaldCL UpperWaldCL ChiSq ProbChiSq Intercept 1 6.5007436 .00078884 6.4991975 6.5022897 67911982 0 agegrp4 1 .64607262 .00105425 .64400633 .64813891 375556.79 0 agegrp5 1 .4191395 .00089722 .41738099 .42089802 218233.76 0 agegrp6 1 -.22518765 .00083118 -.22681672 -.22355857 73401.113 0 agegrp7 1 -1.7445189 .00087569 -1.7462352 -1.7428026 3968762.2 0 agegrp8 1 -2.2908855 .00109766 -2.2930369 -2.2887342 4355849.4 0 race1 1 -.13454883 .00080672 -.13612997 -.13296769 27817.29 0 race3 1 -.20607036 .00070966 -.20746127 -.20467944 84319.131 0 weekend 1 .0327884 .00044731 .0319117 .03366511 5373.1931 0 seq2 1 -.47509583 .00047337 -.47602363 -.47416804 1007291.3 0 Scale 1 2.9328613 .00015586 2.9325559 2.9331668 -127 The summary output from R is: Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 6.50074 0.10354 62.785 < 2e-16 AGEGRP4 0.64607 0.13838 4.669 3.07e-06 AGEGRP5 0.41914 0.11776 3.559 0.000374 AGEGRP6 -0.22519 0.10910 -2.064 0.039031 AGEGRP7 -1.74452 0.11494 -15.178 < 2e-16 AGEGRP8 -2.29089 0.14407 -15.901 < 2e-16 RACE1 -0.13455 0.10589 -1.271 0.203865 RACE3 -0.20607 0.09315 -2.212 0.026967 WEEKEND 0.03279 0.05871 0.558 0.576535 SEQ -0.47510 0.06213 -7.646 2.25e-14 The importance of the difference in the standard errors is that the SAS coefficients are all statistically significant, but the RACE1 and WEEKEND coefficients in the R output are not. I have found a formula to calculate the Wald confidence intervals in R, but this is pointless given the difference in the standard errors, as I will not get the same results. Apparently SAS uses a ridge-stabilized Newton-Raphson algorithm for its estimates, which are ML. The information I read about the glm function in R is that the results should be equivalent to ML. What can I do to change my estimation procedure in R so that I get the equivalent coefficents and standard error estimates that were produced in SAS? To update, thanks to Spacedman's answer, I used weights because the data are from individuals in a dietary survey, and REPLICATE_VAR is a balanced repeated replication weight, that is an integer (and quite large, in the order of 1000s or 10000s). The website that describes the weight is here. I don't know why the FREQ rather than the WEIGHT command was used in SAS. I will now test by expanding the number of observations using REPLICATE_VAR and rerunning the analysis.

Read the article

Is there any "modern" text editor with command-line/minibuffer?

- by Pedro Morte Rolo

A command line in a text editor is a wonderful feature. It allows the user to explore the editor's functionality and learn it's shortcuts in a textual way. It's much faster than using the mouse, and it is much easier to memorise "shortcuts" this way. Emacs and VI provide this, though, emacs and vi are not "modern". By "modern", I mean one that is original built to cope with the modern de-facto standards of selecting, copying, pasting, cutting, undoing, redoing and auto-completing. Cream/vi or Emacs/CUA are not valid options, since there are loads of things built over them that conflict with the mentioned stuff. It would be nice if there was an editor that would cope with the modern de-facto standards out-off-the-box, but still provide a command-line/minibuffer to perform/explore the commands and learn its shortcuts. Is there such a thing? I do not intend to use the "modern" term as derrogatory. I love both Emacs and VI, but I hate their keyboard-shortcut historical baggage. When I reffer to de-facto standards, I am not talking about Windows vs Whatever. Kate, gedit, Eclipse, Intelij or Textmate also follow the norm I am talking about and are not Windows editors. Please do not advertise Vim and Emacs, that's not answering the question. I am asking for alternatives. Why don't I like emacs and vi: Emacs: Despite CUA mode, emacs has loads of modes that conflict with this (e.g. slime, ruby-mode, etc...) It would be nice to have something that would work out-off-the-box. VI: I do not like that it is Visual/Insert-based. I do not know how to browse the text-editor's commands. I do not like that it is so much tought for the terminal. I believe that it has the same problem that I mentioned for emacs. This question is starting to look like requirement analysis.. As de-facto standards I mean: Ctrol-XCV for cut-copy-paste Ctrol-A for select-all Contrl-Z for Undo Ctrol-Y for Redo Control-F for Searching Contrl-Space for auto-complete Shift-arrow for selection Control-arrow for word-navigation Alt-Arrow for moving

Read the article

G++, compiler warnings, c++ templates

- by Ian

During the compilatiion of the C++ program those warnings appeared: c:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/bc:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/bits/stl_algo.h:2317: instantiated from `void std::partial_sort(_RandomAccessIterator, _RandomAccessIterator, _RandomAccessIterator, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<Object<double>**, std::vector<Object<double>*, std::allocator<Object<double>*> > >, _Compare = sortObjects<double>]' c:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/bits/stl_algo.h:2506: instantiated from `void std::__introsort_loop(_RandomAccessIterator, _RandomAccessIterator, _Size, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<Object<double>**, std::vector<Object<double>*, std::allocator<Object<double>*> > >, _Size = int, _Compare = sortObjects<double>]' c:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/bits/stl_algo.h:2589: instantiated from `void std::sort(_RandomAccessIterator, _RandomAccessIterator, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<Object<double>**, std::vector<Object<double>*, std::allocator<Object<double>*> > >, _Compare = sortObjects<double>]' io/../structures/objects/../../algorithm/analysis/../../structures/list/ObjectsList.hpp:141: instantiated from `void ObjectsList <T>::sortObjects(unsigned int, T, T, T, T, unsigned int) [with T = double]' I do not why, because all objects have only template parameter T, their local variables are also T. The only place, where I am using double is main. There are objects of type double creating and adding into the ObjectsList... Object <double> o1; ObjectsList <double> olist; olist.push_back(o1); .... T xmin = ..., ymin = ..., xmax = ..., ymax = ...; unsigned int n = ...; olist.sortAllObjects(xmin, ymin, xmax, ymax, n); and comparator template <class T> class sortObjects { private: unsigned int n; T xmin, ymin, xmax, ymax; public: sortObjects ( const T xmin_, const T ymin_, const T xmax_, const T ymax_, const int n_ ) : xmin ( xmin_ ), ymin ( ymin_ ), xmax ( xmax_ ), ymax ( ymax_ ), n ( n_ ) {} bool operator() ( const Object <T> *o1, const Object <T> *o2 ) const { T dmax = (std::max) ( xmax - xmin, ymax - ymin ); T x_max = ( xmax - xmin ) / dmax; T y_max = ( ymax - ymin ) / dmax; ... return ....; } representing ObjectsList method: template <class T> void ObjectsList <T> ::sortAllObjects ( const T xmin, const T ymin, const T xmax, const T ymax, const unsigned int n ) { std::sort ( objects.begin(), objects.end(), sortObjects <T> ( xmin, ymin, xmax, ymax, n ) ); }

Read the article

Checking if an int is prime more efficiently

- by SipSop

I recently was part of a small java programming competition at my school. My partner and I have just finished our first pure oop class and most of the questions were out of our league so we settled on this one (and I am paraphrasing somewhat): "given an input integer n return the next int that is prime and its reverse is also prime for example if n = 18 your program should print 31" because 31 and 13 are both prime. Your .class file would then have a test case of all the possible numbers from 1-2,000,000,000 passed to it and it had to return the correct answer within 10 seconds to be considered valid. We found a solution but with larger test cases it would take longer than 10 seconds. I am fairly certain there is a way to move the range of looping from n,..2,000,000,000 down as the likely hood of needing to loop that far when n is a low number is small, but either way we broke the loop when a number is prime under both conditions is found. At first we were looping from 2,..n no matter how large it was then i remembered the rule about only looping to the square root of n. Any suggestions on how to make my program more efficient? I have had no classes dealing with complexity analysis of algorithms. Here is our attempt. public class P3 { public static void main(String[] args){ long loop = 2000000000; long n = Integer.parseInt(args[0]); for(long i = n; i<loop; i++) { String s = i +""; String r = ""; for(int j = s.length()-1; j>=0; j--) r = r + s.charAt(j); if(prime(i) && prime(Long.parseLong(r))) { System.out.println(i); break; } } System.out.println("#"); } public static boolean prime(long p){ for(int i = 2; i<(int)Math.sqrt(p); i++) { if(p%i==0) return false; } return true; } } ps sorry if i did the formatting for code wrong this is my first time posting here. Also the output had to have a '#' after each line thats what the line after the loop is about Thanks for any help you guys offer!!!

Read the article

Bracketing algorithm when root finding. Single root in "quadratic" function

- by Ander Biguri

I am trying to implement a root finding algorithm. I am using the hybrid Newton-Raphson algorithm found in numerical recipes that works pretty nicely. But I have a problem in bracketing the root. While implementing the root finding algorithm I realised that in several cases my functions have 1 real root and all the other imaginary (several of them, usually 6 or 9). The only root I am interested is in the real one so the problem is not there. The thing is that the function approaches the root like a cubic function, touching with the point the y=0 axis... Newton-Rapson method needs some brackets of different sign and all the bracketing methods I found don't work for this specific case. What can I do? It is pretty important to find that root in my program... EDIT: more problems: sometimes due to reaaaaaally small numerical errors, say a variation of 1e-6 in some value the "cubic" function does NOT have that real root, it is just imaginary with a neglectable imaginary part... (checked with matlab) EDIT 2: Much more information about the problem. Ok, I need root finding algorithm. Info I have: The root I need to find is between [0-1] , if there are more roots outside that part I am not interested in them. The root is real, there may be imaginary roots, but I don't want them. Probably all the rest of the roots will be imaginary The root may be double in that point, but I think that actually doesn't mater in numerical analysis problems I need to use the root finding algorithm several times during the overall calculations, but the function will always be a polynomial In one of the particular cases of the root finding, my polynomial will be similar to a quadratic function that touches Y=0 with the point. Example of a real case: The coefficient may not be 100% precise and that really slight imprecision may make the function not to touch the Y=0 axis. I cannot solve for this specific case because in other cases it may be that the polynomial is pretty normal and doesn't make any "strange" thing. The method I am actually using is NewtonRaphson hybrid, where if the derivative is really small it makes a bisection instead of NewRaph (found in numerical recipes). Matlab's answer to the function on the image: roots: 0.853553390593276 + 0.353553390593278i 0.853553390593276 - 0.353553390593278i 0.146446609406726 + 0.353553390593273i 0.146446609406726 - 0.353553390593273i 0.499999999999996 + 0.000000040142134i 0.499999999999996 - 0.000000040142134i The function is a real example I prepared where I know that the answer I want is 0.5 Note: I still haven't check completely some of the answers I you people have give me (Thank you!), I am just trying to give al the information I already have to complete the question.

Read the article

How can I structure and recode messy categorical data in R?

- by briandk

I'm struggling with how to best structure categorical data that's messy, and comes from a dataset I'll need to clean. The Coding Scheme I'm analyzing data from a university science course exam. We're looking at patterns in student responses, and we developed a coding scheme to represent the kinds of things students are doing in their answers. A subset of the coding scheme is shown below. Note that within each major code (1, 2, 3) are nested non-unique sub-codes (a, b, ...). What the Raw Data Looks Like I've created an anonymized, raw subset of my actual data which you can view here. Part of my problem is that those who coded the data noticed that some students displayed multiple patterns. The coders' solution was to create enough columns (reason1, reason2, ...) to hold students with multiple patterns. That becomes important because the order (reason1, reason2) is arbitrary--two students (like student 41 and student 42 in my dataset) who correctly applied "dependency" should both register in an analysis, regardless of whether 3a appears in the reason column or the reason2 column. How Can I Best Structure Student Data? Part of my problem is that in the raw data, not all students display the same patterns, or the same number of them, in the same order. Some students may do just one thing, others may do several. So, an abstracted representation of example students might look like this: Note in the example above that student002 and student003 both are coded as "1b", although I've deliberately shown the order as different to reflect the reality of my data. My (Practical) Questions Should I concatenate reason1, reason2, ... into one column? How can I (re)code the reasons in R to reflect the multiplicity for some students? Thanks I realize this question is as much about good data conceptualization as it is about specific features of R, but I thought it would be appropriate to ask it here. If you feel it's inappropriate for me to ask the question, please let me know in the comments, and stackoverflow will automatically flood my inbox with sadface emoticons. If I haven't been specific enough, please let me know and I'll do my best to be clearer.

Read the article

PDF Report generation

- by IniTech

EDIT : I completed this project using ABCpdf. For anyone interested, I love this product and their support is A+. Everything I listed as a 'Con' for the HTML - PDF solution was easily doable in ABCpdf. I've been charged with creating a data driven pdf report. After reviewing the plethora of options, I have narrowed it down to 2. I need you all to to help me decide, or offer alternatives I haven't considered. Here are the requirements: 100% Data driven Eventually PDF (a stop in HTML is fine, so long as it is converted) Can be run with multiple sets of data (the layout is always the same, the data is variable) Contains normal analysis-style copy (saved in DB with html markup) Contains tables (data for tables is generated at run-time) Header/Page # on each page Table of Contents .NET (VB or C#) Done quickly Now, because of the fact that the report is going to be generated with multiple sets of data, I don't think a stamped pdf template will work since I won't know how long or how many pages a certain piece of the report could require. So, I think my best options are: Programmatic creation using an iText-like solution. Generate in HTML and convert to PDF using a third-party application (ABCPdf is the tool I have played with so far) Both solutions have their pro's and con's. Programmatic solution: Pros: Flexible Easy page numbering/page header/table of contents Free Cons: Time consuming (to write a layer on top of iText to do what I need and keep maintainable) Since the copy is already stored in the db with html markup, I would have to parse through the data before I place it into the pdf, ensuring I don't have to break the paragraph into chunks so I can apply bold, italic, underline, etc. to specific phrases. This seems like a huge PITA, and I hope I am wrong about that assumption. HTML - PDF Pros: Easy to generate from db (no parsing necessary) Many tools for conversion Uses technology I am already familiar with Built-in "Print Preview" - not a req, but nice Cons: (Edited after project completion. All of my assumptions were incorrect and ABCpdf is awesome) 1. Almost impossible to generate page headers - Not True 2. Very difficult to generate page numbers Not True 3. Nearly impossible to generate table of contents Not True 4. (Cross-browser support isn't a con; Since its internal, I can dictate what browser to use) 5. Conversion tool quirks - may not convert exactly as rendered in browser Not True 6. Overall, I think it would be very hard to format the HTML exactly as I would want it to appear/convert to PDF. Not True That's it - I need the communitys help in deciding which way I should go. I might be wrong about some of my Pro/Con assumptions. If I am, please tell me. All thoughts and suggestions are welcome and appreciated. Thanks

Read the article

Is "Systems Designer" the job title that best describes what I do? [closed]

- by ivo-rossi

After having worked as Java developer for almost 3 years in the same company that I currently work at, I moved to a new position associated with the development of the same application. I’m in this new position for more than 1 year now. My official job title is Systems Designer, but I’m not sure this is a title that expresses well what I do. So my question here is what would be the most appropriate job title for me? I see this question as important for my career development. After all, I should be able to explain in one word what I do. And it’s no longer “Java Developer”. Well, in more than one word, this is what I do: The business analysts gather requirements / business problems to be solved with the clients and then discuss these requirements with me. Given the requirements, I design the high level solutions to be implemented in our system (e.g. a new screen on the client application, modifications to existing reports, extension to the XML export format of some objects, etc). I base my decision on the current capabilities of the system, the overall impact that the solutions would have on the system and the estimated effort to implement them (as I was a developer of this same application for almost 3 years before I moved to this position, I’m confident in my estimates). The solutions are discussed iteratively with the business analysts until we agree that they are good. The outcome of this analysis is what we call the “requirements design” document, which is written by me, shared with clients for approval and then also with the team that is going to implement the solutions and test them. Note that there are a few problems that I need to find a solution for that are non-functional. If the users are unhappy with the performance of a certain tool, I will investigate what can be done to speed it up. I will do some research – often based in the Java code itself - to identify possibilities of optimizations. But in this new position I no longer code, the main outcome of my work is really the “requirements design”. Is “Systems Designer” really the most appropriate job title?

Read the article

Memory Leak with Swing Drag and Drop

- by tom

I have a JFrame that accepts top-level drops of files. However after a drop has occurred, references to the frame are held indefinitely inside some Swing internal classes. I believe that disposing of the frame should release all of its resources, so what am I doing wrong? Example import java.awt.datatransfer.DataFlavor; import java.io.File; import java.util.List; import javax.swing.JFrame; import javax.swing.JLabel; import javax.swing.TransferHandler; public class DnDLeakTester extends JFrame { public static void main(String[] args) { new DnDLeakTester(); //Prevent main from returning or the jvm will exit while (true) { try { Thread.sleep(10000); } catch (InterruptedException e) { } } } public DnDLeakTester() { super("I'm leaky"); add(new JLabel("Drop stuff here")); setTransferHandler(new TransferHandler() { @Override public boolean canImport(final TransferSupport support) { return (support.isDrop() && support .isDataFlavorSupported(DataFlavor.javaFileListFlavor)); } @Override public boolean importData(final TransferSupport support) { if (!canImport(support)) { return false; } try { final List<File> files = (List<File>) support.getTransferable().getTransferData(DataFlavor.javaFileListFlavor); for (final File f : files) { System.out.println(f.getName()); } } catch (Exception e) { e.printStackTrace(); } return true; } }); setDefaultCloseOperation(DISPOSE_ON_CLOSE); pack(); setVisible(true); } } To reproduce, run the code and drop some files on the frame. Close the frame so it's disposed of. To verify the leak I take a heap dump using JConsole and analyse it with the Eclipse Memory Analysis tool. It shows that sun.awt.AppContext is holding a reference to the frame through its hashmap. It looks like TransferSupport is at fault. What am I doing wrong? Should I be asking the DnD support code to clean itself up somehow? I'm running JDK 1.6 update 19.

Read the article

Persistent (purely functional) Red-Black trees on disk performance

- by Waneck

I'm studying the best data structures to implement a simple open-source object temporal database, and currently I'm very fond of using Persistent Red-Black trees to do it. My main reasons for using persistent data structures is first of all to minimize the use of locks, so the database can be as parallel as possible. Also it will be easier to implement ACID transactions and even being able to abstract the database to work in parallel on a cluster of some kind. The great thing of this approach is that it makes possible implementing temporal databases almost for free. And this is something quite nice to have, specially for web and for data analysis (e.g. trends). All of this is very cool, but I'm a little suspicious about the overall performance of using a persistent data structure on disk. Even though there are some very fast disks available today, and all writes can be done asynchronously, so a response is always immediate, I don't want to build all application under a false premise, only to realize it isn't really a good way to do it. Here's my line of thought: - Since all writes are done asynchronously, and using a persistent data structure will enable not to invalidate the previous - and currently valid - structure, the write time isn't really a bottleneck. - There are some literature on structures like this that are exactly for disk usage. But it seems to me that these techniques will add more read overhead to achieve faster writes. But I think that exactly the opposite is preferable. Also many of these techniques really do end up with a multi-versioned trees, but they aren't strictly immutable, which is something very crucial to justify the persistent overhead. - I know there still will have to be some kind of locking when appending values to the database, and I also know there should be a good garbage collecting logic if not all versions are to be maintained (otherwise the file size will surely rise dramatically). Also a delta compression system could be thought about. - Of all search trees structures, I really think Red-Blacks are the most close to what I need, since they offer the least number of rotations. But there are some possible pitfalls along the way: - Asynchronous writes -could- affect applications that need the data in real time. But I don't think that is the case with web applications, most of the time. Also when real-time data is needed, another solutions could be devised, like a check-in/check-out system of specific data that will need to be worked on a more real-time manner. - Also they could lead to some commit conflicts, though I fail to think of a good example of when it could happen. Also commit conflicts can occur in normal RDBMS, if two threads are working with the same data, right? - The overhead of having an immutable interface like this will grow exponentially and everything is doomed to fail soon, so this all is a bad idea. Any thoughts? Thanks! edit: There seems to be a misunderstanding of what a persistent data structure is: http://en.wikipedia.org/wiki/Persistent_data_structure

Read the article

Design patterns and interview question

- by user160758

When I was learning to code, I read up on the design patterns like a good boy. Long after this, I started to actually understand them. Design discussions such as those on this site constantly try to make the rules more and more general, which is good. But there is a line, over which it becomes over-analysis starts to feed off itself and as such I think begins to obfuscate the original point - for example the "What's Alternative to Singleton" post and the links contained therein. http://stackoverflow.com/questions/1300655/whats-alternative-to-singleton I say this having been asked in both interviews I’ve had over the last 2 weeks what a singleton is and what criticisms I have of it. I have used it a few times for items such as user data (simple key-value eg. last file opened by this user) and logging (very common i'm sure). I've never ever used it just to have what is essentially global application data, as this is clearly stupid. In the first interview, I reply that I have no criticisms of it. He seemed disappointed by this but as the job wasn’t really for me, I forgot about it. In the next one, I was asked again and, as I wanted this job, I thought about it on the spot and made some objections, similar to those contained in the post linked to above (I suggested use of a factory or dependency injection instead). He seemed happy with this. But my problem is that I have used the singleton without ever using it in this kind of stupid way, which I had to describe on the spot. Using it for global data and the like isn’t something I did then realised was stupid, or read was stupid so didn’t do, it was just something I knew was stupid from the start. Essentially I’m supposed to be able to think of ways of how to misuse a pattern in the interview? Which class of programmers can best answer this question? The best ones? The medium ones? I'm not sure.... And these were both bright guys. I read more than enough to get better at my job but had never actually bothered to seek out criticisms of the most simple of the design patterns like this one. Do people think such questions are valid and that I ought to know the objections off by heart? Or that it is reasonable to be able to work out what other people who are missing the point would do on the fly? Or do you think I’m at least partially right that the question is too unsubtle and that the questions ought to be better thought out in order to make sure only good candidates can answer. PS. Please don’t think I’m saying that I’m just so clever that I know everything automatically - I’ve learnt the hard way like everyone else. But avoiding global data is hardly revolutionary.

Read the article

gcc optimization? bug? and its practial implication to project

- by kumar_m_kiran

Hi All, My questions are divided into three parts Question 1 Consider the below code, #include <iostream> using namespace std; int main( int argc, char *argv[]) { const int v = 50; int i = 0X7FFFFFFF; cout<<(i + v)<<endl; if ( i + v < i ) { cout<<"Number is negative"<<endl; } else { cout<<"Number is positive"<<endl; } return 0; } No specific compiler optimisation options are used or the O's flag is used. It is basic compilation command g++ -o test main.cpp is used to form the executable. The seemingly very simple code, has odd behaviour in SUSE 64 bit OS, gcc version 4.1.2. The expected output is "Number is negative", instead only in SUSE 64 bit OS, the output would be "Number is positive". After some amount of analysis and doing a 'disass' of the code, I find that the compiler optimises in the below format - Since i is same on both sides of comparison, it cannot be changed in the same expression, remove 'i' from the equation. Now, the comparison leads to if ( v < 0 ), where v is a constant positive, So during compilation itself, the else part cout function address is added to the register. No cmp/jmp instructions can be found. I see that the behaviour is only in gcc 4.1.2 SUSE 10. When tried in AIX 5.1/5.3 and HP IA64, the result is as expected. Is the above optimisation valid? Or, is using the overflow mechanism for int not a valid use case? Question 2 Now when I change the conditional statement from if (i + v < i) to if ( (i + v) < i ) even then, the behaviour is same, this atleast I would personally disagree, since additional braces are provided, I expect the compiler to create a temporary built-in type variable and them compare, thus nullify the optimisation. Question 3 Suppose I have a huge code base, an I migrate my compiler version, such bug/optimisation can cause havoc in my system behaviour. Ofcourse from business perspective, it is very ineffective to test all lines of code again just because of compiler upgradation. I think for all practical purpose, these kinds of error are very difficult to catch (during upgradation) and invariably will be leaked to production site. Can anyone suggest any possible way to ensure to ensure that these kind of bug/optimization does not have any impact on my existing system/code base? PS : When the const for v is removed from the code, then optimization is not done by the compiler. I believe, it is perfectly fine to use overflow mechanism to find if the variable is from MAX - 50 value (in my case).

Read the article

Are there compelling reasons not to use Groovy?

- by Leonard H Martin

I'm developing a LoB application in Java after a long absence from the platform (having spent the last 8 years or so entrenched in Fortran, C, a smidgin of C++ and latterly .Net). Java, the language, is not much changed from how I remember it. I like it's strengths and I can work around its weaknesses - the platform has grown and deciding upon the myriad of different frameworks which appear to do much the same thing as one another is a different story; but that can wait for another day - all-in-all I'm comfortable with Java. However, over the last couple of weeks I've become enamoured with Groovy, and purely from a selfish point of view: but not just because it makes development against the JVM a more succinct and entertaining (and, well, "groovy") proposition than Java (the language). What strikes me most about Groovy is its inherent maintainability. We all (I hope!) strive to write well documented, easy to understand code. However, sometimes the languages we use themselves defeat us. An example: in 2001 I wrote a library in C to translate EDIFACT EDI messages into ANSI X12 messages. This is not a particularly complicated process, if slightly involved, and I thought at the time I had documented the code properly - and I probably had - but some six years later when I revisited the project (and after becoming acclimatised to C#) I found myself lost in so much C boilerplate (mallocs, pointers, etc. etc.) that it took three days of thoughtful analysis before I finally understood what I'd been doing six years previously. This evening I've written about 2000 lines of Java (it is the day of rest, after all!). I've documented as best as I know how, but, but, of those 2000 lines of Java a significant proportion is Java boiler plate. This is where I see Groovy and other dynamic languages winning through - maintainability and later comprehension. Groovy lets you concentrate on your intent without getting bogged down on the platform specific implementation; it's almost, but not quite, self documenting. I see this as being a huge boon to me when I revisit my current project (which I'll port to Groovy asap) in several years time and to my successors who will inherit it and carry on the good work. So, are there any reasons not to use Groovy?

Read the article

In R, how do you get the best fitting equation to a set of data?

- by Matherion

I'm not sure wether R can do this (I assume it can, but maybe that's just because I tend to assume that R can do anything :-)). What I need is to find the best fitting equation to describe a dataset. For example, if you have these points: df = data.frame(x = c(1, 5, 10, 25, 50, 100), y = c(100, 75, 50, 40, 30, 25)) How do you get the best fitting equation? I know that you can get the best fitting curve with: plot(loess(df$y ~ df$x)) But as I understood you can't extract the equation, see Loess Fit and Resulting Equation. When I try to build it myself (note, I'm not a mathematician, so this is probably not the ideal approach :-)), I end up with smth like: y.predicted = 12.71 + ( 95 / (( (1 + df$x) ^ .5 ) / 1.3)) Which kind of seems to approximate it - but I can't help to think that smth more elegant probably exists :-) I have the feeling that fitting a linear or polynomial model also wouldn't work, because the formula seems different from what those models generally use (i.e. this one seems to need divisions, powers, etc). For example, the approach in Fitting polynomial model to data in R gives pretty bad approximations. I remember from a long time ago that there exist languages (Matlab may be one of them?) that do this kind of stuff. Can R do this as well, or am I just at the wrong place? (Background info: basically, what we need to do is find an equation for determining numbers in the second column based on the numbers in the first column; but we decide the numbers ourselves. We have an idea of how we want the curve to look like, but we can adjust these numbers to an equation if we get a better fit. It's about the pricing for a product (a cheaper alternative to current expensive software for qualitative data analysis); the more 'project credits' you buy, the cheaper it should become. Rather than forcing people to buy a given number (i.e. 5 or 10 or 25), it would be nicer to have a formula so people can buy exactly what they need - but of course this requires a formula. We have an idea for some prices we think are ok, but now we need to translate this into an equation.

Search Results

Search found 2683 results on 108 pages for 'statistical analysis'.

Page 98/108 | < Previous Page | 94 95 96 97 98 99 100 101 102 103 104 105 | Next Page >

- by NEETHU

- by Wade Williams

- by Lasse V. Karlsen

- by Kevin Pullin

- by peterchen

- by Eleasar

- by rasx

- by blumidoo

- by Bittercoder

- by Andy Dent

- by pitchfork

- by Michelle

- by Pedro Morte Rolo

- by Ian

- by SipSop

- by Ander Biguri

- by briandk

- by IniTech

- by ivo-rossi

- by tom

- by Waneck

- by user160758

- by kumar_m_kiran

- by Leonard H Martin

- by Matherion

< Previous Page | 94 95 96 97 98 99 100 101 102 103 104 105 | Next Page >