Search Results

Search found 1692 results on 68 pages for 'trending and statistics'.

Page 57/68 | < Previous Page | 53 54 55 56 57 58 59 60 61 62 63 64  | Next Page >

  • Best XML format for log events in terms of tool support for data mining and visualization?

    - by Thorbjørn Ravn Andersen
    We want to be able to create log files from our Java application which is suited for later processing by tools to help investigate bugs and gather performance statistics. Currently we use the traditional "log stuff which may or may not be flattened into text form and appended to a log file", but this works the best for small amounts of information read by a human. After careful consideration the best bet has been to store the log events as XML snippets in text files (which is then treated like any other log file), and then download them to the machine with the appropriate tool for post processing. I'd like to use as widely supported an XML format as possible, and right now I am in the "research-then-make-decision" phase. I'd appreciate any help both in terms of XML format and tools and I'd be happy to write glue code to get what I need. What I've found so far: log4j XML format: Supported by chainsaw and Vigilog. Lilith XML format: Supported by Lilith Uninvestigated tools: Microsoft Log Parser: Apparently supports XML. OS X log viewer: plus there is a lot of tools on http://www.loganalysis.org/sections/parsing/generic-log-parsers/ Any suggestions?

    Read the article

  • Selecting data in clustered index order without ORDER BY

    - by kcrumley
    I know there is no guarantee without an ORDER BY clause, but are there any techniques to tune SQL Server tables so they're more likely to return rows in clustered index order, without having to specify ORDER BY every single time I want to run a super-quick ad hoc query? For example, would rebuilding my clustered index or updating statistics help? I'm aware that I can't count on a query like: select * from AuditLog where UserId = 992 to return records in the order of the clustered index, so I would never build code into an application based on this assumption. But for simple ad hoc queries, on almost all of my tables, the data consistently comes out in clustered index order, and I've gotten used to being able to expect the most recent results to be at the bottom. Out of all the many tables we use, I've only noticed two ever giving me results in an unpredicted order. This is really just an annoyance, but it would be nice to be able to minimize it. In case this is relevant because of page boundary issues or something like that, I should mention that one of the tables that has inconsistent ordering, the AuditLog table, is the longest table we have that has a clustered index on an identity column. Also, this database has recently been moved from SQL 2005 to SQL 2008, and we've seen no noticeable change in this behavior.

    Read the article

  • multi-core processing in R on windows XP - via doMC and foreach

    - by Jan
    Hi guys, I'm posting this question to ask for advice on how to optimize the use of multiple processors from R on a Windows XP machine. At the moment I'm creating 4 scripts (each script with e.g. for (i in 1:100) and (i in 101:200), etc) which I run in 4 different R sessions at the same time. This seems to use all the available cpu. I however would like to do this a bit more efficient. One solution could be to use the "doMC" and the "foreach" package but this is not possible in R on a Windows machine. e.g. library("foreach") library("strucchange") library("doMC") # would this be possible on a windows machine? registerDoMC(2) # for a computer with two cores (processors) ## Nile data with one breakpoint: the annual flows drop in 1898 ## because the first Ashwan dam was built data("Nile") plot(Nile) ## F statistics indicate one breakpoint fs.nile <- Fstats(Nile ~ 1) plot(fs.nile) breakpoints(fs.nile) # , hpc = "foreach" --> It would be great to test this. lines(breakpoints(fs.nile)) Any solutions or advice? Thanks, Jan

    Read the article

  • With a browser, how do I know which decimal separator does the client use?

    - by Quassnoi
    I'm developing a web application. I need to display some decimal data correctly so that it can be copied and pasted into a certain GUI application that is not under my control. The GUI application is locale sensitive and it accepts only the correct decimal separator which is set in the system. I can guess the decimal separator from Accept-Language and the guess will be correct in 95% cases, but sometimes it fails. Is there any way to do it on server side (preferably, so that I can collect statistics), or on client side? Update: The whole point of the task is doing it automatically. In fact, this webapp is a kind of online interface to a legacy GUI which helps to fill the forms correctly. The kind of users that use it mostly have no idea on what a decimal separator is. The Accept-Language solution is implemented and works, but I'd like to improve it. Update2: I need to retrive a very specific setting: decimal separator set in Control Panel / Regional and Language Options / Regional Options / Customize. I deal with four kinds of operating systems: Russian Windows with a comma as a DS (80%). English Windows with a period as a DS (15%). Russian Windows with a period as a DS to make poorly written English applications work (4%). English Windows with a comma as a DS to make poorly written Russian applications work (1%). All 100% of clients are in Russia and the legacy application deals with Russian goverment-issued forms, so asking for a country will yield 100% of Russian Federation, and GeoIP will yield 80% of Russian Federation and 20% of other, incorrect answers.

    Read the article

  • looking to streamline my RSS feed mashup

    - by Mark Cejas
    Hello crafty developers, I have aggregated RSS feeds from various sources with RSSowl, fetching directly from the social mention API. The RSS feeds are categorized into the following major categories: blogs, news, twitter, Q&A and social networking sites. Each major category is nested with a common group of RSS feeds that represent a particular client/brand ontology. Merging these feeds into the RSSowl reader application, allows me to conduct and save refined search queries (from the aggregated data) into a single file - that I can then tag and further segment for analysis. This scheme is utilized for my own research needs and has helped me considerably. However, I find this RSS mashup scheme kinda clumsy, it requires quite a bit of time to initially organize all of the feeds and I would like to be able to do further natural language processing to the data as well as eventually be able to rank the collected list of URL's into some order of media prominence - right I don't want to pay the ridiculous radian6 web analytics fees, when my intuition is telling me that with a bit of 'elbow grease' I can maybe leverage some available resources online to develop a functional low scale web mining application and get some good intelligence from it. I am now starting to learn a little about computer science - my background is in physical science/statistics so is my thinking in the right track? So, I guess I am imagining an application that allows me to query in a refined manner. A manner that allows me to search for keyword combinations, applying AND/OR operators, selectively focus my queries into particular sources - like a collection of blogs or twitter, or social networking communities, then save the results of my queries into a structured format that can then be manipulated and explored. Am I dreaming? I just had to get all of this out. any bit of advice and insight would be hugely appreciated. my best, Mark

    Read the article

  • How should I organize complex SQL views in Rails?

    - by Benjamin Oakes
    I manage a research database with Ruby on Rails. The data that is entered is primarily used by scientists who prefer to have all the relevant information for a study in one single massive table for use in their statistics software of choice. I'm currently presenting it as CSV, as it's very straightforward to do and compatible with the tools people want to use. I've written many views (the SQL kind, not the Rails HTML/ERB kind) to make the output they expect a reality. Some of these views are quite large and have a fair amount of complexity behind them. I wrote them in SQL because there are many calculations and comparisons that are more easily done with SQL. They're currently loaded into the database straight from a file named views.sql. To get the requested data, I do a select * from my_view;. The views.sql file is getting quite large. Part of the problem is that we're still figuring out what the data we collect means, so there's a lot of changes being made to the views all the time -- and a ton of them are being created. Many of them need to be repeatable. I've recently run into issues organizing and testing these views. Rails works great for user interface stuff and business logic, but I'm not aware of much existing structure for handling the reporting we require. Some options I've thought of: Should I move them into the most relevant models somehow? Several of the views interact with each other, which makes this situation more complex than just doing a single find_by_sql, so I don't know if they should only be part of the model. Perhaps they should be treated as a "view" in the MVC sense? (That is, they could be moved into app/views/ and live alongside the HTML, perhaps as files named something like my_view.csv.sql which return CSV.) How would you deal with a complex reporting problem like this?

    Read the article

  • SQL Query to return maximums over decades

    - by Abraham Lincoln
    My question is the following. I have a baseball database, and in that baseball database there is a master table which lists every player that has ever played. There is also a batting table, which tracks every players' batting statistics. I created a view to join those two together; hence the masterplusbatting table. CREATE TABLE `Master` ( `lahmanID` int(9) NOT NULL auto_increment, `playerID` varchar(10) NOT NULL default '', `nameFirst` varchar(50) default NULL, `nameLast` varchar(50) NOT NULL default '', PRIMARY KEY (`lahmanID`), KEY `playerID` (`playerID`), ) ENGINE=MyISAM AUTO_INCREMENT=18968 DEFAULT CHARSET=latin1; CREATE TABLE `Batting` ( `playerID` varchar(9) NOT NULL default '', `yearID` smallint(4) unsigned NOT NULL default '0', `teamID` char(3) NOT NULL default '', `lgID` char(2) NOT NULL default '', `HR` smallint(3) unsigned default NULL, PRIMARY KEY (`playerID`,`yearID`,`stint`), KEY `playerID` (`playerID`), KEY `team` (`teamID`,`yearID`,`lgID`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1; Anyway, my first query involved finding the most home runs hit every year since baseball began, including ties. The query to do that is the following.... select f.yearID, f.nameFirst, f.nameLast, f.HR from ( select yearID, max(HR) as HOMERS from masterplusbatting group by yearID )as x inner join masterplusbatting as f on f.yearID = x.yearId and f.HR = x.HOMERS This worked great. However, I now want to find the highest HR hitter in each decade since baseball began. Here is what I tried. select f.yearID, truncate(f.yearid/10,0) as decade,f.nameFirst, f.nameLast, f.HR from ( select yearID, max(HR) as HOMERS from masterplusbatting group by yearID )as x inner join masterplusbatting as f on f.yearID = x.yearId and f.HR = x.HOMERS group by decade You can see that I truncated the yearID in order to get 187, 188, 189 etc instead of 1897, 1885,. I then grouped by the decade, thinking that it would give me the highest per decade, but it is not returning the correct values. For example, it's giving me Adrian Beltre with 48 HR's in 2004 but everyone knows that Barry Bonds hit 73 HR in 2001. Can anyone give me some pointers?

    Read the article

  • Synchronizing in SQL Replication works when manually syncing, but not automatically

    - by Dominic Zukiewicz
    I'm using SQL Server 2005 to create a replication copy of the main databases, so that the reports can point to the replication copy instead of locking out our main databases. I have set up the 3 databases as publications and then 3 subscribers moving the transactions over to the subscribers, instantaneously I hope! What seems to be happening is that when using the "Insert Tracer" function, replication take publisher to distributor < 2 seconds, but to replicate to the subscribers can take over 7 minutes (and these are local databases on a SAN). This could be for 2 reasons: The SQL statements used to query the database are obtaining locks which are stopping the transactions updating the subscribers. The subscribers are just too busy for the replication to apply the changes. What seems to trouble me more, is that although the Replication Monitor / Insert Tracer are showing these statistics, if you use the "View Subscription Details" and then click Start, it will sync within seconds. My goal would be to have the data syncing (ideally) continuously, or every minute, perhaps I should reduce the batch size of the transactions? What am I doing wrong? [Note that the -Continuous flag is set!]

    Read the article

  • Is it a good idea to use MySQL and Neo4j together?

    - by Sanoj
    I will make an application with a lot of similar items (millions), and I would like to store them in a MySQL database, because I would like to do a lot of statistics and search on specific values for specific columns. But at the same time, I will store relations between all the items, that are related in many connected binary-tree-like structures (transitive closure), and relation databases are not good at that kind of structures, so I would like to store all relations in Neo4j which have good performance for this kind of data. My plan is to have all data except the relations in the MySQL database and all relations with item_id stored in the Neo4j database. When I want to lookup a tree, I first search the Neo4j for all the item_id:s in the tree, then I search the MySQL-database for all the specified items in a query that would look like: SELECT * FROM items WHERE item_id = 45 OR item_id = 345435 OR item_id = 343 OR item_id = 78 OR item_id = 4522 OR item_id = 676 OR item_id = 443 OR item_id = 4255 OR item_id = 4345 Is this a good idea, or am I very wrong? I haven't used graph-databases before. Are there any better approaches to my problem? How would the MySQL-query perform in this case?

    Read the article

  • How do I display java.lang.* object allocations in Eclipse profiler?

    - by Martin Wickman
    I am profiling an application using the Eclipse profiler. I am particularly interested in number of allocated object instances of classes from java.lang (for instance java.lang.String or java.util.HashMap). I also want to know stuff like number of calls to String.equals() etc. I use the "Object Allocations" tab and I shows all classes in my application and a count. It also shows all int[], byte[], long[] etc, but there is no mention of any standard java classes. For instance, this silly code: public static void main(String[] args) { Object obj[] = new Object[1000]; for (int i = 0; i < 1000; i++) { obj[i] = new StringBuffer("foo" + i); } System.out.println (obj[30]); } Shows up in the Object Allocations tab as 7 byte[]s, 4 char[]s and 2 int[]s. It doesn't matter if I use 1000 or 1 iterations. It seems the profiler simply ignores everything that is in any of the java.* packages. The same applies to Execution Statistics as well. Any idea how to display instances of java.* in the Eclipse Profiler?

    Read the article

  • AppFabric caching's local cache isnt working for us... What are we doing wrong?

    - by Olly
    We are using appfabric as the 2ndlevel cache for an NHibernate asp.net application comprising a customer facing website and an admin website. They are both connected to the same cache so when admin updates something, the customer facing site is updated. It seems to be working OK - we have a CacheCLuster on a seperate server and all is well but we want to enable localcache to get better performance, however, it dosnt seem to be working. We have enabled it like this... bool UseLocalCache = int LocalCacheObjectCount = int.MaxValue; TimeSpan LocalCacheDefaultTimeout = TimeSpan.FromMinutes(3); DataCacheLocalCacheInvalidationPolicy LocalCacheInvalidationPolicy = DataCacheLocalCacheInvalidationPolicy.TimeoutBased; if (UseLocalCache) { configuration.LocalCacheProperties = new DataCacheLocalCacheProperties( LocalCacheObjectCount, LocalCacheDefaultTimeout, LocalCacheInvalidationPolicy ); // configuration.NotificationProperties = new DataCacheNotificationProperties(500, TimeSpan.FromSeconds(300)); } Initially we tried using a timeout invalidation policy (3mins) and our app felt like it was running faster. HOWEVER, we noticed that if we changed something in the admin site, it was immediatley updated in the live site. As we are using timeouts not notifications, this demonstrates that the local cache isnt being queried (or is, but is always missing). The cache.GetType().Name returns "LocalCache" - so the factory has made a local cache. Running "Get-Cache-Statistics MyCache" in PS on my dev environment (asp.net app running local from vs2008, cache cluster running on a seperate w2k8 machine) show a handful of Request Counts. However, on the Production environment, the Request Count increases dramaticaly. We tried following the method here to se the cache cliebt-server traffic... http://blogs.msdn.com/b/appfabriccat/archive/2010/09/20/appfabric-cache-peeking-into-client-amp-server-wcf-communication.aspx but the log file had nothing but the initial header in it - i.e no loggin either. I cant find anything in SO or Google. Have we done something wrong? Have we got a screwy install of AppFabric - we installed it via WebPlatform Installer - I think? (note: the IIS box running ASp.net isnt in yhe cluster - it is just the client). Any insights greatfully received!

    Read the article

  • FileConnection Blackberry memory usage

    - by Dean
    Hello, I'm writing a blackberry application that reads ints and strings out of a database. This is my first time dealing with reading/writing on the blackberry, so forgive me if this is a dumb question. The database file I'm reading is only about 4kB I open the file with the following code fconn = (FileConnection) Connector.open("file_path_here", Connector.READ); if(fconn.exists()==false){fconn.close();return;} is = fconn.openDataInputStream(); while(!eof){ //etc... } is.close(); fconn.close(); The problem is, this code appears to be eating a LOT of memory. Using breakpoints and the "Memory Statistics" view, I determined the following: calling "Connector.open" creates 71 objects and changes "RAM Bytes in use" by 5376 calling "fconn.openDataInputStream();" increases RAM usage by a whopping 75920 Is this normal? Or am I doing something wrong? And how can I fix this? 75MB of RAM is a LOT of memory to waste on a handheld device, especially when the file I'm reading is only 4kB and I haven't even begun reading any data! How is this even possible?

    Read the article

  • can this problem be solved with a single SQL query?

    - by PierrOz
    I have the two following tables (with some sample datas) LOGS: ID | SETID | DATE ======================== 1 | 1 | 2010-02-25 2 | 2 | 2010-02-25 3 | 1 | 2010-02-26 4 | 2 | 2010-02-26 5 | 1 | 2010-02-27 6 | 2 | 2010-02-27 7 | 1 | 2010-02-28 8 | 2 | 2010-02-28 9 | 1 | 2010-03-01 STATS: ID | OBJECTID | FREQUENCY | STARTID | ENDID ============================================= 1 | 1 | 0.5 | 1 | 5 2 | 2 | 0.6 | 1 | 5 3 | 3 | 0.02 | 1 | 5 4 | 4 | 0.6 | 2 | 6 5 | 5 | 0.6 | 2 | 6 6 | 6 | 0.4 | 2 | 6 7 | 1 | 0.35 | 3 | 7 8 | 2 | 0.6 | 3 | 7 9 | 3 | 0.03 | 3 | 7 10 | 4 | 0.6 | 4 | 8 11 | 5 | 0.6 | 4 | 8 7 | 1 | 0.45 | 5 | 9 8 | 2 | 0.6 | 5 | 9 9 | 3 | 0.02 | 5 | 9 Every day new logs are analyzed on different sets of objects and stored in table LOGS. Among other processes, some statistics are computed on the objects contained into these sets and the result are stored in table STATS. These statistic are computed through several logs (identified by the STARTID and ENDID columns). So, what could be the SQL query that would give me the latest computed stats for all the objects with the corresponding log dates. In the given example, the result rows would be: OBJECTID | SETID | FREQUENCY | STARTDATE | ENDDATE ====================================================== 1 | 1 | 0.45 | 2010-02-27 | 2010-03-01 2 | 1 | 0.6 | 2010-02-27 | 2010-03-01 3 | 1 | 0.02 | 2010-02-27 | 2010-03-01 4 | 2 | 0.6 | 2010-02-26 | 2010-02-28 5 | 2 | 0.6 | 2010-02-26 | 2010-02-28 So, the most recent stats for set 1 are computed with logs from feb 27 to march 1 whereas stats for set 2 are computed from feb 26 to feb 28. object 6 is not in the results rows as there is no stat on it within the last period of time. Last thing, I use MySQL. Any Idea ?

    Read the article

  • Cannot connect to Github?

    - by user2973438
    so I tried to push some updates onto my repo on github via terminal on Mac OSX 10.8.4 and it doesn't work. I've been getting the same error many times: Lillys-MacBook-Air:Yuewei Lilly$ git push origin master error: Failed connect to github.com:443; Operation timed out while accessing https://github.com/lillybeans/Yuewei.git/info/refs?service=git-receive-pack fatal: HTTP request failed Some background: I've pushed many projects onto github before using terminal (when I was in Canada). I am currently in Shanghai, China, could it be the GFW? But when I was in Beijing, I was able to push projects onto github still. when I do ping github.com: Lillys-MacBook-Air:Yuewei Lilly$ ping github.com PING github.com (192.30.252.131): 56 data bytes Request timeout for icmp_seq 0 Request timeout for icmp_seq 1 ping: sendto: No route to host Request timeout for icmp_seq 2 ping: sendto: Host is down Request timeout for icmp_seq 3 ping: sendto: Host is down Request timeout for icmp_seq 4 ping: sendto: Host is down Request timeout for icmp_seq 5 ping: sendto: Host is down Request timeout for icmp_seq 6 ping: sendto: Host is down Request timeout for icmp_seq 7 ^C --- github.com ping statistics --- 9 packets transmitted, 0 packets received, 100.0% packet loss Lillys-MacBook-Air:Yuewei Lilly$ I have ShadowSocks (proxy) turned on. Without it I can't access github.com via browser, with it, I can. also when I do "git remote -v" I see both my pull and push remote repos correctly listed. Thank you in advance!

    Read the article

  • What about parallelism across network using multiple PCs?

    - by MainMa
    Parallel computing is used more and more, and new framework features and shortcuts make it easier to use (for example Parallel extensions which are directly available in .NET 4). Now what about the parallelism across network? I mean, an abstraction of everything related to communications, creation of processes on remote machines, etc. Something like, in C#: NetworkParallel.ForEach(myEnumerable, () => { // Computing and/or access to web ressource or local network database here }); I understand that it is very different from the multi-core parallelism. The two most obvious differences would probably be: The fact that such parallel task will be limited to computing, without being able for example to use files stored locally (but why not a database?), or even to use local variables, because it would be rather two distinct applications than two threads of the same application, The very specific implementation, requiring not just a separate thread (which is quite easy), but spanning a process on different machines, then communicating with them over local network. Despite those differences, such parallelism is quite possible, even without speaking about distributed architecture. Do you think it will be implemented in a few years? Do you agree that it enables developers to easily develop extremely powerfull stuff with much less pain? Example: Think about a business application which extracts data from the database, transforms it, and displays statistics. Let's say this application takes ten seconds to load data, twenty seconds to transform data and ten seconds to build charts on a single machine in a company, using all the CPU, whereas ten other machines are used at 5% of CPU most of the time. In a such case, every action may be done in parallel, resulting in probably six to ten seconds for overall process instead of forty.

    Read the article

  • Sub Query making Query slow.

    - by Muhammad Kashif Nadeem
    Please copy and paste following script. DECLARE @MainTable TABLE(MainTablePkId int) INSERT INTO @MainTable SELECT 1 INSERT INTO @MainTable SELECT 2 DECLARE @SomeTable TABLE(SomeIdPk int, MainTablePkId int, ViewedTime1 datetime) INSERT INTO @SomeTable SELECT 1, 1, DATEADD(dd, -10, getdate()) INSERT INTO @SomeTable SELECT 2, 1, DATEADD(dd, -9, getdate()) INSERT INTO @SomeTable SELECT 3, 2, DATEADD(dd, -6, getdate()) DECLARE @SomeTableDetail TABLE(DetailIdPk int, SomeIdPk int, Viewed INT, ViewedTimeDetail datetime) INSERT INTO @SomeTableDetail SELECT 1, 1, 1, DATEADD(dd, -7, getdate()) INSERT INTO @SomeTableDetail SELECT 2, 2, NULL, DATEADD(dd, -6, getdate()) INSERT INTO @SomeTableDetail SELECT 3, 2, 2, DATEADD(dd, -8, getdate()) INSERT INTO @SomeTableDetail SELECT 4, 3, 1, DATEADD(dd, -6, getdate()) SELECT m.MainTablePkId, (SELECT COUNT(Viewed) FROM @SomeTableDetail), (SELECT TOP 1 s2.ViewedTimeDetail FROM @SomeTableDetail s2 INNER JOIN @SomeTable s1 ON s2.SomeIdPk = s1.SomeIdPk WHERE s1.MainTablePkId = m.MainTablePkId) FROM @MainTable m Above given script is just sample. I have long list of columns in SELECT and around 12+ columns in Sub Query. In my From clause there are around 8 tables. To fetch 2000 records full query take 21 seconds and if I remove Subquiries it just take 4 seconds. I have tried to optimize query using 'Database Engine Tuning Advisor' and on adding new advised indexes and statistics but these changes make query time even bad. Note: As I have mentioned that this is test data to explain my question the real data has lot of tables joins columns but without Sub-Query the results us fine. Any help thanks.

    Read the article

  • Periodic GPU performance problem

    - by Peter Lillevold
    Hi folks! I have a WinForms application that uses XNA to animate 3D models in a control. The app have been doing just fine for months but recently I've started to experience periodic pauses in the animation. Setting out to investigate what is going on I have established these facts: It (currently) happens on my machine only Removing everything from my render loop does not improve the problem In 2. I didn't actually remove everything, I limited my loop to set the viewport on my GraphicsDevice and then do a GraphicsDevice.Present. Trying to dig further I fired up PIX to capture some statistics. Screenshots of two PIX runs can be viewed here (Run6) and here (Run14). Run6 is using my original render loop and Run14 is using the bare-bones Present loop. PIX tells me that the GPU is periodically doing something, and I assume this is causing the pauses. What could be the cause of this? Or how do I go about finding out what the GPU is actually doing? Note: I'm using XNA 3.1 on a Windows 7 x64 dual-core machine with 8GB RAM. Note2: also posted this question on the XNA Creators forums here.

    Read the article

  • Best way to build an application based on R?

    - by Prasad Chalasani
    I'm looking for suggestions on how to go about building an application that uses R for analytics, table generation, and plotting. What I have in mind is an application that: displays various data tables in different tabs, somewhat like in Excel, and the columns should be sortable by clicking. takes user input parameters in some dialog windows. displays plots dynamically (i.e. user-input-dependent) either in a tab or in a new pop-up window/frame Note that I am not talking about a general-purpose fron-end/GUI for exploring data with R (like say Rattle), but a specific application. Some questions I'd like to see addressed are: Is an entirely R-based approach even possible ( on Windows ) ? The following passage from the Rattle article in R-Journal intrigues me: It is interesting to note that the first implementation of Rattle actually used Python for implementing the callbacks and R for the statistics, using rpy. The release of RGtk2 allowed the interface el- ements of Rattle to be written directly in R so that Rattle is a fully R-based application If it's better to use another language for the GUI part, which language is best suited for this? I'm looking for a language where it's relatively "painless" to build the GUI, and that also integrates very well with R. From this StackOverflow question How should I do rapid GUI development for R and Octave methods (possibly with Python)? I see that Python + PyQt4 + QtDesigner + RPy2 seems to be the best combo. Is that the consensus ? Anyone have pointers to specific (open source) applications of the type I describe, as examples that I can learn from?

    Read the article

  • mySQL select and group by values

    - by Foo
    I'd like to count and group rows by specific values. This seems fairly simple, but I can't seem to do it. I have a table set up similar to this: Table: Ratings id pID uID rating 1 1 2 7 2 1 7 7 3 1 5 4 4 1 1 1 id is the primary key, piD and uID are foreign-keys. Rating contains values between 1 and 10, and only between 1 and 10. I want to run some statistics and count the number of ratings with a certain value. In the example above, two have left a rating of 7. So I wrote the following query: SELECT COUNT(*) AS 'count' , 'rating' FROM 'ratings' WHERE pID= '1' GROUP BY `rating` ORDER BY `rating` Which yields the nice result as: count ratings 1 1 1 4 2 7 I'd like to get the mySQL query to include values between 1 and 10 as well. For example: Desired Result count ratings 1 1 0 2 0 3 1 4 0 5 0 6 2 7 0 8 0 9 0 10 Unfortunately, I'm relatively new to SQL and I've been reading through everything I could get my hands on for the past hour, but I can't get it to work. I've been leaning along the lines of a some type of JOIN. If anyone can point me in the right direction, it'd be appreciated. Thanks.

    Read the article

  • MSSQL 2008 - Bit Param Evaluation alters Execution Plan

    - by Nathanial Woolls
    I have been working on migrating some of our data from Microsoft SQL Server 2000 to 2008. Among the usual hiccups and whatnot, I’ve run across something strange. Linked below is a SQL query that returns very quickly under 2000, but takes 20 minutes under 2008. I have read quite a bit on upgrading SQL server and went down the usual paths of checking indexes, statistics, etc. before coming to the conclusion that the following statement, found in the WHERE clause, causes the execution plan for the steps that follow this statement to change dramatically: And ( @bOnlyUnmatched = 0 -- offending line Or Not Exists( The SQL statements and execution plans are linked below. A coworker was able to rewrite a portion of the WHERE clause using a CASE statement, which seems to “trick” the optimizer into using a better execution plan. The version with the CASE statement is also contained in the linked archive. I’d like to see if someone has an explanation as to why this is happening and if there may be a more elegant solution than using a CASE statement. While we can work around this specific issue, I’d like to have a broader understanding of what is happening to ensure the rest of the migration is as painless as possible. Zip file with SQL statements and XML execution plans Thanks in advance!

    Read the article

  • Better code for accessing fields in a matlab structure array?

    - by John
    I have a matlab structure array Modles1 of size (1x180) that has fields a, b, c, ..., z. I want to understand how many distinct values there are in each of the fields. i.e. max(grp2idx([foo(:).a])) The above works if the field a is a double. {foo(:).a} needs to be used in the case where the field a is a string/char. Here's my current code for doing this. I hate having to use the eval, and what is essentially a switch statement. Is there a better way? names = fieldnames(Models1); for ix = 1 : numel(names) className = eval(['class(Models1(1).',names{ix},')']); if strcmp('double', className) || strcmp('logical',className) eval([' values = [Models1(:).',names{ix},'];']); elseif strcmp('char', className) eval([' values = {Models1(:).',names{ix},'};']); else disp(['Unrecognized class: ', className]); end % this line requires the statistics toolbox. [g, gn, gl] = grp2idx(values); fprintf('%30s : %4d\n',names{ix},max(g)); end

    Read the article

  • Logic: Best way to sample & count bytes of a 100MB+ file

    - by Jami
    Let's say I have this 170mb file (roughly 180 million bytes). What I need to do is to create a table that lists: all 4096 byte combinations found [column 'bytes'], and the number of times each byte combination appeared in it [column 'occurrences'] Assume two things: I can save data very fast, but I can update my saved data very slow. How should I sample the file and save the needed information? Here're some suggestions that are (extremely) slow: Go through each 4096 byte combinations in the file, save each data, but search the table first for existing combinations and update it's values. this is unbelievably slow Go through each 4096 byte combinations in the file, save until 1 million rows of data in a temporary table. Go through that table and fix the entries (combine repeating byte combinations), then copy to the big table. Repeat going through another 1 million rows of data and repeat the process. this is faster by a bit, but still unbelievably slow This is kind of like taking the statistics of the file. NOTE: I know that sampling the file can generate tons of data (around 22Gb from experience), and I know that any solution posted would take a bit of time to finish. I need the most efficient saving process

    Read the article

  • Need help optimizing this Django aggregate query

    - by Chris Lawlor
    I have the following model class Plugin(models.Model): name = models.CharField(max_length=50) # more fields which represents a plugin that can be downloaded from my site. To track downloads, I have class Download(models.Model): plugin = models.ForiegnKey(Plugin) timestamp = models.DateTimeField(auto_now=True) So to build a view showing plugins sorted by downloads, I have the following query: # pbd is plugins by download - commented here to prevent scrolling pbd = Plugin.objects.annotate(dl_total=Count('download')).order_by('-dl_total') Which works, but is very slow. With only 1,000 plugins, the avg. response is 3.6 - 3.9 seconds (devserver with local PostgreSQL db), where a similar view with a much simpler query (sorting by plugin release date) takes 160 ms or so. I'm looking for suggestions on how to optimize this query. I'd really prefer that the query return Plugin objects (as opposed to using values) since I'm sharing the same template for the other views (Plugins by rating, Plugins by release date, etc.), so the template is expecting Plugin objects - plus I'm not sure how I would get things like the absolute_url without a reference to the plugin object. Or, is my whole approach doomed to failure? Is there a better way to track downloads? I ultimately want to provide users some nice download statistics for the plugins they've uploaded - like downloads per day/week/month. Will I have to calculate and cache Downloads at some point? EDIT: In my test dataset, there are somewhere between 10-20 Download instances per Plugin - in production I expect this number would be much higher for many of the plugins.

    Read the article

  • Determining Best Table Structure for MySQL Performance

    - by Joe Majewski
    I'm working on a browser-based RPG for one of my websites, and right now I'm trying to determine the best way to organize my SQL tables for performance and maintenance. Here's my question: Does the number of columns in an SQL table affect the speed in which it can be queried? I am not a newbie when it comes to PHP or MySQL. I used to develop things with the common goal of getting them to work, but I've recently advanced to the stage where a functional program is not good enough unless it's fast and reliable. Anyways, right now I have a members table that has around 15 columns. It contains information such as the player's username, password, email, logins, page views, etcetera. It doesn't contain any information on the player's progress in the game, however. If I added columns for things such as army size, gold, turns, and whatnot, then it could easily rise to around 40 or 50 total columns. Oh, and my database structure IS normalized. Will a table with 50 columns that gets constantly queried be a bad idea? Should I split it into two tables; one for the user's general information and one for the user's game statistics? I know I could check the query time myself, but I haven't actually created the tables yet and I think I'd be better off with some professional advice on this important decision for my game. Thank you for your time! :)

    Read the article

  • How do I call Matlab in a script on Windows?

    - by Benjamin Oakes
    I'm working on a project that uses several languages: SQL for querying a database Perl/Ruby for quick-and-dirty processing of the data from the database and some other bookkeeping Matlab for matrix-oriented computations Various statistics languages (SAS/R/SPSS) for processing the Matlab output Each language fits its niche well and we already have a fair amount of code in each. Right now, there's a lot of manual work to run all these steps that would be much better scripted. I've already done this on Linux, and it works relatively well. On Linux: matlab -nosplash -nodesktop -r "command" or echo "command" | matlab -nosplash -nodesktop ...opens Matlab in a "command line" mode. (That is, no windows are created -- it just reads from STDIN, executes, and outputs to STDOUT/STDERR.) My problem is that on Windows (XP and 7), this same code opens up a window and doesn't read from / write to the command line. It just stares me blankly in the face, totally ignoring STDIN and STDOUT. How can I script running Matlab commands on Windows? I basically want something that will do: ruby database_query.rb perl legacy_code.pl ruby other_stuff.rb matlab processing_step_1.m matlab processing_step_2.m # etc, etc. I've found out that Matlab has an -automation flag on Windows to start an "automation server". That sounds like overkill for my purposes, and I'd like something that works on both platforms. What options do I have for automating Matlab in this workflow?

    Read the article

< Previous Page | 53 54 55 56 57 58 59 60 61 62 63 64  | Next Page >