Search Results


Stumbling Through: Visual Studio 2010 (Part II)

I would now like to expand a little on what I stumbled through in Part I of my Visual Studio 2010 post and touch on a few other features of VS 2010. Specifically, I want to generate some code based on an Entity Framework model and tie it to an actual data source. I'm not going to take the easy way and tie it to a SQL Server data source, though; I will tie it to an XML data file instead. Why? Well, why not? This is purely for learning. There are probably much better ways to get strongly typed classes around XML, but it will force us to go down a path less travelled and maybe learn a few things along the way. Once we get this XML data and the means to interact with it, I will revisit data binding to this data in a WPF form and see if I can't get reading, adding, deleting, and updating working smoothly with minimal code.

To begin, I will use what was learned in the first part of this blog topic and draw out a data model for the MFL (My Football League) - I don't want the NFL to come down and sue me for using their name in this totally football-related article. In the data model, Teams have Players, and Players have a Position and Statistics for each season they played. Note that when making the associations between these entities, I was given the option to create the foreign key, but I only chose this option for the association between Player and Position. The reason is that I am picturing the XML that will contain this data to look somewhat like this:

    <MFL>
      <Position/>
      <Position/>
      <Position/>
      <Team>
        <Player>
          <Statistic/>
        </Player>
      </Team>
    </MFL>

Statistic will be under its associated Player node, and Player will be under its associated Team node, so there is no need for an Id to reference the parent if we know the child will always fall under it. Position, however, is more of a lookup value that will not have any hierarchical relationship to the player. In fact, the Position data itself may be in a completely different XML file (something I'd like to play around with), so in any case a player will need to reference the position by its Id.

So now that we have a simple data model laid out, I would like to generate two things based on it:

- A class for each entity, with properties corresponding to each entity property
- An IO class with methods to get data for each entity, either all instances, by Id, or by parent

My experience with code generation in the past has consisted of writing little apps that use the CodeDOM directly to regenerate code on demand (or using tools like CodeSmith). Surely there has got to be a more fun way to do this, given that we are using the Entity Framework, which already has built-in code generation for SQL Server support. Let's start with that built-in stuff to give us a base to work from. Right-click anywhere in the canvas of our model and select Add Code Generation Item.

Just adding that code item seemed to do quite a bit towards what I was intending. It apparently generated a class for each entity, but also a whole ton more. I mean a TON more. Way too much complicated code was generated. Now, that code is likely to be a black box anyway, so it shouldn't matter, but we need to understand how to make this work the way we want it to work, so let's get ready to do some stumbling through that text template (tt) file.

When I open the .tt file that was generated, right off the bat I realize there is going to be trouble: there is no color coding, no IntelliSense, no nothing! That is going to make stumbling through more like groping blindly in the dark while handcuffed and hopping on one foot, which was one of the alternate titles I was considering for this blog. Thankfully, the community comes to my rescue and I won't have to cast my mind back to the glory days of coding in VI (look it up, kids). Using the Extension Manager (available under the Tools menu), I did a quick search for "tt editor" in the Online Gallery and quickly found the Tangible T4 Editor. Downloading and installing this was a breeze, and after doing so I got some color coding and IntelliSense while editing the tt files. If you will be doing any customizing of tt files, I highly recommend installing this extension.

Next, we'll see if that is enough help for us to tweak that tt file to do the kind of code generation that we want.
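The shape of the model described in this post (Statistic nested under Player, Player nested under Team, Position referenced by Id) can be sketched as plain data structures. This sketch is in C rather than the C# a T4 template would emit, and every name in it (the struct fields, `find_position`) is a hypothetical stand-in for illustration, not output from the Entity Framework tooling.

```c
#include <assert.h>
#include <stddef.h>

/* Lookup entity: players refer to it by Id, not by nesting. */
typedef struct {
    int id;
    const char *name;
} Position;

/* Nested under its Player, so it carries no Player Id of its own. */
typedef struct {
    int season;
    int games_played;   /* hypothetical example property */
} Statistic;

/* Nested under its Team; position_id is the only foreign key. */
typedef struct {
    const char *name;
    int position_id;
    const Statistic *stats;
    size_t stat_count;
} Player;

typedef struct {
    const char *name;
    const Player *players;
    size_t player_count;
} Team;

/* Resolve a position by Id; returns NULL when no position matches. */
const Position *find_position(const Position *positions, size_t count, int id)
{
    assert(positions != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (positions[i].id == id) {
            return &positions[i];
        }
    }
    return NULL;
}
```

Only the Player-to-Position link needs a lookup function, because the other relationships are expressed by containment, which mirrors the decision to create a foreign key for that one association only.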

Read the article
Is Data Science “Science”?

- by BuckWoody

I hold the term “science” in very high esteem. I grew up on the Space Coast in Florida, and eventually worked at the Kennedy Space Center, surrounded by very intelligent people who worked in various scientific fields. Recently a new term has entered the computing dialog – “Data Scientist”. Since it’s not a standard term, it has a lot of definitions, and in fact has been disputed as a correct term. After all, the reasoning goes, if there’s no such thing as “Data Science” then how can there be a Data Scientist? This argument has been made before, albeit with a different term – “Computer Science”. In Peter Denning’s excellent article “Is Computer Science Science” (April 2005/Vol. 48, No. 4 COMMUNICATIONS OF THE ACM) there are many points that separate “science” from “engineering” and even “art”. I won’t repeat the content of that article here (I recommend you read it on your own) but will leverage the points he makes there. Definition of Science To ask the question “is data science ‘science’” then we need to start with a definition of terms. Various references put the definition into the same basic areas: Study of the physical world Systematic and/or disciplined study of a subject area ...and then they include the things studied, the bodies of knowledge and so on. The word itself comes from Latin, and means merely “to know” or “to study to know”. Greek divides knowledge further into “truth” (episteme), and practical use or effects (tekhne). Normally computing falls into the second realm. Definition of Data Science And now a more controversial definition: Data Science. This term is so new and perhaps so niche that the major dictionaries haven’t yet picked it up (my OED reference is older – can’t afford to pop for the online registration at present). 
Researching the term's general use, I created an amalgam of the definitions: “Studying and applying mathematical and other techniques to derive information from complex data sets.” Using this definition, data science certainly seems to be science - it's learning about and studying some object or area using systematic methods. But implicit within the definition is the word “application”, which makes the process more akin to engineering or even technology than science. In fact, I find the use of these techniques – and data itself – to be part of science, not science itself. I leave out the concept of studying data patterns or algorithms as part of this discipline. That is actually a domain I see within research, mathematics or computer science. That of course is a type of science, but one that does not seek practical applications. As part of the argument against calling it “Data Science”, some point to the scientific method of creating a hypothesis, testing with controls, testing results against the hypothesis, and documenting for repeatability. These are not steps that we often take in working with data. We normally start with a question, and fit patterns and algorithms to predict outcomes and find correlations. In this way Data Science is more akin to statistics (and in fact makes heavy use of them) in its process, rather than starting with an assumption and following on with it. So, is Data Science “Science”? I’m uncertain – and I’m uncertain it matters. Even if we are facing rampant “title inflation” these days (does anyone introduce themselves as a secretary or supervisor anymore?), I can tolerate the term, at least in the sense that we use data to study problems across a wide spectrum rather than restricting it to a single domain. And I also understand those who have worked hard to achieve the very honorable title of “scientist” and who have issues with those who borrow the term without asking. What do you think? Science, or not? Does it matter?

Read the article
Profiling Startup Of VS2012 – YourKit Profiler

- by Alois Kraus

The YourKit (v7.0.5) profiler is interesting in terms of price (79€ for a single-place license, 409€ with 1 year of support and upgrades) and feature set. You get a performance and a memory profiler in one package, for which you normally have to pay extra with other vendors. As an interesting side note, the profiler UI is written in Java because they also sell Java profilers with the same feature set.

To get all methods of a VS startup you first need to configure it to include System* in the profiled methods, and you need to configure * to measure wall clock time. By default it records only CPU times, which allows you to optimize CPU-hungry operations. But you will never see a Thread.Sleep(10000) that blocks the UI in this mode. Like all the other profilers, it can profile processes started from within the profiler, but it can also profile the next or all started processes. As usual it can profile in sampling and tracing mode. Since it is a memory profiler as well, by default it also records all object allocations > 1MB. With allocation recording enabled VS2012 crashed, but without allocation recording there were no problems.

The CPU tab contains the time line of the application, and when you click in the graph you get the call stacks of all threads at that time. This is a really nice feature. When you select a time region you get the CPU usage estimation for that time window. I have seen many applications consuming 100% CPU only because they created garbage like crazy. For this, the Garbage Collection tab is interesting in conjunction with a time range. This view is like the CPU tab, only the CPU graph (green) is missing. All relevant information except for GCs/s is already visible in the CPU tab. It is very handy for pinpointing excessive GC or CPU-bound issues. The Threads tab shows the thread names and their lifetimes. This is useful to see thread interactions or which thread is hottest in terms of CPU consumption.

On the CPU tab the call tree exists in a merged and a thread-specific view. When you click on a method you get a list of all called methods below it. There you can sort for methods with a high own time, which are worth optimizing. In the Method List you can select which scope you want to see. Back Traces are the methods which called you. Callees is the list of methods called directly or indirectly by your method, as a flat list. This is not a call stack, but it is still very useful to see which methods were slow, so you can find the “root” cause quite quickly without clicking through long call stacks. The last view, Merged Callees, is a call-stack view of the previous view. This helps a lot in understanding how each method was called at run time. You would get the same view with a debugger for one call invocation, but here you get the full statistics (invocation count) as well.

Since YourKit is also a memory profiler, you can directly see which objects you have on your managed heap and which objects hold most of your precious memory. In the Object Explorer view you can also examine the contents of your objects (strings or whatever) to get a better understanding of which objects were potentially allocating this stuff.

YourKit is a very easy-to-use combined memory and performance profiler in one product. The unbeatable single-license price makes it very attractive to buy outright. Although it has a Java UI, it is very responsive and its memory consumption is considerably lower than that of dotTrace and ANTS profiler. What I really like is to start the YourKit UI and then start the processes I want to profile as usual. There is no need to alter your own application code to be able to inject a profiler into your newly started processes. For performance and memory profiling you can simply select the process you want to investigate from the list of started processes. That's the way I like to use profilers: just get out of the way and let the application run without any special preparations.

Next: Telerik JustTrace
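The difference between CPU time and wall clock time mentioned in this review can be shown with a small measurement. The sketch below is in C on a POSIX system rather than .NET, and `time_a_sleep` and the `Timing` struct are hypothetical names used only for illustration. It times one sleeping call both ways: the sleep takes real time but consumes almost no CPU, which is why a CPU-time-only profile would not show it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Elapsed CPU and wall-clock seconds measured around one operation. */
typedef struct {
    double cpu_seconds;
    double wall_seconds;
} Timing;

/* Read the monotonic clock as fractional seconds. */
static double monotonic_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Sleep for the given number of milliseconds and time it both ways. */
Timing time_a_sleep(long milliseconds)
{
    assert(milliseconds >= 0);
    struct timespec pause_for = { milliseconds / 1000,
                                  (milliseconds % 1000) * 1000000L };
    clock_t cpu_start = clock();            /* CPU time used by this process */
    double wall_start = monotonic_seconds();

    nanosleep(&pause_for, NULL);            /* blocks without burning CPU */

    Timing t;
    t.cpu_seconds = (double)(clock() - cpu_start) / CLOCKS_PER_SEC;
    t.wall_seconds = monotonic_seconds() - wall_start;
    return t;
}
```

Calling `time_a_sleep(100)` should report a wall time of roughly a tenth of a second and a CPU time close to zero, the same gap that hides a blocking sleep when only CPU times are recorded.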

Read the article
Right-Time Retail Part 2

- by David Dorf

This is part two of the three-part series.

Right-Time Integration

Of course these real-time enabling technologies are only as good as the systems that utilize them, and it only takes one bottleneck to slow everyone else down. What good is an immediate stock-out notification if the supply chain can’t react until tomorrow? Since being formed in 2006, Oracle Retail has been not only adding more integrations between systems, but also modernizing integrations for appropriate speed. Notice I tossed in the word “appropriate.” Not everything needs to be real-time – again, we’re talking about Right-Time Retail. The speed of data capture, analysis, and execution must be synchronized or you’re wasting effort. Unfortunately, there isn’t an enterprise-wide dial that you can crank-up for your estate. You’ll need to improve things piecemeal, with people and processes as limiting factors while choosing the appropriate types of integrations. There are three integration styles we see in the retail industry. First is batch. I know, the word “batch” just sounds slow, but this pattern is less about velocity and more about volume. 
When there are large amounts of data to be moved, you’ll want to use batch processes. Our technology of choice here is Oracle Data Integrator (ODI), which provides a fast version of Extract-Transform-Load (ETL). Instead of the three-step process, the load and transform steps are combined to save time. ODI is a key technology for moving data into Retail Analytics where we can apply science. Performing analytics on each sale as it occurs doesn’t make any sense, so we batch up a statistically significant amount and submit all at once. The second style is fire-and-forget. For some types of data, we want the data to arrive ASAP but immediacy is not necessary. Speed is less important than guaranteed delivery, so we use message-oriented middleware available in both Weblogic and the Oracle database. For example, Point-of-Service transactions are queued for delivery to Central Office at corporate. If the network is offline, those transactions remain in the queue and will be delivered when the network returns. Transactions cannot be lost and they must be delivered in order. (Ever tried processing a return before the sale?) To enhance the standard queues, we offer the Retail Integration Bus (RIB) to help the management and monitoring of fire-and-forget messaging in the enterprise. The third style is request-response and is most commonly implemented as Web services. This is a synchronous message where the sender waits for a response. In this situation, the volume of data is small, guaranteed delivery is not necessary, but speed is very important. Examples include the website checking inventory, a price lookup, or processing a credit card authorization. The Oracle Service Bus (OSB) typically handles the routing of such messages, and we’ve enhanced its abilities with the Retail Service Backbone (RSB). 
To better understand these integration patterns and where they apply within the retail enterprise, we’re providing the Retail Reference Library (RRL) at no charge to Oracle Retail customers. The library is composed of a large number of industry business processes, including those necessary to support Commerce Anywhere, as well as detailed architectural diagrams. These diagrams allow implementers to understand the systems involved in integrations and the specific data payloads. Furthermore, with our upcoming release we’ll be providing a new tool called the Retail Integration Console (RIC) that allows IT to monitor and manage integrations from a single point. Using RIC, retailers can quickly discern where integration activity is occurring, volume statistics, average response times, and errors. The dashboards provide the ability to dive down into the architecture documentation to gather information all the way down to the specific payload. Retailers that want real-time integrations will also need real-time monitoring of those integrations to ensure service-level agreements are maintained. Part 3 looks at marketing.
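The fire-and-forget style described in this post, where transactions wait in a queue while the network is down and are then delivered in order, can be illustrated with a small in-memory FIFO. This is a generic sketch in C with hypothetical names (`TxQueue`, `tx_enqueue`, `tx_flush`); it is not how WebLogic, the Oracle database, or the RIB implement queuing, and a real system would persist the queue rather than hold it in memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 8

/* Fixed-size FIFO of transaction ids waiting to be delivered. */
typedef struct {
    int items[QUEUE_CAPACITY];
    size_t head;    /* index of the oldest queued item */
    size_t count;   /* number of queued items */
} TxQueue;

/* Queue a transaction; returns false if the queue is full. */
bool tx_enqueue(TxQueue *q, int tx_id)
{
    assert(q != NULL);
    if (q->count == QUEUE_CAPACITY) {
        return false;
    }
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = tx_id;
    q->count++;
    return true;
}

/*
 * Deliver queued transactions oldest-first into out[].
 * While the network is offline nothing is removed from the queue.
 * Returns the number of transactions delivered.
 */
size_t tx_flush(TxQueue *q, bool network_online, int *out, size_t out_cap)
{
    assert(q != NULL && out != NULL);
    size_t delivered = 0;
    while (network_online && q->count > 0 && delivered < out_cap) {
        out[delivered++] = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAPACITY;
        q->count--;
    }
    return delivered;
}
```

Because items leave only from the head, a sale queued before its return is always delivered before it, which is the ordering guarantee the text asks for.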

Read the article
reading the file name from user input in MIPS assembly

- by Hassan Al-Jeshi

I'm writing a MIPS assembly program that asks the user for a file name and produces some statistics about the content of the file. When I hard-code the file name into a variable from the beginning it works just fine, but when I ask the user to input the file name it does not work. After some debugging, I discovered that the program adds a 0x00 char and a 0x0a char (check asciitable.com) at the end of the user input in memory, and that's why it does not open the file based on the user input. Does anyone have any idea how to get rid of those extra chars, or how to open the file after getting its name from the user?

Here is my complete code (it is working fine except for the file-name-from-user thing, and anybody is free to use it for any purpose he/she wants to):

    .data
    fin:    .ascii  ""      # filename for input
    msg0:   .asciiz "aaaa"
    msg1:   .asciiz "Please enter the input file name:"
    msg2:   .asciiz "Number of Uppercase Char: "
    msg3:   .asciiz "Number of Lowercase Char: "
    msg4:   .asciiz "Number of Decimal Char: "
    msg5:   .asciiz "Number of Words: "
    nline:  .asciiz "\n"
    buffer: .asciiz ""

    .text
    #-----------------------
        li $v0, 4
        la $a0, msg1
        syscall

        li $v0, 8
        la $a0, fin
        li $a1, 21
        syscall

        jal fileRead            # read from file
        move $s1, $v0           # $s1 = total number of bytes

        li $t0, 0               # Loop counter
        li $t1, 0               # Uppercase counter
        li $t2, 0               # Lowercase counter
        li $t3, 0               # Decimal counter
        li $t4, 0               # Words counter

    loop:
        bge $t0, $s1, end       # if end of file reached OR if there is an error in the file
        lb $t5, buffer($t0)     # load next byte from file
        jal checkUpper          # check for upper case
        jal checkLower          # check for lower case
        jal checkDecimal        # check for decimal
        jal checkWord           # check for words
        addi $t0, $t0, 1        # increment loop counter
        j loop

    end:
        jal output
        jal fileClose
        li $v0, 10
        syscall

    fileRead:
        # Open file for reading
        li $v0, 13              # system call for open file
        la $a0, fin             # input file name
        li $a1, 0               # flag for reading
        li $a2, 0               # mode is ignored
        syscall                 # open a file
        move $s0, $v0           # save the file descriptor
        # reading from file just opened
        li $v0, 14              # system call for reading from file
        move $a0, $s0           # file descriptor
        la $a1, buffer          # address of buffer to read into
        li $a2, 100000          # hardcoded buffer length
        syscall                 # read from file
        jr $ra

    output:
        li $v0, 4
        la $a0, msg2
        syscall
        li $v0, 1
        move $a0, $t1
        syscall
        li $v0, 4
        la $a0, nline
        syscall
        li $v0, 4
        la $a0, msg3
        syscall
        li $v0, 1
        move $a0, $t2
        syscall
        li $v0, 4
        la $a0, nline
        syscall
        li $v0, 4
        la $a0, msg4
        syscall
        li $v0, 1
        move $a0, $t3
        syscall
        li $v0, 4
        la $a0, nline
        syscall
        li $v0, 4
        la $a0, msg5
        syscall
        addi $t4, $t4, 1
        li $v0, 1
        move $a0, $t4
        syscall
        jr $ra

    checkUpper:
        blt $t5, 0x41, L1       # branch if less than 'A'
        bgt $t5, 0x5a, L1       # branch if greater than 'Z'
        addi $t1, $t1, 1        # increment Uppercase counter
    L1: jr $ra

    checkLower:
        blt $t5, 0x61, L2       # branch if less than 'a'
        bgt $t5, 0x7a, L2       # branch if greater than 'z'
        addi $t2, $t2, 1        # increment Lowercase counter
    L2: jr $ra

    checkDecimal:
        blt $t5, 0x30, L3       # branch if less than '0'
        bgt $t5, 0x39, L3       # branch if greater than '9'
        addi $t3, $t3, 1        # increment Decimal counter
    L3: jr $ra

    checkWord:
        bne $t5, 0x20, L4       # branch if not a space
        addi $t4, $t4, 1        # increment words counter
    L4: jr $ra

    fileClose:
        # Close the file
        li $v0, 16              # system call for close file
        move $a0, $s0           # file descriptor to close
        syscall                 # close file
        jr $ra

Note: I'm using the MARS Simulator, if that makes any difference.
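The 0x0a byte is the newline stored when the user presses Enter, since the MARS read-string service follows the semantics of C's `fgets`. One likely fix is therefore to overwrite that newline with 0x00 before calling the open service. The logic is sketched below in C, with `strip_newline` as a hypothetical name; in the MIPS program it would become a short `lb`/`sb` loop over `fin`. It also looks as though `fin` reserves no space for the name, so a directive such as `.space 21` may be needed there, but that is an assumption about the cause rather than something the post confirms.

```c
#include <assert.h>
#include <stddef.h>

/* Replace the first '\n' in a NUL-terminated string with '\0'. */
void strip_newline(char *s)
{
    assert(s != NULL);
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == '\n') {
            s[i] = '\0';   /* the string now ends where the newline was */
            return;
        }
    }
}
```

A name that already fills the buffer carries no newline, and the loop then leaves it unchanged.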

Read the article
Problem with Google Analytics for Android : "Dispatcher thinks it finished, but there were 543 faile

- by PHP_Jedi

Does anyone know how to solve this problem?

    03-23 13:03:20.585: WARN/googleanalytics(3430): Problem with socket or streams.
    03-23 13:03:20.585: WARN/googleanalytics(3430): java.net.SocketException: Broken pipe
    03-23 13:03:20.585: WARN/googleanalytics(3430): at org.apache.harmony.luni.platform.OSNetworkSystem.sendStreamImpl(Native Method)
    03-23 13:03:20.585: WARN/googleanalytics(3430): at org.apache.harmony.luni.platform.OSNetworkSystem.sendStream(OSNetworkSystem.java:498)
    03-23 13:03:20.585: WARN/googleanalytics(3430): at org.apache.harmony.luni.net.PlainSocketImpl.write(PlainSocketImpl.java:585)
    03-23 13:03:20.585: WARN/googleanalytics(3430): at org.apache.harmony.luni.net.SocketOutputStream.write(SocketOutputStream.java:59)
    03-23 13:03:20.585: WARN/googleanalytics(3430): at org.apache.http.impl.io.AbstractSessionOutputBuffer.flushBuffer(AbstractSessionOutputBuffer.java:87)
    03-23 13:03:20.585: WARN/googleanalytics(3430): at org.apache.http.impl.io.AbstractSessionOutputBuffer.flush(AbstractSessionOutputBuffer.java:94)
    03-23 13:03:20.585: WARN/googleanalytics(3430): at org.apache.http.impl.AbstractHttpClientConnection.doFlush(AbstractHttpClientConnection.java:168)
    03-23 13:03:20.585: WARN/googleanalytics(3430): at org.apache.http.impl.AbstractHttpClientConnection.flush(AbstractHttpClientConnection.java:173)
    03-23 13:03:20.585: WARN/googleanalytics(3430): at com.google.android.apps.analytics.PipelinedRequester.sendRequests(Unknown Source)
    03-23 13:03:20.585: WARN/googleanalytics(3430): at com.google.android.apps.analytics.NetworkDispatcher$DispatcherThread$AsyncDispatchTask.dispatchSomePendingEvents(Unknown Source)
    03-23 13:03:20.585: WARN/googleanalytics(3430): at com.google.android.apps.analytics.NetworkDispatcher$DispatcherThread$AsyncDispatchTask.run(Unknown Source)
    03-23 13:03:20.585: WARN/googleanalytics(3430): at android.os.Handler.handleCallback(Handler.java:587)
    03-23 13:03:20.585: WARN/googleanalytics(3430): at android.os.Handler.dispatchMessage(Handler.java:92)
    03-23 13:03:20.585: WARN/googleanalytics(3430): at android.os.Looper.loop(Looper.java:123)
    03-23 13:03:20.585: WARN/googleanalytics(3430): at android.os.HandlerThread.run(HandlerThread.java:60)
    03-23 13:03:21.088: WARN/googleanalytics(3430): Dispatcher thinks it finished, but there were 543 failed events

The last line in particular explains why so much data is lost: the dispatcher thinks it is done, but it has 543 events not dispatched. The application has a good internet connection and there is no problem reaching the app's server-side API. I see in Analytics that lots of startups and click events from the past few days are lost, even though I know the traffic is normal, since I can see statistics from the server API. In the Analytics reports I see day-by-day under-reporting, so the problem seems to be spreading to all the devices using this application. I'm wondering why Google does not answer this in their mailing groups; several people have complained about it.

I found this thread relevant: http://stackoverflow.com/questions/682560/java-net-socketexception-broken-pipe

But I'm still not sure whether there is anything I can do to fix it. If there is nothing I can do, I guess it's not my fault that it got broken. But I have a feeling it is, since the problem got dramatically worse with the last deploy to Android Market. Does anyone else have experience with Google Analytics for Android?

Read the article
Web Safe Area (optimal resolution) for web app design

- by M.A.X

I'm in the process of designing a new web app and I'm wondering what 'web safe area' I should optimize the app layout and design for. I did some investigation and thinking on my own but wanted to share it to see what the general opinion is. Here is what I found:

Optimal Display Resolution:

- w3schools web stats: seems to be the most referenced source (however, they state that these are results from their own site and are biased towards tech-savvy users)
- http://www.w3counter.com/globalstats.php (aggregate data from something like 15,000 different sites that use their tracking services)
- StatCounter Global Stats Display Resolution (stats are based on aggregate data collected by StatCounter on a sample exceeding 15 billion pageviews per month, collected from across the StatCounter network of more than 3 million websites)
- NetMarketShare Screen Resolutions (marketshare.hitslink.com) (a web analytics consulting firm; they get data from browsers of site visitors to their on-demand network of live stats customers. The data is compiled from approximately 160 million visitors per month)

Display Resolution Summary: There is a bit of variation between the above sources, but in general, as of Jan 2011, it looks like 1024x768 is about 20%, while ~85% have a higher resolution of at least 1280x768 (1280x800 is the most common of these with 15-20% of total web, depending on the source; 1280x1024 and 1366x768 follow behind with 9-14% of the share). My guess would be that the higher resolutions will be even more common if we filter on North America, and more common still if we filter on North American corporate users (unfortunately I couldn't find any free geographically filtered statistics). Another point to note is that the 1024x768 desktop user population is likely lower than the aforementioned 20%, seeing as the iPad (1024x768 native display) is likely propping up those numbers.

My recommendation would be to optimize around the 1280x768 constraint (note: 1280x768 is actually a relatively rare resolution, but I think it's a valid constraint range considering that 1366x768 is relatively common and 1280 is the most common horizontal resolution).

Browser + OS Constraints: To further add to the constraints, we have to subtract the space taken up by the browser (assuming IE, which is the most space-consuming) and the OS (assuming WinXP-Win7):

- Win7 has the biggest taskbar footprint at a height of 40px (XP's and Vista's is 30px).
- The default IE8 view uses up 25px at the bottom of the screen with the status bar, and a further 120px at the top of the screen with the Windows title bar and the browser UI (assuming the default 'favorites' toolbar is present; it would be 91px without the favorites toolbar).
- Assuming no scrollbar, we also lose a total of 4px horizontally for the window outline.

This means that we are left with 583px of vertical space and 1276px of horizontal space. In other words, a Web Safe Area of 1276 x 583.

Is this a correct line of thinking? I tried to Google some design best practices, but most still talk about designing around 1024x768, which seems to be quickly disappearing. Any help on this would be greatly appreciated! Thanks.
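The subtraction behind the 1276 x 583 figure can be written out as a small calculation. The sketch below is in C; the function and struct names are hypothetical, and the pixel values are the ones quoted in the question (Win7 taskbar, IE8 with the favorites toolbar).

```c
#include <assert.h>

/* Usable content area in pixels. */
typedef struct {
    int width;
    int height;
} Area;

/* Subtract OS and browser chrome (all values in pixels) from the screen size. */
Area web_safe_area(int screen_w, int screen_h, int taskbar_h,
                   int browser_top_h, int browser_bottom_h, int window_border_w)
{
    assert(screen_w > 0 && screen_h > 0);
    Area a;
    a.width = screen_w - window_border_w;
    a.height = screen_h - taskbar_h - browser_top_h - browser_bottom_h;
    return a;
}
```

With the question's numbers, `web_safe_area(1280, 768, 40, 120, 25, 4)` gives 1276 x 583; passing 91 for the top chrome (no favorites toolbar) gives a height of 612 instead.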

Read the article
Problem with signal handlers being called too many times [closed]

- by Hristo

How can something print 3 times when it only goes through the printing code twice? I'm coding in C, and the code is in a SIGCHLD signal handler I created.

    void chld_signalHandler() {
        int pidadf = (int) getpid();
        printf("pidafdfaddf: %d\n", pidadf);

        while (1) {
            int termChildPID = waitpid(-1, NULL, WNOHANG);

            if (termChildPID == 0 || termChildPID == -1) {
                break;
            }

            dll_node_t *temp = head;
            while (temp != NULL) {
                printf("stuff\n");
                if (temp->pid == termChildPID && temp->type == WORK) {
                    printf("inside if\n");

                    // read memory mapped file b/w WORKER and MAIN
                    // get statistics and write results to pipe
                    char resultString[256];

                    // printing TIME
                    int i;
                    for (i = 0; i < 24; i++) {
                        sprintf(resultString, "TIME; %d ; %d ; %d ; %s\n",i,1,2,temp->stats->mboxFileName);
                        fwrite(resultString, strlen(resultString), 1, pipeFD);
                    }

                    remove_node(temp);
                    break;
                }
                temp = temp->next;
            }
            printf("done printing from sigchld \n");
        }
        return;
    }

The output for my MAIN process is this:

    MAIN PROCESS 16214 created WORKER PROCESS 16220 for file class.sp10.cs241.mbox
    pidafdfaddf: 16214
    stuff
    stuff
    inside if
    done printing from sigchld
    MAIN PROCESS 16214 created WORKER PROCESS 16221 for file class.sp10.cs225.mbox
    pidafdfaddf: 16214
    stuff
    stuff
    inside if
    done printing from sigchld

and the output for the MONITOR process is this:

    MONITOR: pipe is open for reading
    MONITOR PIPE: TIME; 0 ; 1 ; 2 ; class.sp10.cs225.mbox
    MONITOR PIPE: TIME; 0 ; 1 ; 2 ; class.sp10.cs225.mbox
    MONITOR PIPE: TIME; 0 ; 1 ; 2 ; class.sp10.cs241.mbox
    MONITOR: end of readpipe

(I've taken out repeating lines so I don't take up so much space.)

Thanks, Hristo

Read the article
INSERT OR IGNORE in a trigger

- by dan04

I have a database (for tracking email statistics) that has grown to hundreds of megabytes, and I've been looking for ways to reduce it. It seems that the main reason for the large file size is that the same strings tend to be repeated in thousands of rows. To avoid this problem, I plan to create another table for a string pool, like so:

    CREATE TABLE AddressLookup (
        ID INTEGER PRIMARY KEY AUTOINCREMENT,
        Address TEXT UNIQUE
    );

    CREATE TABLE EmailInfo (
        MessageID INTEGER PRIMARY KEY AUTOINCREMENT,
        ToAddrRef INTEGER REFERENCES AddressLookup(ID),
        FromAddrRef INTEGER REFERENCES AddressLookup(ID)
        /* Additional columns omitted for brevity. */
    );

And for convenience, a view to join these tables:

    CREATE VIEW EmailView AS
    SELECT MessageID, A1.Address AS ToAddr, A2.Address AS FromAddr
    FROM EmailInfo
    LEFT JOIN AddressLookup A1 ON (ToAddrRef = A1.ID)
    LEFT JOIN AddressLookup A2 ON (FromAddrRef = A2.ID);

In order to be able to use this view as if it were a regular table, I've made some triggers:

    CREATE TRIGGER trg_id_EmailView INSTEAD OF DELETE ON EmailView
    BEGIN
        DELETE FROM EmailInfo WHERE MessageID = OLD.MessageID;
    END;

    CREATE TRIGGER trg_ii_EmailView INSTEAD OF INSERT ON EmailView
    BEGIN
        INSERT OR IGNORE INTO AddressLookup(Address) VALUES (NEW.ToAddr);
        INSERT OR IGNORE INTO AddressLookup(Address) VALUES (NEW.FromAddr);
        INSERT INTO EmailInfo
            SELECT NEW.MessageID, A1.ID, A2.ID
            FROM AddressLookup A1, AddressLookup A2
            WHERE A1.Address = NEW.ToAddr AND A2.Address = NEW.FromAddr;
    END;

    CREATE TRIGGER trg_iu_EmailView INSTEAD OF UPDATE ON EmailView
    BEGIN
        UPDATE EmailInfo SET MessageID = NEW.MessageID WHERE MessageID = OLD.MessageID;
        REPLACE INTO EmailView SELECT NEW.MessageID, NEW.ToAddr, NEW.FromAddr;
    END;

The problem

After:

    INSERT OR REPLACE INTO EmailView VALUES (1, '[email protected]', '[email protected]');
    INSERT OR REPLACE INTO EmailView VALUES (2, '[email protected]', '[email protected]');

The updated rows contain:

    MessageID ToAddr            FromAddr
    --------- ------            --------
    1         NULL              [email protected]
    2         [email protected] [email protected]

There's a NULL that shouldn't be there. The corresponding cell in the EmailInfo table contains an orphaned ToAddrRef value. If you do the INSERTs one at a time, you'll see that Alice's ID in the AddressLookup table changes! It appears that this behavior is documented:

    An ON CONFLICT clause may be specified as part of an UPDATE or INSERT action within the body of the trigger. However if an ON CONFLICT clause is specified as part of the statement causing the trigger to fire, then conflict handling policy of the outer statement is used instead.

So the "REPLACE" in the top-level "INSERT OR REPLACE" statement is overriding the critical "INSERT OR IGNORE" in the trigger program. Is there a way I can make it work the way that I wanted?
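One possible workaround, offered only as a sketch, is to stop relying on a conflict clause inside the trigger and make the "ignore" explicit with a `WHERE NOT EXISTS` check, so that the outer statement's REPLACE policy has no conflict to act on. The example below uses Python's built-in `sqlite3` module with the schema from the post. The addresses are placeholders (the originals are obscured in this copy), the `demo` function is hypothetical, and only the INSERT trigger is shown.

```python
import sqlite3

SCHEMA = """
CREATE TABLE AddressLookup (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    Address TEXT UNIQUE
);
CREATE TABLE EmailInfo (
    MessageID INTEGER PRIMARY KEY AUTOINCREMENT,
    ToAddrRef INTEGER REFERENCES AddressLookup(ID),
    FromAddrRef INTEGER REFERENCES AddressLookup(ID)
);
CREATE VIEW EmailView AS
SELECT MessageID, A1.Address AS ToAddr, A2.Address AS FromAddr
FROM EmailInfo
LEFT JOIN AddressLookup A1 ON (ToAddrRef = A1.ID)
LEFT JOIN AddressLookup A2 ON (FromAddrRef = A2.ID);

-- Assumed rewrite: an explicit existence check instead of INSERT OR IGNORE,
-- so no UNIQUE conflict ever occurs for the outer REPLACE policy to handle.
CREATE TRIGGER trg_ii_EmailView INSTEAD OF INSERT ON EmailView
BEGIN
    INSERT INTO AddressLookup(Address)
        SELECT NEW.ToAddr
        WHERE NOT EXISTS (SELECT 1 FROM AddressLookup WHERE Address = NEW.ToAddr);
    INSERT INTO AddressLookup(Address)
        SELECT NEW.FromAddr
        WHERE NOT EXISTS (SELECT 1 FROM AddressLookup WHERE Address = NEW.FromAddr);
    INSERT INTO EmailInfo
        SELECT NEW.MessageID, A1.ID, A2.ID
        FROM AddressLookup A1, AddressLookup A2
        WHERE A1.Address = NEW.ToAddr AND A2.Address = NEW.FromAddr;
END;
"""


def demo():
    """Run the two INSERT OR REPLACE statements and return the view's rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    # Placeholder addresses; the second row reuses both of them.
    con.execute("INSERT OR REPLACE INTO EmailView VALUES (1, 'alice@example.com', 'bob@example.com')")
    con.execute("INSERT OR REPLACE INTO EmailView VALUES (2, 'bob@example.com', 'alice@example.com')")
    rows = con.execute(
        "SELECT MessageID, ToAddr, FromAddr FROM EmailView ORDER BY MessageID"
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Because existing addresses are never re-inserted, their IDs stay stable and earlier EmailInfo rows keep valid references. Whether this is acceptable depends on how the view is written to elsewhere; the UPDATE trigger would need the same treatment.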

Read the article
bin_at in dlmalloc

- by chunhui

In glibc's malloc.c (dlmalloc), a comment mentions "repositioning tricks", as shown below, and this trick is used in bin_at. bins is an array whose space is allocated when av (struct malloc_state) is allocated, isn't it? Is sizeof(bin[i]) less than sizeof(struct malloc_chunk*)? Can someone describe this trick for me? I can't understand the bin_at macro. Why do they get the bin's address this way, and how does it work? Thanks very much, and sorry for my poor English.

    /*
       To simplify use in double-linked lists, each bin header acts
       as a malloc_chunk. This avoids special-casing for headers.
       But to conserve space and improve locality, we allocate
       only the fd/bk pointers of bins, and then use repositioning tricks
       to treat these as the fields of a malloc_chunk*.
    */

    typedef struct malloc_chunk* mbinptr;

    /* addressing -- note that bin_at(0) does not exist */
    #define bin_at(m, i) \
      (mbinptr) (((char *) &((m)->bins[((i) - 1) * 2])) \
                 - offsetof (struct malloc_chunk, fd))

The malloc_chunk struct looks like this:

    struct malloc_chunk {

      INTERNAL_SIZE_T      prev_size;  /* Size of previous chunk (if free). */
      INTERNAL_SIZE_T      size;       /* Size in bytes, including overhead. */

      struct malloc_chunk* fd;         /* double links -- used only if free. */
      struct malloc_chunk* bk;

      /* Only used for large blocks: pointer to next larger size. */
      struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
      struct malloc_chunk* bk_nextsize;
    };

And the bin type looks like this:

    typedef struct malloc_chunk* mbinptr;

    struct malloc_state {
      /* Serialize access. */
      mutex_t mutex;

      /* Flags (formerly in max_fast). */
      int flags;

    #if THREAD_STATS
      /* Statistics for locking. Only used if THREAD_STATS is defined. */
      long stat_lock_direct, stat_lock_loop, stat_lock_wait;
    #endif

      /* Fastbins */
      mfastbinptr fastbinsY[NFASTBINS];

      /* Base of the topmost chunk -- not otherwise kept in a bin */
      mchunkptr top;

      /* The remainder from the most recent split of a small request */
      mchunkptr last_remainder;

      /* Normal bins packed as described above */
      mchunkptr bins[NBINS * 2 - 2];

      /* Bitmap of bins */
      unsigned int binmap[BINMAPSIZE];

      /* Linked list */
      struct malloc_state *next;

    #ifdef PER_THREAD
      /* Linked list for free arenas. */
      struct malloc_state *next_free;
    #endif

      /* Memory allocated from the system in this arena. */
      INTERNAL_SIZE_T system_mem;
      INTERNAL_SIZE_T max_system_mem;
    };

Read the article
Calculating a Sample Covariance Matrix for Groups with plyr

- by John A. Ramey

I'm going to use the sample code from http://gettinggeneticsdone.blogspot.com/2009/11/split-apply-and-combine-in-r-using-plyr.html for this example. So, first, let's copy their example data:

    mydata=data.frame(X1=rnorm(30), X2=rnorm(30,5,2), SNP1=c(rep("AA",10), rep("Aa",10), rep("aa",10)), SNP2=c(rep("BB",10), rep("Bb",10), rep("bb",10)))

I am going to ignore SNP2 in this example and just pretend the values in SNP1 denote group membership. So then, I may want some summary statistics about each group in SNP1: "AA", "Aa", "aa". Then if I want to calculate the means for each variable, it makes sense (modifying their code slightly) to use:

    > ddply(mydata, c("SNP1"), function(df) data.frame(meanX1=mean(df$X1), meanX2=mean(df$X2)))
      SNP1      meanX1   meanX2
    1   aa  0.05178028 4.812302
    2   Aa  0.30586206 4.820739
    3   AA -0.26862500 4.856006

But what if I want the sample covariance matrix for each group? Ideally, I would like a 3D array, where I have the covariance matrix for each group, and the third dimension denotes the corresponding group. I tried a modified version of the previous code and got the following results, which have convinced me that I'm doing something wrong.

    > daply(mydata, c("SNP1"), function(df) cov(cbind(df$X1, df$X2)))
    , ,  = 1

    SNP1         1          2
      aa 1.4961210 -0.9496134
      Aa 0.8833190 -0.1640711
      AA 0.9942357 -0.9955837

    , ,  = 2

    SNP1          1        2
      aa -0.9496134 2.881515
      Aa -0.1640711 2.466105
      AA -0.9955837 4.938320

I was thinking that the dim() of the 3rd dimension would be 3, but instead it is 2. Really, this is a sliced-up version of the covariance matrix for each group. If we manually compute the sample covariance matrix for aa, we get:

               [,1]       [,2]
    [1,]  1.4961210 -0.9496134
    [2,] -0.9496134  2.8815146

Using plyr, the following gives me what I want in list() form:

    > dlply(mydata, c("SNP1"), function(df) cov(cbind(df$X1, df$X2)))
    $aa
               [,1]       [,2]
    [1,]  1.4961210 -0.9496134
    [2,] -0.9496134  2.8815146

    $Aa
               [,1]       [,2]
    [1,]  0.8833190 -0.1640711
    [2,] -0.1640711  2.4661046

    $AA
               [,1]       [,2]
    [1,]  0.9942357 -0.9955837
    [2,] -0.9955837  4.9383196

    attr(,"split_type")
    [1] "data.frame"
    attr(,"split_labels")
      SNP1
    1   aa
    2   Aa
    3   AA

But like I said earlier, I would really like this in a 3D array. Any thoughts on where I went wrong with daply(), or suggestions? Of course, I could typecast the list from dlply() to a 3D array, but I'd rather not do this because I will be repeating this process many times in a simulation.

As a side note, I found one method (http://www.mail-archive.com/[email protected]/msg86328.html) that provides the sample covariance matrix for each group, but the outputted object is bloated. Thanks in advance.

Read the article
Given a trace of packets, how would you group them into flows?

- by zxcvbnm

I've tried it these ways so far:

1) Make a hash with the source IP/port and destination IP/port as keys. Each position in the hash is a list of packets. The hash is then saved in a file, with each flow separated by some special characters/line. Problem: Not enough memory for large traces.

2) Make a hash with the same key as above, but only keep in memory the file handles. Each packet is then put into the hash[key] that points to the right file. Problems: Too many flows/files (~200k) and it might run out of memory as well.

3) Hash the source IP/port and destination IP/port, then put the info inside a file. The difference between 2 and 3 is that here the files are opened and closed for each operation, so I don't have to worry about running out of memory because I opened too many at the same time. Problems: WAY too slow, same number of files as 2 so also impractical.

4) Make a hash of the source IP/port pairs and then iterate over the whole trace for each flow. Take the packets that are part of that flow and place them into the output file. Problem: Suppose I have a 60 MB trace that has 200k flows. This way, I would process, say, a 60 MB file 200k times. Maybe removing the packets as I iterate would make it not so painful, but so far I'm not sure this would be a good solution.

5) Split them by IP source/destination and then create a single file for each one, separating the flows by special characters. Still too many files (+50k).

Right now I'm using Ruby to do it, which might've been a bad idea, I guess. Currently I've filtered the traces with tshark so that they only have relevant info, so I can't really make them any smaller. I thought about loading everything in memory as described in 1) using C#/Java/C++, but I was wondering if there wouldn't be a better approach here, especially since I might also run out of memory later on even with a more efficient language if I have to use larger traces.

In summary, the problem I'm facing is that I either have too many files or that I run out of memory. I've also tried searching for some tool to filter the info, but I don't think there is one. The ones I've found only return some statistics and wouldn't scan for every flow as I need.
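One approach that sits between options 1) and 2) is a two-pass partition: hash each flow key into a small, fixed number of bucket files, then group one bucket at a time in memory. The sketch below is only an illustration under stated assumptions: each packet is a text line whose first four whitespace-separated fields are source IP, source port, destination IP and destination port (a made-up layout, not tshark's actual output), each bucket fits in memory, and the names `flow_key` and `group_flows` are hypothetical.

```python
import io
import os
import tempfile
import zlib
from collections import defaultdict


def flow_key(line):
    """Assumed record layout: 'src_ip src_port dst_ip dst_port ...'."""
    return tuple(line.split()[:4])


def group_flows(lines, out, n_buckets=64):
    """Write packets to `out` grouped by flow, with a bounded number of open files."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"bucket_{i}.txt") for i in range(n_buckets)]
        buckets = [open(p, "w") for p in paths]
        try:
            # Pass 1: scatter packets into a fixed number of bucket files.
            # crc32 is deterministic, so a flow always lands in the same bucket.
            for line in lines:
                if not line.strip():
                    continue
                key = " ".join(flow_key(line))
                idx = zlib.crc32(key.encode()) % n_buckets
                buckets[idx].write(line.rstrip("\n") + "\n")
        finally:
            for f in buckets:
                f.close()

        # Pass 2: every bucket holds complete flows, so group it in memory.
        for path in paths:
            flows = defaultdict(list)
            with open(path) as f:
                for line in f:
                    flows[flow_key(line)].append(line)
            for key, packets in flows.items():
                out.write("# flow " + " ".join(key) + "\n")
                out.writelines(packets)


if __name__ == "__main__":
    trace = [
        "10.0.0.1 1111 10.0.0.2 80 first",
        "10.0.0.3 2222 10.0.0.2 80 other",
        "10.0.0.1 1111 10.0.0.2 80 second",
    ]
    buf = io.StringIO()
    group_flows(trace, buf, n_buckets=4)
    print(buf.getvalue())
```

The trace is read once and written twice, memory use is bounded by the largest bucket, and the number of open files is `n_buckets` rather than the number of flows. As written, the two directions of a conversation count as separate flows, which matches the key described in 1).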

Read the article
Square Peg Web: Gets you the traffic to where it matters most: Your Website!

- by demetriusalwyn

Have you decided to start your business online or is your business not reaching the targeted audience? Come to Square Peg Web; where you will find what you want to make your business reach new heights. The team at Square Peg Web is professionals who understand what you want and make sure you get it right. Our confidence stems from the fact of thousands of satisfied clients who keep referring friends and business associates to us and we do not let our clients down. Many companies promise the sky but how far is does their work live up to the promises? We do not know about the others however, we are sure that we strive to put together all our ideas and thoughts to make your website rank among the top. Web hosting is something that needs to have a personal touch; Square Peg Web customizes everything to suit your requirements so that you do not have to look further. With Square Peg Web you have a host of features to make your Business go viral. Some of the product details that are offered with Square Peg Web are unlimited product options/ variants/ properties giving you an option on price modifiers. You get unlimited customized input fields for your products and you can also Customer-define the prices. Square Peg Web provides you an option of using multiple product images with zoom features and one can also list a particular product in several categories. There are other aspects which make Square Peg Web the best choice for your website needs; every sale of yours’ is important to you and to us. We make sure that each sale is tracked by the product and also the list of bestsellers that appeal to the audience. Other comprehensive statistics of Square Peg Web includes searchable order data, an interface for shipments and order fulfillments, export sales & customer data for usage in a spreadsheet and the ability to export orders to QuickBooks format. With Square Peg Web; Admin Panel is a lot simpler. 
Administrative access is completely password protected and any changes done are all in real-time. You can have absolute control on the cart from anywhere around the world using your web browser and the topping on the cake is the unlimited amount of admin accounts that can be created for you. Square Peg Web offers you a world of experience with the options of choosing from marketing websites to e-commerce and from customized applications to community oriented sites. Some of the projects which appear in the portfolio of Square Peg Web are Online Marketing Web Sites, E-Commerce Web Sites, customized web applications, Blog designing and programming, video sharing and the option of downloading web sites, online advertisements, flash animation, customer and product support web sites, web site re-designing and planning and complete information architecture.

Read the article
JPA Entity Manager resource handling

- by chiragshahkapadia

Every time I call JPA method its creating entity and binding query. My persistence properties are: <property name="hibernate.dialect" value="org.hibernate.dialect.Oracle10gDialect"/> <property name="hibernate.cache.provider_class" value="net.sf.ehcache.hibernate.SingletonEhCacheProvider"/> <property name="hibernate.cache.use_second_level_cache" value="true"/> <property name="hibernate.cache.use_query_cache" value="true"/> And I am creating entity manager the way shown below: emf = Persistence.createEntityManagerFactory("pu"); em = emf.createEntityManager(); em = Persistence.createEntityManagerFactory("pu").createEntityManager(); Is there any nice way to manage entity manager resource instead create new every time or any property can set in persistence. Remember it's JPA. See below binding log every time : 15:35:15,527 INFO [AnnotationBinder] Binding entity from annotated class: * 15:35:15,527 INFO [QueryBinder] Binding Named query: * = * 15:35:15,527 INFO [QueryBinder] Binding Named query: * = * 15:35:15,527 INFO [QueryBinder] Binding Named query: 15:35:15,527 INFO [QueryBinder] Binding Named query: 15:35:15,527 INFO [QueryBinder] Binding Named query: 15:35:15,527 INFO [QueryBinder] Binding Named query: 15:35:15,527 INFO [QueryBinder] Binding Named query: 15:35:15,527 INFO [QueryBinder] Binding Named query: 15:35:15,527 INFO [QueryBinder] Binding Named query: 15:35:15,527 INFO [EntityBinder] Bind entity com.* on table * 15:35:15,542 INFO [HibernateSearchEventListenerRegister] Unable to find org.hibernate.search.event.FullTextIndexEventListener on the classpath. Hibernate Search is not enabled. 
15:35:15,542 INFO [NamingHelper] JNDI InitialContext properties:{} 15:35:15,542 INFO [DatasourceConnectionProvider] Using datasource: 15:35:15,542 INFO [SettingsFactory] RDBMS: and Real Application Testing options 15:35:15,542 INFO [SettingsFactory] JDBC driver: Oracle JDBC driver, version: 9.2.0.1.0 15:35:15,542 INFO [Dialect] Using dialect: org.hibernate.dialect.Oracle10gDialect 15:35:15,542 INFO [TransactionFactoryFactory] Transaction strategy: org.hibernate.transaction.JDBCTransactionFactory 15:35:15,542 INFO [TransactionManagerLookupFactory] No TransactionManagerLookup configured (in JTA environment, use of read-write or transactional second-level cache is not recomm ended) 15:35:15,542 INFO [SettingsFactory] Automatic flush during beforeCompletion(): disabled 15:35:15,542 INFO [SettingsFactory] Automatic session close at end of transaction: disabled 15:35:15,542 INFO [SettingsFactory] JDBC batch size: 15 15:35:15,542 INFO [SettingsFactory] JDBC batch updates for versioned data: disabled 15:35:15,542 INFO [SettingsFactory] Scrollable result sets: enabled 15:35:15,542 INFO [SettingsFactory] JDBC3 getGeneratedKeys(): disabled 15:35:15,542 INFO [SettingsFactory] Connection release mode: auto 15:35:15,542 INFO [SettingsFactory] Default batch fetch size: 1 15:35:15,542 INFO [SettingsFactory] Generate SQL with comments: disabled 15:35:15,542 INFO [SettingsFactory] Order SQL updates by primary key: disabled 15:35:15,542 INFO [SettingsFactory] Order SQL inserts for batching: disabled 15:35:15,542 INFO [SettingsFactory] Query translator: org.hibernate.hql.ast.ASTQueryTranslatorFactory 15:35:15,542 INFO [ASTQueryTranslatorFactory] Using ASTQueryTranslatorFactory 15:35:15,542 INFO [SettingsFactory] Query language substitutions: {} 15:35:15,542 INFO [SettingsFactory] JPA-QL strict compliance: enabled 15:35:15,542 INFO [SettingsFactory] Second-level cache: enabled 15:35:15,542 INFO [SettingsFactory] Query cache: enabled 15:35:15,542 INFO [SettingsFactory] Cache region 
factory : org.hibernate.cache.impl.bridge.RegionFactoryCacheProviderBridge 15:35:15,542 INFO [RegionFactoryCacheProviderBridge] Cache provider: net.sf.ehcache.hibernate.SingletonEhCacheProvider 15:35:15,542 INFO [SettingsFactory] Optimize cache for minimal puts: disabled 15:35:15,542 INFO [SettingsFactory] Structured second-level cache entries: disabled 15:35:15,542 INFO [SettingsFactory] Query cache factory: org.hibernate.cache.StandardQueryCacheFactory 15:35:15,542 INFO [SettingsFactory] Statistics: disabled 15:35:15,542 INFO [SettingsFactory] Deleted entity synthetic identifier rollback: disabled 15:35:15,542 INFO [SettingsFactory] Default entity-mode: pojo 15:35:15,542 INFO [SettingsFactory] Named query checking : enabled 15:35:15,542 INFO [SessionFactoryImpl] building session factory 15:35:15,542 INFO [SessionFactoryObjectFactory] Not binding factory to JNDI, no JNDI name configured 15:35:15,542 INFO [UpdateTimestampsCache] starting update timestamps cache at region: org.hibernate.cache.UpdateTimestampsCache 15:35:15,542 INFO [StandardQueryCache] starting query cache at region: org.hibernate.cache.StandardQueryCache

Read the article
java concurrency: many writers, one reader

- by Janning

I need to gather some statistics in my software and I am trying to make it fast and correct, which is not easy (for me!). First, my code so far, with two classes, a StatsService and a StatsHarvester:

    public class StatsService {
        private Map<String, Long> stats = new HashMap<String, Long>(1000);

        public void notify ( String key ) {
            Long value = 1l;
            synchronized (stats) {
                if (stats.containsKey(key)) {
                    value = stats.get(key) + 1;
                }
                stats.put(key, value);
            }
        }

        public Map<String, Long> getStats ( ) {
            Map<String, Long> copy;
            synchronized (stats) {
                copy = new HashMap<String, Long>(stats);
                stats.clear();
            }
            return copy;
        }
    }

This is my second class, a harvester which collects the stats from time to time and writes them to a database:

    public class StatsHarvester implements Runnable {
        private StatsService statsService;
        private Thread t;

        public void init ( ) {
            t = new Thread(this);
            t.start();
        }

        public synchronized void run ( ) {
            while (true) {
                try {
                    wait(5 * 60 * 1000); // 5 minutes
                    collectAndSave();
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }

        private void collectAndSave ( ) {
            Map<String, Long> stats = statsService.getStats();
            // do something like:
            // saveRecords(stats);
        }
    }

At runtime it will have about 30 concurrently running threads, each calling notify(key) about 100 times. Only one StatsHarvester is calling statsService.getStats(). So I have many writers and only one reader. It would be nice to have accurate stats, but I don't care if some records are lost under high concurrency. The reader should run every 5 minutes or whatever is reasonable. Writing should be as fast as possible. Reading should be fast, but if it locks for about 300ms every 5 minutes, that's fine. I've read many docs (Java Concurrency in Practice, Effective Java and so on), but I have the strong feeling that I need your advice to get it right. I hope I stated my problem clearly and briefly enough to get valuable help.

Read the article
Optimising speeds in HDF5 using Pytables

- by Sree Aurovindh

The problem concerns the write speed of the machines (10 * 32-bit machines) and the PostgreSQL query performance. I will explain the scenario in detail. I have about 80 GB of data (with appropriate database indexes in place). I am trying to read it from a PostgreSQL database and write it into HDF5 using PyTables. I have 1 table and 5 variable arrays in one HDF5 file. The HDF5 implementation is not multithreaded or enabled for symmetric multiprocessing. I have rented about 10 computers for a day and am trying to write from them in order to speed up my data handling.

As far as the PostgreSQL table is concerned, the overall record count is 140 million, and I have 5 tables referring to it by primary/foreign key. I am not using joins as they are not scalable, so for a single lookup I do 6 lookups without joins and write the results into HDF5 format. For each lookup I do 6 inserts, into the table and its corresponding arrays. The queries are really simple:

    select * from x.train where tr_id=1 (primary key & indexed)
    select q_t from x.qt where q_id=2 (non-primary key but indexed)
    (similarly five queries)

Each computer writes two HDF5 files, and hence the total count comes to around 20 files.

Some calculations and statistics:
Total number of records: 14,37,00,000
Total number of records per file: 143700000/20 = 71,85,000
The total number of records in each file: 71,85,000 * 5 = 3,59,25,000

Current PostgreSQL database config:
My current machine: 8 GB RAM with an i7 2nd-generation processor.
I made the following changes to the PostgreSQL configuration file:
shared_buffers: 2 GB
effective_cache_size: 4 GB

Note on current performance: I have run it for about ten hours and the performance is as follows: the total number of records written for each file is about 6,21,000 * 5 = 31,05,000. The bottleneck is that I can only rent the machines for 10 hours per day (overnight), and at this speed it will take about 11 days, which is too long for my experiments. Please suggest how to improve this.

Questions:
1. Should I use symmetric multiprocessing on those desktops (each has 2 cores with about 2 GB of RAM)? In that case, what is suggested or preferable?
2. If I change my PostgreSQL configuration file and increase the RAM, will it speed up my process?
3. Should I use multithreading? In that case, any links or pointers would be of great help.

Thanks, Sree aurovindh V
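The "about 11 days" estimate follows from the figures in the post, as the short sketch below reproduces. The numbers are the post's own, written without Indian digit grouping (14,37,00,000 is 143,700,000); the variable names are invented for illustration, and the calculation assumes the write rate stays constant from one overnight run to the next.

```python
# Figures from the post; names are illustrative only.
TOTAL_RECORDS = 143_700_000      # 14,37,00,000 in the post's notation
NUM_FILES = 20                   # 10 machines x 2 HDF5 files each
ARRAYS_PER_RECORD = 5            # the post multiplies every count by 5
WRITTEN_PER_RUN = 621_000 * ARRAYS_PER_RECORD  # rows written per file in one 10-hour run

records_per_file = TOTAL_RECORDS // NUM_FILES
rows_per_file = records_per_file * ARRAYS_PER_RECORD
runs_needed = rows_per_file / WRITTEN_PER_RUN  # 10-hour overnight runs

print(records_per_file)        # 7185000
print(rows_per_file)           # 35925000
print(round(runs_needed, 1))   # 11.6
```

At one 10-hour run per night this comes to roughly 11.6 nights, in line with the estimate in the post.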

Read the article
SEO Help with Pages Indexed by Google

- by Joe Majewski

I'm working on optimizing my site for Google's search engine, and lately I've noticed that when doing a "site:www.joemajewski.com" query, I get results for pages that shouldn't be indexed at all. Let's take a look at this page, for example: http://www.joemajewski.com/wow/profile.php?id=3 I created my own CMS, and this is simply a breakdown of user id #3's statistics, which I noticed is indexed by Google, although it shouldn't be. I understand that it takes some time before Google's results reflect accurately on my site's content, but this has been improperly indexed for nearly six months now. Here are the precautions that I have taken: My robots.txt file has a line like this: Disallow: /wow/profile.php* When running the url through Google Webmaster Tools, it indicates that I did, indeed, correctly create the disallow command. It did state, however, that a page that doesn't get crawled may still get displayed in the search results if it's being linked to. Thus, I took one more precaution. In the source code I included the following meta data: <meta name="robots" content="noindex,follow" /> I am assuming that follow means to use the page when calculating PageRank, etc, and the noindex tells Google to not display the page in the search results. This page, profile.php, is used to take the $_GET['id'] and find the corresponding registered user. It displays a bit of information about that user, but is in no way relevant enough to warrant a display in the search results, so that is why I am trying to stop Google from indexing it. This is not the only page Google is indexing that I would like removed. I also have a WordPress blog, and there are many category pages, tag pages, and archive pages that I would like removed, and am doing the same procedures to attempt to remove them. Can someone explain how to get pages removed from Google's search results, and possibly some criteria that should help determine what types of pages that I don't want indexed. 
In terms of my WordPress blog, the only pages that I truly want indexed are my articles. Everything else I have tried to block, with little luck from Google. Can someone also explain why it's bad to have pages indexed that don't provide any new or relevant content, such as pages for WordPress tags or categories, which are clearly never going to receive traffic from Google. Thanks!

Read the article
Google Code + SVN or GitHub + Git

- by Nazgulled

Let me start by telling you that I have never used anything besides SVN, and I'm also a Windows user. I have a couple of simple projects that are open-source; others are on their way once I'm happy enough to release their source code. Either way, I was thinking of using Google Code and SVN to share the source code of my projects instead of providing a link to the source on my website. This has always been a pain because I had to update the binaries and the code every time I released a new version. It would also give me a backup of my code somewhere other than my local machine (I used to have a local Subversion server running).

What I want from a service like this is very simple... I just want a place to store my source code that people can download if they want, that allows me to control revisions, and that provides a simple and easy issue system so people can submit bugs and stuff like that. I guess both of them have this. But I don't want to host any binaries on their websites; I want these to be hosted on my website so I can track download statistics with my own scripts. I also don't need wiki pages, as I prefer to have all the documentation on my own website. Do any of these services provide a way to "disable" features like the wiki and downloads and not show them at all for my project(s)?

Now, I'm sure there are lots of pros and cons about using Google Code with SVN and GitHub with Git (of course), but here's what is important to me about each one and why I like them:

Google Code:
As with any Google page, the complexity is almost non-existent
Everyone (or almost everyone) has a Google account, and this is nice if people want to report problems using the issue system

GitHub:
May (or may not) be a little more complex (not a problem for me though) than Google's pages but...
...has a much prettier interface than Google's service
It needs people to be registered on GitHub to post about issues
I like the fact that with Git, you have your own revisions locally (can I use TortoiseGit for this or?)

Basically that's it, not much I know... What other common pros and cons can you tell me about each site/software? Keep in mind that my projects are simple, and I'm probably the only one who will ever develop these projects in these repositories (or maybe not; for now I will).

Read the article
.Net Remote Log Querying

- by jlafay

I'm working on a Windows service that consists of the service itself, a WF Service (using WorkflowServiceHost), a Workflow (WorkflowApplication) that queries and processes data from a SQL Server DB, and a Comm Marshall class that handles data flow between the service and the WF. The WF does a lot of heavy data processing, and the original app (early VB6) logged all the processing and displayed the results on the screen of the host machine.

Critical events will be written to the event log; I strongly believe that should be common practice, since admins naturally look there and it already supports remote viewing. The workflow will also need to write logging events as it processes and iterates according to our business logic, such as records queried, records returned, and records processed. The data is very critical and we need to log actions as they occur. The logs are currently kept as text files on disk, and I think that is OK. Ideally I would like to record log events in XML so they are easier to query, and because that is less costly than a DB, especially since our DB servers already do a lot of heavy processing.

Since we are essentially replacing a VB6 application with a robust Windows service (taking advantage of WF 4.0), a remote client has also been requested. It subscribes to the service, is added to a collection of subscribers, and then receives callbacks. Basic statistics and summaries are updated client side from the basic monitoring data the service sends. We would also like a way to drill into the details when we need to examine what is going on, because this is a long-running data processing service and issues need to be addressed immediately.

What is the best way to implement some type of query that the client sends to the service and that is answered back to the client? Would it be efficient to expose another method on the service and have it hand the request to some querying class/object that examines the XML files by whichever specification and returns the result to the client? That's the main concern. I don't want the service's processing to bottleneck much while this occurs. It seems that WF already auto-magically threads well for the most part, but I want to make sure this is the right way to go about it. Any suggestions or recommendations on how to architect and implement a small log querying framework for a remote service would be awesome.

Read the article
How do I pass the value of the previous form element into an "onchange" javascript function?

- by Jen

Hello, I want to make some UI improvements to a page I am developing. Specifically, I need to add another drop down menu to allow the user to filter results.

This is my current code (HTML file):

    <select name="test_id" onchange="showGrid(this.name, this.value, 'gettestgrid')">
      <option selected>Select a test--></option>
      <option value=1>Test 1</option>
      <option value=2>Test 2</option>
      <option value=3>Test 3</option>
    </select>

This is pseudo code for what I want to happen:

    <select name="test_id">
      <option selected>Select a test--></option>
      <option value=1>Test 1</option>
      <option value=2>Test 2</option>
      <option value=3>Test 3</option>
    </select>
    <select name="statistics" onchange="showGrid(PREVIOUS.name, PREVIOUS.VALUE, THIS.value)">
      <option selected>Select a data display --></option>
      <option value='gettestgrid'>Show averages by student</option>
      <option value='gethomeroomgrid'>Show averages by homeroom</option>
      <option value='getschoolgrid'>Show averages by school</option>
    </select>

How do I access the previous field's name and value? Any help much appreciated, thx!

Also, JS function for reference:

    function showGrid(name, value, phpfile) {
      xmlhttp = GetXmlHttpObject();
      if (xmlhttp == null) {
        alert("Browser does not support HTTP Request");
        return;
      }
      var url = phpfile + ".php";
      url = url + "?" + name + "=" + value;
      url = url + "&sid=" + Math.random();
      xmlhttp.onreadystatechange = stateChanged;
      xmlhttp.open("GET", url, true);
      xmlhttp.send(null);
    }
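One possible way to reach the first menu from the second menu's handler is sketched below. It assumes both <select> elements sit inside the same <form>, which the markup in the question does not show, so treat that as an assumption. Every form control exposes its owning form as `form`, and the form's `elements` collection can be indexed by control name. The helper name `showGridFor` is hypothetical; `showGrid` is the function from the question and is passed in as a parameter so the sketch runs on its own.

```javascript
// Hypothetical helper (not part of the original page).
// statsSelect: the "statistics" <select> that fired onchange.
// showGrid:    the question's function, passed in so nothing global is needed.
// Assumes both menus share one <form> and that option 0 is the placeholder.
function showGridFor(statsSelect, showGrid) {
  // Look up the earlier field by name through the owning form.
  var testSelect = statsSelect.form.elements["test_id"];

  // Do nothing while either menu still shows its placeholder option.
  if (testSelect.selectedIndex < 1 || statsSelect.selectedIndex < 1) {
    return false;
  }

  // Same argument order as the pseudo code: previous name, previous value, this value.
  showGrid(testSelect.name, testSelect.value, statsSelect.value);
  return true;
}
```

In the page this would be wired up as onchange="showGridFor(this, showGrid)" on the "statistics" menu. If the two menus are not in a form, giving the first one an id and using document.getElementById is an alternative.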

Read the article
Remote Postgresql - extremely slow

- by Muffinbubble

Hi, I have set up PostgreSQL on a VPS I own. The software that accesses the database is a program called PokerTracker, which logs all your hands and statistics whilst playing online poker. I wanted this accessible from several different computers, so I decided to install it on my VPS, and after a few hiccups I managed to get it connecting without errors. However, the performance is dreadful. I have done tons of research on 'remote postgresql slow' etc. and have yet to find an answer, so I am hoping someone is able to help.

Things to note:

- The query I am trying to execute is very small. When connecting locally on the VPS, the query runs instantly. When running it remotely, it takes about 1 minute and 30 seconds.
- The VPS is on a 100 Mbps connection and the computer I'm connecting from is on an 8 Mb line.
- The network communication between the two is almost instant. I can connect remotely with no lag whatsoever, and I am hosting several websites running MSSQL where all the queries run instantly, whether connected remotely or locally, so the problem seems specific to PostgreSQL.
- I'm running the newest version of their software and the newest version of PostgreSQL that is compatible with it.
- The database is new and contains hardly any data. I've run vacuum/analyze etc., all to no avail; I see no improvement.

I don't understand how MSSQL can query almost instantly yet PostgreSQL struggles so much. I am able to telnet to port 5432 on the VPS IP with no problems and, as I say, the query does execute; it just takes an extremely long time. What I do notice on the router is that hardly any bandwidth is being used while the query is running. I wouldn't expect much for a simple query, but I am not sure whether this is the issue. I've tried connecting remotely from 3 different networks now (including different routers) and the problem remains. Connecting from another machine over the LAN is instant.

I have also edited the PostgreSQL conf file to allow for more memory/buffers etc., but I don't think this is the problem. What I am asking it to do is very simple; it shouldn't be intensive at all. Thanks, Ricky

Read the article
Can MySQL reasonably perform queries on billions of rows?

- by haxney

I am planning on storing scans from a mass spectrometer in a MySQL database and would like to know whether storing and analyzing this amount of data is remotely feasible. I know performance varies wildly depending on the environment, but I'm looking for the rough order of magnitude: will queries take 5 days or 5 milliseconds?

Input format

Each input file contains a single run of the spectrometer; each run consists of a set of scans, and each scan has an ordered array of datapoints. There is a bit of metadata, but the majority of the file consists of arrays of 32- or 64-bit ints or floats.

Host system

|----------------+-------------------------------|
| OS             | Windows 2008 64-bit           |
| MySQL version  | 5.5.24 (x86_64)               |
| CPU            | 2x Xeon E5420 (8 cores total) |
| RAM            | 8GB                           |
| SSD filesystem | 500 GiB                       |
| HDD RAID       | 12 TiB                        |
|----------------+-------------------------------|

There are some other services running on the server using negligible processor time.

File statistics

|------------------+--------------|
| number of files  | ~16,000      |
| total size       | 1.3 TiB      |
| min size         | 0 bytes      |
| max size         | 12 GiB       |
| mean             | 800 MiB      |
| median           | 500 MiB      |
| total datapoints | ~200 billion |
|------------------+--------------|

The total number of datapoints is a very rough estimate.

Proposed schema

I'm planning on doing things "right" (i.e. normalizing the data like crazy) and so would have a runs table, a spectra table with a foreign key to runs, and a datapoints table with a foreign key to spectra.

The 200 Billion datapoint question

I am going to be analyzing across multiple spectra and possibly even multiple runs, resulting in queries which could touch millions of rows. Assuming I index everything properly (which is a topic for another question) and am not trying to shuffle hundreds of MiB across the network, is it remotely plausible for MySQL to handle this?

UPDATE: additional info

The scan data will be coming from files in the XML-based mzML format. The meat of this format is in the <binaryDataArrayList> elements where the data is stored. Each scan produces >= 2 <binaryDataArray> elements which, taken together, form a 2-dimensional (or more) array of the form [[123.456, 234.567, ...], ...]. These data are write-once, so update performance and transaction safety are not concerns.

My naïve plan for a database schema is:

runs table

| column name | type        |
|-------------+-------------|
| id          | PRIMARY KEY |
| start_time  | TIMESTAMP   |
| name        | VARCHAR     |
|-------------+-------------|

spectra table

| column name    | type        |
|----------------+-------------|
| id             | PRIMARY KEY |
| name           | VARCHAR     |
| index          | INT         |
| spectrum_type  | INT         |
| representation | INT         |
| run_id         | FOREIGN KEY |
|----------------+-------------|

datapoints table

| column name | type        |
|-------------+-------------|
| id          | PRIMARY KEY |
| spectrum_id | FOREIGN KEY |
| mz          | DOUBLE      |
| num_counts  | DOUBLE      |
| index       | INT         |
|-------------+-------------|

Is this reasonable?
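A rough back-of-envelope check on the datapoints table can be sketched as follows. The column widths are assumptions, not something the schema above states: the id and spectrum_id columns are taken as 8-byte BIGINT (200 billion rows will not fit in a 4-byte INT key), DOUBLE is 8 bytes and INT is 4 bytes. The names `BYTES_PER_ROW` and `estimateTableBytes` are made up for this illustration, and the figure ignores indexes and any per-row storage-engine overhead.

```javascript
// Assumed on-disk width of each datapoints column, in bytes.
// Only the DOUBLE and INT widths follow from the schema; the key widths are guesses.
var BYTES_PER_ROW = {
  id: 8,          // assumed BIGINT primary key
  spectrum_id: 8, // assumed BIGINT foreign key
  mz: 8,          // DOUBLE
  num_counts: 8,  // DOUBLE
  index: 4        // INT
};

// Sums the column widths and multiplies by the row count.
function estimateTableBytes(rowCount, columnWidths) {
  var perRow = 0;
  for (var col in columnWidths) {
    perRow += columnWidths[col];
  }
  return rowCount * perRow;
}

var TIB = Math.pow(2, 40);
var rawBytes = estimateTableBytes(200e9, BYTES_PER_ROW); // ~200 billion datapoints
console.log((rawBytes / TIB).toFixed(2) + " TiB of raw row data, before indexes");
```

Under these assumptions the raw row data alone comes to roughly 6.5 TiB, about five times the 1.3 TiB of source files and well beyond the 500 GiB SSD, so the table and its indexes would have to live on the HDD RAID.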

Read the article
