Search Results

Search found 60391 results on 2416 pages for 'data generation'.

Page 441/2416 | < Previous Page | 437 438 439 440 441 442 443 444 445 446 447 448 | Next Page >

Somebody is storing credit card data - how are they doing it?

- by pygorex1

Storing credit card information securely and legally is very difficult and should not be attempted. I have no intention of storing credit card data but I'm dying to figure out the following: My credit card info is being stored on a server some where in he tworld. This data is (hopefully) not being stored on a merchant's server, but at some point it needs to be stored to verify and charge the account identified by merchant submitted data. My question is this: if you were tasked with storing credit card data what encryption strategy would you use to secure the data on-disk? From what I can tell submitted credit card info is being checked more or less in real time. I doubt that any encryption key used to secure the data is being entered manually, so decryption is being done on the fly, which implies that the keys themselves are being stored on-disk. How would you secure your data and your keys in an automated system like this?

Read the article
NoSQL Java API for MySQL Cluster: Questions & Answers

- by Mat Keep

The MySQL Cluster engineering team recently ran a live webinar, available now on-demand demonstrating the ClusterJ and ClusterJPA NoSQL APIs for MySQL Cluster, and how these can be used in building real-time, high scale Java-based services that require continuous availability. Attendees asked a number of great questions during the webinar, and I thought it would be useful to share those here, so others are also able to learn more about the Java NoSQL APIs. First, a little bit about why we developed these APIs and why they are interesting to Java developers. ClusterJ and Cluster JPA ClusterJ is a Java interface to MySQL Cluster that provides either a static or dynamic domain object model, similar to the data model used by JDO, JPA, and Hibernate. A simple API gives users extremely high performance for common operations: insert, delete, update, and query. ClusterJPA works with ClusterJ to extend functionality, including - Persistent classes - Relationships - Joins in queries - Lazy loading - Table and index creation from object model By eliminating data transformations via SQL, users get lower data access latency and higher throughput. In addition, Java developers have a more natural programming method to directly manage their data, with a complete, feature-rich solution for Object/Relational Mapping. As a result, the development of Java applications is simplified with faster development cycles resulting in accelerated time to market for new services. MySQL Cluster offers multiple NoSQL APIs alongside Java: - Memcached for a persistent, high performance, write-scalable Key/Value store, - HTTP/REST via an Apache module - C++ via the NDB API for the lowest absolute latency. Developers can use SQL as well as NoSQL APIs for access to the same data set via multiple query patterns – from simple Primary Key lookups or inserts to complex cross-shard JOINs using Adaptive Query Localization Marrying NoSQL and SQL access to an ACID-compliant database offers developers a number of benefits. MySQL Cluster’s distributed, shared-nothing architecture with auto-sharding and real time performance makes it a great fit for workloads requiring high volume OLTP. Users also get the added flexibility of being able to run real-time analytics across the same OLTP data set for real-time business insight. OK – hopefully you now have a better idea of why ClusterJ and JPA are available. Now, for the Q&A. Q & A Q. Why would I use Connector/J vs. ClusterJ? A. Partly it's a question of whether you prefer to work with SQL (Connector/J) or objects (ClusterJ). Performance of ClusterJ will be better as there is no need to pass through the MySQL Server. A ClusterJ operation can only act on a single table (e.g. no joins) - ClusterJPA extends that capability Q. Can I mix different APIs (ie ClusterJ, Connector/J) in our application for different query types? A. Yes. You can mix and match all of the API types, SQL, JDBC, ODBC, ClusterJ, Memcached, REST, C++. They all access the exact same data in the data nodes. Update through one API and new data is instantly visible to all of the others. Q. How many TCP connections would a SessionFactory instance create for a cluster of 8 data nodes? A. SessionFactory has a connection to the mgmd (management node) but otherwise is just a vehicle to create Sessions. Without using connection pooling, a SessionFactory will have one connection open with each data node. Using optional connection pooling allows multiple connections from the SessionFactory to increase throughput. Q. Can you give details of how Cluster J optimizes sharding to enhance performance of distributed query processing? A. Each data node in a cluster runs a Transaction Coordinator (TC), which begins and ends the transaction, but also serves as a resource to operate on the result rows. While an API node (such as a ClusterJ process) can send queries to any TC/data node, there are performance gains if the TC is where most of the result data is stored. ClusterJ computes the shard (partition) key to choose the data node where the row resides as the TC. Q. What happens if we perform two primary key lookups within the same transaction? Are they sent to the data node in one transaction? A. ClusterJ will send identical PK lookups to the same data node. Q. How is distributed query processing handled by MySQL Cluster ? A. If the data is split between data nodes then all of the information will be transparently combined and passed back to the application. The session will connect to a data node - typically by hashing the primary key - which then interacts with its neighboring nodes to collect the data needed to fulfil the query. Q. Can I use Foreign Keys with MySQL Cluster A. Support for Foreign Keys is included in the MySQL Cluster 7.3 Early Access release Summary The NoSQL Java APIs are packaged with MySQL Cluster, available for download here so feel free to take them for a spin today! Key Resources MySQL Cluster on-line demo MySQL ClusterJ and JPA On-demand webinar MySQL ClusterJ and JPA documentation MySQL ClusterJ and JPA whitepaper and tutorial

Read the article
Why should I prepend my custom attributes with "data-"?

- by Horace Loeb

So any custom data attribute that I use should start with "data-": <li class="user" data-name="John Resig" data-city="Boston" data-lang="js" data-food="Bacon"> John says: Hello, how are you? </li> Will anything bad happen if I just ignore this? I.e.: <li class="user" name="John Resig" city="Boston" lang="js" food="Bacon"> John says: Hello, how are you? </li> I guess one bad thing is that my custom attributes could conflict with HTML attributes with special meanings (e.g., name), but aside from this, is there a problem with just writing "example_text" instead of "data-example_text"? (It won't validate, but who cares?)

Read the article
Need a way for users to enter data while offline and re-submit it when back online

- by crankharder

As part of a larger webapp, I want to build functionality that allows a user to enter data while offline -- and then send that data back to my site when they have a connection again The parts that, to me, are missing ar Saving a certain set of data in their browser Saving a form that allows them to enter data using form from step#2 to update data from step#1 getting data out of the local data store and sending it back to the server I would like to keep this entirely within the browser, so... Does HTML5 meet some (or all) of those goals as it's currently implemented in webkit/ff3? If not,what technologies should I start looking into in order to accomplish all of the above.

Read the article
How can I allow users to switch data sources for an SSRS report?

- by fatcat1111

I have two SQL Server databases with identical schemas, but different data. I also have SSRS generating reports, in native mode, for one of the databases. All reports the same shared data source. I would like to allow users to get reports for the other database. I created a second shared data source for the second database. Modifying the reports to use this second data source results in reports as expected. Because the RDLs are the same, except for the data source, and because I don't want to maintain what are basically duplicate reports, I'm looking for a way to dynamically switch data sources, depending on user input. Is there an easy means of accomplishing this? An existing solution would be best. Barring that, can the RDL's data source be parametrized? Or, can the RDS's connection string be parametrized?

Read the article
Socket read() hangs for a while when there is no data to read.

- by janesconference

Hi' I'm writing a simple http port forwarder. I read data from port 80, and pass the data to my lighttpd server, on port 8080. As long as I write() data on the socket on port 8080 (forwarding the request) there's no problem, but when I read() data from that socket (forwarding the response), the last read() hangs a lot (about 1 or 2 seconds) before realizing there's no more data and returning 0. I tried to set the socket to non-blocking, but this doesn't work, as sometimes it returns EWOULDBLOCKING even if there's some data left (lighttpd + cgi can be quite slow). I tried to set a timeout with select(), but, as above, a slow cgi could timeout the socket when there's actually some data to transmit. How would you do?

Read the article
How can i supply an AntiForgeryToken when posting JSON data using $.ajax ?

- by HerbalMart

I am using the code as below of this post: First i will an fill array variable with the correct values for the controller action. Using the code below i think it should be very straigtforward by just adding the following line to the javascript: data["__RequestVerificationToken"] = $('[name=__RequestVerificationToken]').val(); The <%= Html.AntiForgeryToken() %> is at his right place and the action has a [ValidateAntiForgeryToken] But my controller action keeps saying: "Invalid forgery token" What am i doing wrong here? Code data["fiscalyear"] = fiscalyear; data["subgeography"] = $(list).parent().find('input[name=subGeography]').val(); data["territories"] = new Array(); $(items).each(function() { data["territories"].push($(this).find('input[name=territory]').val()); }); if (url != null) { $.ajax( { dataType: 'JSON', contentType: 'application/json; charset=utf-8', url: url, type: 'POST', context: document.body, data: JSON.stringify(data), success: function() { refresh(); } }); }

Read the article
ADF Business Components

- by Arda Eralp

ADF Business Components and JDeveloper simplify the development, delivery, and customization of business applications for the Java EE platform. With ADF Business Components, developers aren't required to write the application infrastructure code required by the typical Java EE application to: Connect to the database Retrieve data Lock database records Manage transactions ADF Business Components addresses these tasks through its library of reusable software components and through the supporting design time facilities in JDeveloper. Most importantly, developers save time using ADF Business Components since the JDeveloper design time makes typical development tasks entirely declarative. In particular, JDeveloper supports declarative development with ADF Business Components to: Author and test business logic in components which automatically integrate with databases Reuse business logic through multiple SQL-based views of data, supporting different application tasks Access and update the views from browser, desktop, mobile, and web service clients Customize application functionality in layers without requiring modification of the delivered application The goal of ADF Business Components is to make the business services developer more productive. ADF Business Components provides a foundation of Java classes that allow your business-tier application components to leverage the functionality provided in the following areas: Simplifying Data Access Design a data model for client displays, including only necessary data Include master-detail hierarchies of any complexity as part of the data model Implement end-user Query-by-Example data filtering without code Automatically coordinate data model changes with business services layer Automatically validate and save any changes to the database Enforcing Business Domain Validation and Business Logic Declaratively enforce required fields, primary key uniqueness, data precision-scale, and foreign key references Easily capture and enforce both simple and complex business rules, programmatically or declaratively, with multilevel validation support Navigate relationships between business domain objects and enforce constraints related to compound components Supporting Sophisticated UIs with Multipage Units of Work Automatically reflect changes made by business service application logic in the user interface Retrieve reference information from related tables, and automatically maintain the information when the user changes foreign-key values Simplify multistep web-based business transactions with automatic web-tier state management Handle images, video, sound, and documents without having to use code Synchronize pending data changes across multiple views of data Consistently apply prompts, tooltips, format masks, and error messages in any application Define custom metadata for any business components to support metadata-driven user interface or application functionality Add dynamic attributes at runtime to simplify per-row state management Implementing High-Performance Service-Oriented Architecture Support highly functional web service interfaces for business integration without writing code Enforce best-practice interface-based programming style Simplify application security with automatic JAAS integration and audit maintenance "Write once, run anywhere": use the same business service as plain Java class, EJB session bean, or web service Streamlining Application Customization Extend component functionality after delivery without modifying source code Globally substitute delivered components with extended ones without modifying the application ADF Business Components implements the business service through the following set of cooperating components: Entity object An entity object represents a row in a database table and simplifies modifying its data by handling all data manipulation language (DML) operations for you. These are basically your 1 to 1 representation of a database table. Each table in the database will have 1 and only 1 EO. The EO contains the mapping between columns and attributes. EO's also contain the business logic and validation. These are you core data services. They are responsible for updating, inserting and deleting records. The Attributes tab displays the actual mapping between attributes and columns, the mapping has following fields: Name : contains the name of the attribute we expose in our data model. Type : defines the data type of the attribute in our application. Column : specifies the column to which we want to map the attribute with Column Type : contains the type of the column in the database View object A view object represents a SQL query. You use the full power of the familiar SQL language to join, filter, sort, and aggregate data into exactly the shape required by the end-user task. The attributes in the View Objects are actually coming from the Entity Object. In the end the VO will generate a query but you basically build a VO by selecting which EO need to participate in the VO and which attributes of those EO you want to use. That's why you have the Entity Usage column so you can see the relation between VO and EO. In the query tab you can clearly see the query that will be generated for the VO. At this stage we don't need it and just use it for information purpose. In later stages we might use it. Application module An application module is the controller of your data layer. It is responsible for keeping hold of the transaction. It exposes the data model to the view layer. You expose the VO's through the Application Module. This is the abstraction of your data layer which you want to show to the outside word.It defines an updatable data model and top-level procedures and functions (called service methods) related to a logical unit of work related to an end-user task. While the base components handle all the common cases through built-in behavior, customization is always possible and the default behavior provided by the base components can be easily overridden or augmented. When you create EO's, a foreign key will be translated into an association in our model. It defines the type of relation and who is the master and child as well as how the visibility of the association looks like. A similar concept exists to identify relations between view objects. These are called view links. These are almost identical as association except that a view link is based upon attributes defined in the view object. It can also be based upon an association. Here's a short summary: Entity Objects: representations of tables Association: Relations between EO's. Representations of foreign keys View Objects: Logical model View Links: Relationships between view objects Application Model: interface to your application

Read the article
Best practice -- Content Tracking Remote Data (cURL, file_get_contents, cron, et. al)?

- by user322787

I am attempting to build a script that will log data that changes every 1 second. The initial thought was "Just run a php file that does a cURL every second from cron" -- but I have a very strong feeling that this isn't the right way to go about it. Here are my specifications: There are currently 10 sites I need to gather data from and log to a database -- this number will invariably increase over time, so the solution needs to be scalable. Each site has data that it spits out to a URL every second, but only keeps 10 lines on the page, and they can sometimes spit out up to 10 lines each time, so I need to pick up that data every second to ensure I get all the data. As I will also be writing this data to my own DB, there's going to be I/O every second of every day for a considerably long time. Barring magic, what is the most efficient way to achieve this? it might help to know that the data that I am getting every second is very small, under 500bytes.

Read the article
how read data from file and execute to MYSQL?

- by Mahran Elneel

i create form to load sql file and used fopen function top open file and read this but when want to execute this data to database not work? what is wrong in my code????????????? $ofile = trim(basename($_FILES['sqlfile']['name'])); $path = "sqlfiles/".$ofile; //$data = settype($data,"string"); $file = ""; $connect = mysql_connect('localhost','root',''); $selectdb = mysql_select_db('files'); if(isset($_POST['submit'])) { if(!move_uploaded_file($_FILES['sqlfile']['tmp_name'],"sqlfiles/".$ofile)) { $path = ""; } $file = fopen("sqlfiles/".$ofile,"r") or exit("error open file!"); while (!feof($file)) { $data = fgetc($file); settype($data,"string"); $rslt = mysql_query($data); print $data; } fclose($file); }

Read the article
How can I access the row index numbers on a data frame in R?

- by user123276

I have a data frame that was sampled from another data frame. As a result, when I print the output of the data frame, I get a jumble of numbers on the left hand side of the data frame. The original data frame was nicely numbered from 1,2,3,4,5, and so on. But my new data frame is numbered 5,15,3,65, etc on the left hand side. Is there a way I can access the row index information for a data frame in R? thank you!

Read the article
[Python] Best strategy for dealing with incomplete lines of data from a file.

- by adoran

I use the following block of code to read lines out of a file 'f' into a nested list: for data in f: clean_data = data.rstrip() data = clean_data.split('\t') t += [data[0]] strmat += [data[1:]] Sometimes, however, the data is incomplete and a row may look like this: ['955.159', '62.8168', '', '', '', '', '', '', '', '', '', '', '', '', '', '29', '30', '0', '0'] It puts a spanner in the works because I would like Python to implicitly cast my list as floats but the empty fields '' cause it to be cast as an array of strings (dtype: s12). I could start a second 'if' statement and convert all empty fields into NULL (since 0 is wrong in this instance) but I was unsure whether this was best. Is this the best strategy of dealing with incomplete data? Should I edit the stream or do it post-hoc?

Read the article
why does the data property in an jquery ajax call override my return false?

- by user315709

hi, i have the following block of code: $("#contact_container form, #contact_details form").live( "submit", function(event) { $.ajax({ type: this.method, url: this.action, data: this.serialize(), success: function(data) { data = $(data).find("#content"); $("#contact_details").html(data); }, }); return false; } ; when i leave out the data: this.serialize(), it behaves properly and displays the response within the #contact_details div. however, when i leave it in, it submits the form, causing the page to navigate away. why does the presence of the data attribute negates the return false? (probably due to a bug that i can't spot...) also, is the syntax to my find statement correct? it comes back as "undefined" even though i use a debugger to check the ajax response and that id does exists. thanks, steve

Read the article
Unable to Mange DNS via MMC

- by IT Helpdesk Team Manager

When trying to access the DNS service on Microsoft Windows Server 2003 (Build 3790) domain controller/schema master via the MMC DNS snap in or locally via the DNS MMC from Administrative tools I'm getting a red "X" through the icon for the DNS Server. The inability to access DNS management via MMC happens on all domain controllers as well. We've looked at items such as the DHCP client not being started, incorrect DNS setup ( the machine points at itself and another DC ), the DNS service not running ( it is and all DNS queries via NSLOOKUP work correctly ), dslint returns the correct information and functions as expected. There is the following entry in the DNS event log: The DNS server could not initialize the remote procedure call (RPC) service. If it is not running, start the RPC service or reboot the computer. The event data is the error code. For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp. 0000: 0000051b dnscmd fails with RPC server unavailable yet RPC is started: C:\Documents and Settings\Administrator.DOMAIN>dnscmd /Info Info query failed status = 1722 (0x000006ba) Command failed: RPC_S_SERVER_UNAVAILABLE 1722 (000006ba) DCDIAG /TEST:DNS /V /E produces the following errors: Warning: no DNS RPC connectivity (error or non Microsoft DNS server is running) [Error details: 1753 (Type: Win32 - Description: There are no more endpoints available from the endpoint mapper.)] Warning: no DNS RPC connectivity (error or non Microsoft DNS server is running) [Error details: 1722 (Type: Win32 - Description: The RPC server is unavailable.)] The DNS server could not initialize the remote procedure call (RPC) service. If it is not running, start the RPC service or reboot the computer. The event data is the error code. A DNS query for _ldap._tcp.dc._msdcs. returns the correct results. All domain and ADS related activities are working except that I can't manage my DNS via MMC or dnscmd. Any thoughts or solutions would be greatly appreciated. EDIT: Adding Registry export per request: Key Name: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Rpc Class Name: <NO CLASS> Last Write Time: 10/18/2012 - 2:29 PM Value 0 Name: DCOM Protocols Type: REG_MULTI_SZ Data: ncacn_ip_tcp Value 1 Name: UuidSequenceNumber Type: REG_DWORD Data: 0xb19bd0f Key Name: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Rpc\ClientProtocols Class Name: <NO CLASS> Last Write Time: 3/9/2007 - 12:11 PM Value 0 Name: ncacn_np Type: REG_SZ Data: rpcrt4.dll Value 1 Name: ncacn_ip_tcp Type: REG_SZ Data: rpcrt4.dll Value 2 Name: ncadg_ip_udp Type: REG_SZ Data: rpcrt4.dll Value 3 Name: ncacn_http Type: REG_SZ Data: rpcrt4.dll Value 4 Name: ncacn_at_dsp Type: REG_SZ Data: rpcrt4.dll Key Name: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Rpc\NameService Class Name: <NO CLASS> Last Write Time: 2/20/2006 - 4:48 PM Value 0 Name: DefaultSyntax Type: REG_SZ Data: 3 Value 1 Name: Endpoint Type: REG_SZ Data: \pipe\locator Value 2 Name: NetworkAddress Type: REG_SZ Data: \\. Value 3 Name: Protocol Type: REG_SZ Data: ncacn_np Value 4 Name: ServerNetworkAddress Type: REG_SZ Data: \\. Key Name: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Rpc\NetBios Class Name: <NO CLASS> Last Write Time: 2/20/2006 - 4:48 PM Key Name: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Rpc\RpcProxy Class Name: <NO CLASS> Last Write Time: 3/9/2007 - 12:11 PM Value 0 Name: Enabled Type: REG_DWORD Data: 0x1 Value 1 Name: ValidPorts Type: REG_SZ Data: pdc:100-5000 Key Name: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Rpc\SecurityService Class Name: <NO CLASS> Last Write Time: 2/20/2006 - 4:48 PM Value 0 Name: 9 Type: REG_SZ Data: secur32.dll Value 1 Name: 10 Type: REG_SZ Data: secur32.dll Value 2 Name: 14 Type: REG_SZ Data: schannel.dll Value 3 Name: 16 Type: REG_SZ Data: secur32.dll Value 4 Name: 1 Type: REG_SZ Data: secur32.dll Value 5 Name: 18 Type: REG_SZ Data: secur32.dll Value 6 Name: 68 Type: REG_SZ Data: netlogon.dll

Read the article
Minimal-change algorithm which maximises 'swapping'

- by Kim Bastin

This is a question on combinatorics from a non-mathematician, so please try to bear with me! Given an array of n distinct characters, I want to generate subsets of k characters in a minimal-change order, i.e. an order in which generation n+1 contains exactly one character that was not in generation n. That's not too hard in itself. However, I also want to maximise the number of cases in which the character that is swapped out in generation n+1 is the same character that was swapped in in generation n. To illustrate, for n=7, k=3: abc abd abe* abf* abg* afg aeg* adg* acg* acd ace* acf* aef adf* ade bde bdf bef bcf* bce bcd* bcg* bdg beg* bfg* cfg ceg* cdg* cde cdf* cef def deg dfg efg The asterisked strings indicate the case I want to maximise; e.g. the e that is new in generation 3, abe, replaces a d that was new in generation 2, abd. It doesn't seem possible to have this happen in every generation, but I want it to happen as often as possible. Typical array sizes that I use are 20-30 and subset sizes around 5-8. I'm using an odd language, Icon (or actually its derivative Unicon), so I don't expect anyone to post code that I can used directly. But I will be grateful for answers or hints in pseudo-code, and will do my best to translate C etc. Also, I have noticed that problems of this kind are often discussed in terms of arrays of integers, and I can certainly apply solutions posted in such terms to my own problem. Thanks Kim Bastin

Read the article
Using R to Analyze G1GC Log Files

- by user12620111

Using R to Analyze G1GC Log Files body, td { font-family: sans-serif; background-color: white; font-size: 12px; margin: 8px; } tt, code, pre { font-family: 'DejaVu Sans Mono', 'Droid Sans Mono', 'Lucida Console', Consolas, Monaco, monospace; } h1 { font-size:2.2em; } h2 { font-size:1.8em; } h3 { font-size:1.4em; } h4 { font-size:1.0em; } h5 { font-size:0.9em; } h6 { font-size:0.8em; } a:visited { color: rgb(50%, 0%, 50%); } pre { margin-top: 0; max-width: 95%; border: 1px solid #ccc; white-space: pre-wrap; } pre code { display: block; padding: 0.5em; } code.r, code.cpp { background-color: #F8F8F8; } table, td, th { border: none; } blockquote { color:#666666; margin:0; padding-left: 1em; border-left: 0.5em #EEE solid; } hr { height: 0px; border-bottom: none; border-top-width: thin; border-top-style: dotted; border-top-color: #999999; } @media print { * { background: transparent !important; color: black !important; filter:none !important; -ms-filter: none !important; } body { font-size:12pt; max-width:100%; } a, a:visited { text-decoration: underline; } hr { visibility: hidden; page-break-before: always; } pre, blockquote { padding-right: 1em; page-break-inside: avoid; } tr, img { page-break-inside: avoid; } img { max-width: 100% !important; } @page :left { margin: 15mm 20mm 15mm 10mm; } @page :right { margin: 15mm 10mm 15mm 20mm; } p, h2, h3 { orphans: 3; widows: 3; } h2, h3 { page-break-after: avoid; } } pre .operator, pre .paren { color: rgb(104, 118, 135) } pre .literal { color: rgb(88, 72, 246) } pre .number { color: rgb(0, 0, 205); } pre .comment { color: rgb(76, 136, 107); } pre .keyword { color: rgb(0, 0, 255); } pre .identifier { color: rgb(0, 0, 0); } pre .string { color: rgb(3, 106, 7); } var hljs=new function(){function m(p){return p.replace(/&/gm,"&").replace(/"}while(y.length||w.length){var v=u().splice(0,1)[0];z+=m(x.substr(q,v.offset-q));q=v.offset;if(v.event=="start"){z+=t(v.node);s.push(v.node)}else{if(v.event=="stop"){var p,r=s.length;do{r--;p=s[r];z+=("")}while(p!=v.node);s.splice(r,1);while(r'+M[0]+""}else{r+=M[0]}O=P.lR.lastIndex;M=P.lR.exec(L)}return r+L.substr(O,L.length-O)}function J(L,M){if(M.sL&&e[M.sL]){var r=d(M.sL,L);x+=r.keyword_count;return r.value}else{return F(L,M)}}function I(M,r){var L=M.cN?'':"";if(M.rB){y+=L;M.buffer=""}else{if(M.eB){y+=m(r)+L;M.buffer=""}else{y+=L;M.buffer=r}}D.push(M);A+=M.r}function G(N,M,Q){var R=D[D.length-1];if(Q){y+=J(R.buffer+N,R);return false}var P=q(M,R);if(P){y+=J(R.buffer+N,R);I(P,M);return P.rB}var L=v(D.length-1,M);if(L){var O=R.cN?"":"";if(R.rE){y+=J(R.buffer+N,R)+O}else{if(R.eE){y+=J(R.buffer+N,R)+O+m(M)}else{y+=J(R.buffer+N+M,R)+O}}while(L1){O=D[D.length-2].cN?"":"";y+=O;L--;D.length--}var r=D[D.length-1];D.length--;D[D.length-1].buffer="";if(r.starts){I(r.starts,"")}return R.rE}if(w(M,R)){throw"Illegal"}}var E=e[B];var D=[E.dM];var A=0;var x=0;var y="";try{var s,u=0;E.dM.buffer="";do{s=p(C,u);var t=G(s[0],s[1],s[2]);u+=s[0].length;if(!t){u+=s[1].length}}while(!s[2]);if(D.length1){throw"Illegal"}return{r:A,keyword_count:x,value:y}}catch(H){if(H=="Illegal"){return{r:0,keyword_count:0,value:m(C)}}else{throw H}}}function g(t){var p={keyword_count:0,r:0,value:m(t)};var r=p;for(var q in e){if(!e.hasOwnProperty(q)){continue}var s=d(q,t);s.language=q;if(s.keyword_count+s.rr.keyword_count+r.r){r=s}if(s.keyword_count+s.rp.keyword_count+p.r){r=p;p=s}}if(r.language){p.second_best=r}return p}function i(r,q,p){if(q){r=r.replace(/^((]+|\t)+)/gm,function(t,w,v,u){return w.replace(/\t/g,q)})}if(p){r=r.replace(/\n/g,"")}return r}function n(t,w,r){var x=h(t,r);var v=a(t);var y,s;if(v){y=d(v,x)}else{return}var q=c(t);if(q.length){s=document.createElement("pre");s.innerHTML=y.value;y.value=k(q,c(s),x)}y.value=i(y.value,w,r);var u=t.className;if(!u.match("(\\s|^)(language-)?"+v+"(\\s|$)")){u=u?(u+" "+v):v}if(/MSIE [678]/.test(navigator.userAgent)&&t.tagName=="CODE"&&t.parentNode.tagName=="PRE"){s=t.parentNode;var p=document.createElement("div");p.innerHTML=""+y.value+"";t=p.firstChild.firstChild;p.firstChild.cN=s.cN;s.parentNode.replaceChild(p.firstChild,s)}else{t.innerHTML=y.value}t.className=u;t.result={language:v,kw:y.keyword_count,re:y.r};if(y.second_best){t.second_best={language:y.second_best.language,kw:y.second_best.keyword_count,re:y.second_best.r}}}function o(){if(o.called){return}o.called=true;var r=document.getElementsByTagName("pre");for(var p=0;p|=||=||=|\\?|\\[|\\{|\\(|\\^|\\^=|\\||\\|=|\\|\\||~";this.ER="(?![\\s\\S])";this.BE={b:"\\\\.",r:0};this.ASM={cN:"string",b:"'",e:"'",i:"\\n",c:[this.BE],r:0};this.QSM={cN:"string",b:'"',e:'"',i:"\\n",c:[this.BE],r:0};this.CLCM={cN:"comment",b:"//",e:"$"};this.CBLCLM={cN:"comment",b:"/\\*",e:"\\*/"};this.HCM={cN:"comment",b:"#",e:"$"};this.NM={cN:"number",b:this.NR,r:0};this.CNM={cN:"number",b:this.CNR,r:0};this.BNM={cN:"number",b:this.BNR,r:0};this.inherit=function(r,s){var p={};for(var q in r){p[q]=r[q]}if(s){for(var q in s){p[q]=s[q]}}return p}}();hljs.LANGUAGES.cpp=function(){var a={keyword:{"false":1,"int":1,"float":1,"while":1,"private":1,"char":1,"catch":1,"export":1,virtual:1,operator:2,sizeof:2,dynamic_cast:2,typedef:2,const_cast:2,"const":1,struct:1,"for":1,static_cast:2,union:1,namespace:1,unsigned:1,"long":1,"throw":1,"volatile":2,"static":1,"protected":1,bool:1,template:1,mutable:1,"if":1,"public":1,friend:2,"do":1,"return":1,"goto":1,auto:1,"void":2,"enum":1,"else":1,"break":1,"new":1,extern:1,using:1,"true":1,"class":1,asm:1,"case":1,typeid:1,"short":1,reinterpret_cast:2,"default":1,"double":1,register:1,explicit:1,signed:1,typename:1,"try":1,"this":1,"switch":1,"continue":1,wchar_t:1,inline:1,"delete":1,alignof:1,char16_t:1,char32_t:1,constexpr:1,decltype:1,noexcept:1,nullptr:1,static_assert:1,thread_local:1,restrict:1,_Bool:1,complex:1},built_in:{std:1,string:1,cin:1,cout:1,cerr:1,clog:1,stringstream:1,istringstream:1,ostringstream:1,auto_ptr:1,deque:1,list:1,queue:1,stack:1,vector:1,map:1,set:1,bitset:1,multiset:1,multimap:1,unordered_set:1,unordered_map:1,unordered_multiset:1,unordered_multimap:1,array:1,shared_ptr:1}};return{dM:{k:a,i:"",k:a,r:10,c:["self"]}]}}}();hljs.LANGUAGES.r={dM:{c:[hljs.HCM,{cN:"number",b:"\\b0[xX][0-9a-fA-F]+[Li]?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+(?:[eE][+\\-]?\\d*)?L\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+\\.(?!\\d)(?:i\\b)?",e:hljs.IMMEDIATE_RE,r:1},{cN:"number",b:"\\b\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"keyword",b:"(?:tryCatch|library|setGeneric|setGroupGeneric)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\.",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\d+(?![\\w.])",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\b(?:function)",e:hljs.IMMEDIATE_RE,r:2},{cN:"keyword",b:"(?:if|in|break|next|repeat|else|for|return|switch|while|try|stop|warning|require|attach|detach|source|setMethod|setClass)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"literal",b:"(?:NA|NA_integer_|NA_real_|NA_character_|NA_complex_)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"literal",b:"(?:NULL|TRUE|FALSE|T|F|Inf|NaN)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"identifier",b:"[a-zA-Z.][a-zA-Z0-9._]*\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"operator",b:"|=|| Using R to Analyze G1GC Log Files Using R to Analyze G1GC Log Files Introduction Working in Oracle Platform Integration gives an engineer opportunities to work on a wide array of technologies. My team’s goal is to make Oracle applications run best on the Solaris/SPARC platform. When looking for bottlenecks in a modern applications, one needs to be aware of not only how the CPUs and operating system are executing, but also network, storage, and in some cases, the Java Virtual Machine. I was recently presented with about 1.5 GB of Java Garbage First Garbage Collector log file data. If you’re not familiar with the subject, you might want to review Garbage First Garbage Collector Tuning by Monica Beckwith. The customer had been running Java HotSpot 1.6.0_31 to host a web application server. I was told that the Solaris/SPARC server was running a Java process launched using a commmand line that included the following flags: -d64 -Xms9g -Xmx9g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=80 -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintFlagsFinal -XX:+DisableExplicitGC -XX:+UnlockExperimentalVMOptions -XX:ParallelGCThreads=8 Several sources on the internet indicate that if I were to print out the 1.5 GB of log files, it would require enough paper to fill the bed of a pick up truck. Of course, it would be fruitless to try to scan the log files by hand. Tools will be required to summarize the contents of the log files. Others have encountered large Java garbage collection log files. There are existing tools to analyze the log files: IBM’s GC toolkit The chewiebug GCViewer gchisto HPjmeter Instead of using one of the other tools listed, I decide to parse the log files with standard Unix tools, and analyze the data with R. Data Cleansing The log files arrived in two different formats. I guess that the difference is that one set of log files was generated using a more verbose option, maybe -XX:+PrintHeapAtGC, and the other set of log files was generated without that option. Format 1 In some of the log files, the log files with the less verbose format, a single trace, i.e. the report of a singe garbage collection event, looks like this: {Heap before GC invocations=12280 (full 61): garbage-first heap total 9437184K, used 7499918K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 1 young (4096K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. 2014-05-14T07:24:00.988-0700: 60586.353: [GC pause (young) 7324M->7320M(9216M), 0.1567265 secs] Heap after GC invocations=12281 (full 61): garbage-first heap total 9437184K, used 7496533K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 0 young (0K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. } A simple grep can be used to extract a summary: $ grep "\[ GC pause (young" g1gc.log 2014-05-13T13:24:35.091-0700: 3.109: [GC pause (young) 20M->5029K(9216M), 0.0146328 secs] 2014-05-13T13:24:35.440-0700: 3.459: [GC pause (young) 9125K->6077K(9216M), 0.0086723 secs] 2014-05-13T13:24:37.581-0700: 5.599: [GC pause (young) 25M->8470K(9216M), 0.0203820 secs] 2014-05-13T13:24:42.686-0700: 10.704: [GC pause (young) 44M->15M(9216M), 0.0288848 secs] 2014-05-13T13:24:48.941-0700: 16.958: [GC pause (young) 51M->20M(9216M), 0.0491244 secs] 2014-05-13T13:24:56.049-0700: 24.066: [GC pause (young) 92M->26M(9216M), 0.0525368 secs] 2014-05-13T13:25:34.368-0700: 62.383: [GC pause (young) 602M->68M(9216M), 0.1721173 secs] But that format wasn't easily read into R, so I needed to be a bit more tricky. I used the following Unix command to create a summary file that was easy for R to read. $ echo "SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime" $ grep "\[GC pause (young" g1gc.log | grep -v mark | sed -e 's/[A-SU-z,]/ /g' -e 's/->/ /' -e 's/: / /g' | more SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime 2014-05-13T13:24:35.091-0700 3.109 20 5029 9216 0.0146328 2014-05-13T13:24:35.440-0700 3.459 9125 6077 9216 0.0086723 2014-05-13T13:24:37.581-0700 5.599 25 8470 9216 0.0203820 2014-05-13T13:24:42.686-0700 10.704 44 15 9216 0.0288848 2014-05-13T13:24:48.941-0700 16.958 51 20 9216 0.0491244 2014-05-13T13:24:56.049-0700 24.066 92 26 9216 0.0525368 2014-05-13T13:25:34.368-0700 62.383 602 68 9216 0.1721173 Format 2 In some of the log files, the log files with the more verbose format, a single trace, i.e. the report of a singe garbage collection event, was more complicated than Format 1. Here is a text file with an example of a single G1GC trace in the second format. As you can see, it is quite complicated. It is nice that there is so much information available, but the level of detail can be overwhelming. I wrote this awk script (download) to summarize each trace on a single line. #!/usr/bin/env awk -f BEGIN { printf("SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize\n") } ###################### # Save count data from lines that are at the start of each G1GC trace. # Each trace starts out like this: # {Heap before GC invocations=14 (full 0): # garbage-first heap total 9437184K, used 325496K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) ###################### /{Heap.*full/{ gsub ( "\\)" , "" ); nf=split($0,a,"="); split(a[2],b," "); getline; if ( match($0, "first") ) { G1GC=1; IncrementalCount=b[1]; FullCount=substr( b[3], 1, length(b[3])-1 ); } else { G1GC=0; } } ###################### # Pull out time stamps that are in lines with this format: # 2014-05-12T14:02:06.025-0700: 94.312: [GC pause (young), 0.08870154 secs] ###################### /GC pause/ { DateTime=$1; SecondsSinceLaunch=substr($2, 1, length($2)-1); } ###################### # Heap sizes are in lines that look like this: # [ 4842M->4838M(9216M)] ###################### /\[ .*]$/ { gsub ( "\\[" , "" ); gsub ( "\ \]" , "" ); gsub ( "->" , " " ); gsub ( "\$ " , " " ); gsub ( "\ $" , " " ); split($0,a," "); if ( split(a[1],b,"M") > 1 ) {BeforeSize=b[1]*1024;} if ( split(a[1],b,"K") > 1 ) {BeforeSize=b[1];} if ( split(a[2],b,"M") > 1 ) {AfterSize=b[1]*1024;} if ( split(a[2],b,"K") > 1 ) {AfterSize=b[1];} if ( split(a[3],b,"M") > 1 ) {TotalSize=b[1]*1024;} if ( split(a[3],b,"K") > 1 ) {TotalSize=b[1];} } ###################### # Emit an output line when you find input that looks like this: # [Times: user=1.41 sys=0.08, real=0.24 secs] ###################### /\[Times/ { if (G1GC==1) { gsub ( "," , "" ); split($2,a,"="); UserTime=a[2]; split($3,a,"="); SysTime=a[2]; split($4,a,"="); RealTime=a[2]; print DateTime,SecondsSinceLaunch,IncrementalCount,FullCount,UserTime,SysTime,RealTime,BeforeSize,AfterSize,TotalSize; G1GC=0; } } The resulting summary is about 25X smaller that the original file, but still difficult for a human to digest. SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ... 2014-05-12T18:36:34.669-0700: 3985.744 561 0 0.57 0.06 0.16 1724416 1720320 9437184 2014-05-12T18:36:34.839-0700: 3985.914 562 0 0.51 0.06 0.19 1724416 1720320 9437184 2014-05-12T18:36:35.069-0700: 3986.144 563 0 0.60 0.04 0.27 1724416 1721344 9437184 2014-05-12T18:36:35.354-0700: 3986.429 564 0 0.33 0.04 0.09 1725440 1722368 9437184 2014-05-12T18:36:35.545-0700: 3986.620 565 0 0.58 0.04 0.17 1726464 1722368 9437184 2014-05-12T18:36:35.726-0700: 3986.801 566 0 0.43 0.05 0.12 1726464 1722368 9437184 2014-05-12T18:36:35.856-0700: 3986.930 567 0 0.30 0.04 0.07 1726464 1723392 9437184 2014-05-12T18:36:35.947-0700: 3987.023 568 0 0.61 0.04 0.26 1727488 1723392 9437184 2014-05-12T18:36:36.228-0700: 3987.302 569 0 0.46 0.04 0.16 1731584 1724416 9437184 Reading the Data into R Once the GC log data had been cleansed, either by processing the first format with the shell script, or by processing the second format with the awk script, it was easy to read the data into R. g1gc.df = read.csv("summary.txt", row.names = NULL, stringsAsFactors=FALSE,sep="") str(g1gc.df) ## 'data.frame': 8307 obs. of 10 variables: ## $ row.names : chr "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ... ## $ SecondsSinceLaunch: num 1.16 1.47 1.97 3.83 6.1 ... ## $ IncrementalCount : int 0 1 2 3 4 5 6 7 8 9 ... ## $ FullCount : int 0 0 0 0 0 0 0 0 0 0 ... ## $ UserTime : num 0.11 0.05 0.04 0.21 0.08 0.26 0.31 0.33 0.34 0.56 ... ## $ SysTime : num 0.04 0.01 0.01 0.05 0.01 0.06 0.07 0.06 0.07 0.09 ... ## $ RealTime : num 0.02 0.02 0.01 0.04 0.02 0.04 0.05 0.04 0.04 0.06 ... ## $ BeforeSize : int 8192 5496 5768 22528 24576 43008 34816 53248 55296 93184 ... ## $ AfterSize : int 1400 1672 2557 4907 7072 14336 16384 18432 19456 21504 ... ## $ TotalSize : int 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 ... head(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount ## 1 2014-05-12T14:00:32.868-0700: 1.161 0 ## 2 2014-05-12T14:00:33.179-0700: 1.472 1 ## 3 2014-05-12T14:00:33.677-0700: 1.969 2 ## 4 2014-05-12T14:00:35.538-0700: 3.830 3 ## 5 2014-05-12T14:00:37.811-0700: 6.103 4 ## 6 2014-05-12T14:00:41.428-0700: 9.720 5 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 1 0 0.11 0.04 0.02 8192 1400 9437184 ## 2 0 0.05 0.01 0.02 5496 1672 9437184 ## 3 0 0.04 0.01 0.01 5768 2557 9437184 ## 4 0 0.21 0.05 0.04 22528 4907 9437184 ## 5 0 0.08 0.01 0.02 24576 7072 9437184 ## 6 0 0.26 0.06 0.04 43008 14336 9437184 Basic Statistics Once the data has been read into R, simple statistics are very easy to generate. All of the numbers from high school statistics are available via simple commands. For example, generate a summary of every column: summary(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount FullCount ## Length:8307 Min. : 1 Min. : 0 Min. : 0.0 ## Class :character 1st Qu.: 9977 1st Qu.:2048 1st Qu.: 0.0 ## Mode :character Median :12855 Median :4136 Median : 12.0 ## Mean :12527 Mean :4156 Mean : 31.6 ## 3rd Qu.:15758 3rd Qu.:6262 3rd Qu.: 61.0 ## Max. :55484 Max. :8391 Max. :113.0 ## UserTime SysTime RealTime BeforeSize ## Min. :0.040 Min. :0.0000 Min. : 0.0 Min. : 5476 ## 1st Qu.:0.470 1st Qu.:0.0300 1st Qu.: 0.1 1st Qu.:5137920 ## Median :0.620 Median :0.0300 Median : 0.1 Median :6574080 ## Mean :0.751 Mean :0.0355 Mean : 0.3 Mean :5841855 ## 3rd Qu.:0.920 3rd Qu.:0.0400 3rd Qu.: 0.2 3rd Qu.:7084032 ## Max. :3.370 Max. :1.5600 Max. :488.1 Max. :8696832 ## AfterSize TotalSize ## Min. : 1380 Min. :9437184 ## 1st Qu.:5002752 1st Qu.:9437184 ## Median :6559744 Median :9437184 ## Mean :5785454 Mean :9437184 ## 3rd Qu.:7054336 3rd Qu.:9437184 ## Max. :8482816 Max. :9437184 Q: What is the total amount of User CPU time spent in garbage collection? sum(g1gc.df$UserTime) ## [1] 6236 As you can see, less than two hours of CPU time was spent in garbage collection. Is that too much? To find the percentage of time spent in garbage collection, divide the number above by total_elapsed_time*CPU_count. In this case, there are a lot of CPU’s and it turns out the the overall amount of CPU time spent in garbage collection isn’t a problem when viewed in isolation. When calculating rates, i.e. events per unit time, you need to ask yourself if the rate is homogenous across the time period in the log file. Does the log file include spikes of high activity that should be separately analyzed? Averaging in data from nights and weekends with data from business hours may alias problems. If you have a reason to suspect that the garbage collection rates include peaks and valleys that need independent analysis, see the “Time Series” section, below. Q: How much garbage is collected on each pass? The amount of heap space that is recovered per GC pass is surprisingly low: At least one collection didn’t recover any data. (“Min.=0”) 25% of the passes recovered 3MB or less. (“1st Qu.=3072”) Half of the GC passes recovered 4MB or less. (“Median=4096”) The average amount recovered was 56MB. (“Mean=56390”) 75% of the passes recovered 36MB or less. (“3rd Qu.=36860”) At least one pass recovered 2GB. (“Max.=2121000”) g1gc.df$Delta = g1gc.df$BeforeSize - g1gc.df$AfterSize summary(g1gc.df$Delta) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0 3070 4100 56400 36900 2120000 Q: What is the maximum User CPU time for a single collection? The worst garbage collection (“Max.”) is many standard deviations away from the mean. The data appears to be right skewed. summary(g1gc.df$UserTime) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.040 0.470 0.620 0.751 0.920 3.370 sd(g1gc.df$UserTime) ## [1] 0.3966 Basic Graphics Once the data is in R, it is trivial to plot the data with formats including dot plots, line charts, bar charts (simple, stacked, grouped), pie charts, boxplots, scatter plots histograms, and kernel density plots. Histogram of User CPU Time per Collection I don't think that this graph requires any explanation. hist(g1gc.df$UserTime, main="User CPU Time per Collection", xlab="Seconds", ylab="Frequency") Box plot to identify outliers When the initial data is viewed with a box plot, you can see the one crazy outlier in the real time per GC. Save this data point for future analysis and drop the outlier so that it’s not throwing off our statistics. Now the box plot shows many outliers, which will be examined later, using times series analysis. Notice that the scale of the x-axis changes drastically once the crazy outlier is removed. par(mfrow=c(2,1)) boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(dominated by a crazy outlier)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") crazy.outlier.df=g1gc.df[g1gc.df$RealTime > 400,] g1gc.df=g1gc.df[g1gc.df$RealTime < 400,] boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(crazy outlier excluded)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") box(which = "outer", lty = "solid") Here is the crazy outlier for future analysis: crazy.outlier.df ## row.names SecondsSinceLaunch IncrementalCount ## 8233 2014-05-12T23:15:43.903-0700: 20741 8316 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 8233 112 0.55 0.42 488.1 8381440 8235008 9437184 ## Delta ## 8233 146432 R Time Series Data To analyze the garbage collection as a time series, I’ll use Z’s Ordered Observations (zoo). “zoo is the creator for an S3 class of indexed totally ordered observations which includes irregular time series.” require(zoo) ## Loading required package: zoo ## ## Attaching package: 'zoo' ## ## The following objects are masked from 'package:base': ## ## as.Date, as.Date.numeric head(g1gc.df[,1]) ## [1] "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" ## [3] "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ## [5] "2014-05-12T14:00:37.811-0700:" "2014-05-12T14:00:41.428-0700:" options("digits.secs"=3) times=as.POSIXct( g1gc.df[,1], format="%Y-%m-%dT%H:%M:%OS%z:") g1gc.z = zoo(g1gc.df[,-c(1)], order.by=times) head(g1gc.z) ## SecondsSinceLaunch IncrementalCount FullCount ## 2014-05-12 17:00:32.868 1.161 0 0 ## 2014-05-12 17:00:33.178 1.472 1 0 ## 2014-05-12 17:00:33.677 1.969 2 0 ## 2014-05-12 17:00:35.538 3.830 3 0 ## 2014-05-12 17:00:37.811 6.103 4 0 ## 2014-05-12 17:00:41.427 9.720 5 0 ## UserTime SysTime RealTime BeforeSize AfterSize ## 2014-05-12 17:00:32.868 0.11 0.04 0.02 8192 1400 ## 2014-05-12 17:00:33.178 0.05 0.01 0.02 5496 1672 ## 2014-05-12 17:00:33.677 0.04 0.01 0.01 5768 2557 ## 2014-05-12 17:00:35.538 0.21 0.05 0.04 22528 4907 ## 2014-05-12 17:00:37.811 0.08 0.01 0.02 24576 7072 ## 2014-05-12 17:00:41.427 0.26 0.06 0.04 43008 14336 ## TotalSize Delta ## 2014-05-12 17:00:32.868 9437184 6792 ## 2014-05-12 17:00:33.178 9437184 3824 ## 2014-05-12 17:00:33.677 9437184 3211 ## 2014-05-12 17:00:35.538 9437184 17621 ## 2014-05-12 17:00:37.811 9437184 17504 ## 2014-05-12 17:00:41.427 9437184 28672 Example of Two Benchmark Runs in One Log File The data in the following graph is from a different log file, not the one of primary interest to this article. I’m including this image because it is an example of idle periods followed by busy periods. It would be uninteresting to average the rate of garbage collection over the entire log file period. More interesting would be the rate of garbage collect in the two busy periods. Are they the same or different? Your production data may be similar, for example, bursts when employees return from lunch and idle times on weekend evenings, etc. Once the data is in an R Time Series, you can analyze isolated time windows. Clipping the Time Series data Flashing back to our test case… Viewing the data as a time series is interesting. You can see that the work intensive time period is between 9:00 PM and 3:00 AM. Lets clip the data to the interesting period: par(mfrow=c(2,1)) plot(g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Complete Log File", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") clipped.g1gc.z=window(g1gc.z, start=as.POSIXct("2014-05-12 21:00:00"), end=as.POSIXct("2014-05-13 03:00:00")) plot(clipped.g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Limited to Benchmark Execution", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") box(which = "outer", lty = "solid") Cumulative Incremental and Full GC count Here is the cumulative incremental and full GC count. When the line is very steep, it indicates that the GCs are repeating very quickly. Notice that the scale on the Y axis is different for full vs. incremental. plot(clipped.g1gc.z[,c(2:3)], main="Cumulative Incremental and Full GC count", xlab="Time of Day", col="#1b9e77") GC Analysis of Benchmark Execution using Time Series data In the following series of 3 graphs: The “After Size” show the amount of heap space in use after each garbage collection. Many Java objects are still referenced, i.e. alive, during each garbage collection. This may indicate that the application has a memory leak, or may indicate that the application has a very large memory footprint. Typically, an application's memory footprint plateau's in the early stage of execution. One would expect this graph to have a flat top. The steep decline in the heap space may indicate that the application crashed after 2:00. The second graph shows that the outliers in real execution time, discussed above, occur near 2:00. when the Java heap seems to be quite full. The third graph shows that Full GCs are infrequent during the first few hours of execution. The rate of Full GC's, (the slope of the cummulative Full GC line), changes near midnight. plot(clipped.g1gc.z[,c("AfterSize","RealTime","FullCount")], xlab="Time of Day", col=c("#1b9e77","red","#1b9e77")) GC Analysis of heap recovered Each GC trace includes the amount of heap space in use before and after the individual GC event. During garbage coolection, unreferenced objects are identified, the space holding the unreferenced objects is freed, and thus, the difference in before and after usage indicates how much space has been freed. The following box plot and bar chart both demonstrate the same point - the amount of heap space freed per garbage colloection is surprisingly low. par(mfrow=c(2,1)) boxplot(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", horizontal = TRUE, col="red") hist(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", breaks=100, col="red") box(which = "outer", lty = "solid") This graph is the most interesting. The dark blue area shows how much heap is occupied by referenced Java objects. This represents memory that holds live data. The red fringe at the top shows how much data was recovered after each garbage collection. barplot(clipped.g1gc.z[,c("AfterSize","Delta")], col=c("#7570b3","#e7298a"), xlab="Time of Day", border=NA) legend("topleft", c("Live Objects","Heap Recovered on GC"), fill=c("#7570b3","#e7298a")) box(which = "outer", lty = "solid") When I discuss the data in the log files with the customer, I will ask for an explaination for the large amount of referenced data resident in the Java heap. There are two are posibilities: There is a memory leak and the amount of space required to hold referenced objects will continue to grow, limited only by the maximum heap size. After the maximum heap size is reached, the JVM will throw an “Out of Memory” exception every time that the application tries to allocate a new object. If this is the case, the aplication needs to be debugged to identify why old objects are referenced when they are no longer needed. The application has a legitimate requirement to keep a large amount of data in memory. The customer may want to further increase the maximum heap size. Another possible solution would be to partition the application across multiple cluster nodes, where each node has responsibility for managing a unique subset of the data. Conclusion In conclusion, R is a very powerful tool for the analysis of Java garbage collection log files. The primary difficulty is data cleansing so that information can be read into an R data frame. Once the data has been read into R, a rich set of tools may be used for thorough evaluation.

Read the article
Use pt-table-sync to setup a new MySQL DB

- by Generation D Systems

I have 2 hosts (A and B). B contains a MySQL server with a database called mydb, and A contains a MySQL server with nothing (fresh install). I want to replicate the entire mydb from B to A, by running a script on A (I do not have shell access to B). Can I run this on A: pt-table-sync --execute h=b.mydomain.com,D=mydb h=a.mydomain.com I've read the docs but don't get a 100% comfort feeling (perhaps because of all the warnings about damaging your data if you don't know what you're doing). Will this work? as well, is h=a.mydomin.com necessary? (Will it route all traffic back in/out the local NIC?) can I use localhost or nothing at all?

Read the article
Puppet: how to use data from a MySQL table in Puppet 3.0 templates?

- by Luke404

I have some data whose source-of-truth is in a MySQL database, size is expected to max out at the some-thousands-rows range (in a worst-case scenario) and I'd like to use puppet to configure files on some servers with that data (mostly iterating through those rows in a template). I'm currently using Puppet 3.0.x, and I cannot change the fact that MySQL will be the authoritative source for that data. Please note, data comes from external sources and not from puppet or from the managed nodes. What possible approaches are there? Which one would you recommend? Would External Node Classifiers be useful here? My "last resort" would be regularly dumping the table to a YAML file and reading that through Hiera to a Puppet template, or to directly dump the table in one or more pre-formatted text file(s) ready to be copied to the nodes. There is an unanswered question on SF about system users but the fundamental issue is probably similar to mine - he's trying to get data out of MySQL.

Read the article
How can I compare Excel serial dates WITHOUT converting to mm/dd/yy type dates?

- by dwwilson66

I have a table that contains a number of values representing Excel serial dates. After a number of unsuccessful attempts to compare fields, my current approach is to do comparisons between serial dates instead of calendar dates. I am trying to summarize the data--by DAY--with formulae. CONSIDER: 41021 some data 41021.625 some data 41021.63542 some data 41022 some data 41022.26042 some data 41022.91667 some data 41023 some data 41023.375 some data DESIRED RESULT: 41021 sum of 41021, 41021.625 and 41021.63542 data 41022 sum of 41022, 41022.26042 and 41022.91667 data 41023 sum of 41023 and 41023.375 data In essence, for all instances of SerialDate.SerialTime, SUM data values associated with SerialDate.* regardless of the *.SerialTime for that date. While I can see how to do this by creating additional dates column formatted as =TEXT(<DateField>,"mm/dd/yyyy") I'm looking for a solution that will allow me to handle this 'conversion' in the formula, e.g.SUMIF((TEXT(<dateRange>,"yy/mm/dd"),=(TEXT(<dateField,"yy/mm/dd")),<dataRange> Make sense? Anyone have any ideas? Thanks

Read the article
Is it faster to create indexes before or after data loading in MySQL?

- by Josh Glover

I have a data replication process that drops and recreates a few tables in a target database, then loads them up with data from a source database (running on another host, but that is immaterial to the question at hand). The target database does need primary keys and a few other indexes on its tables, but not during the data loading. I'm currently loading all of the data, then creating the indexes. However, index creation takes a pretty long time--30 minutes of my data loader's 5 and a half hour running time. My intuition tells me that creating the indexes at the end should be faster than creating them first, since the index would need to be rewritten with each insert. Can anyone tell me for sure which way is faster? FWIW, I'm running MySQL 5.1 with InnoDB tables.

Read the article
Is it useful check data integrity in one DAT tape?

- by maxim

I backup my data every day on tape using one drive DAT HP Storageworks DAT 160. I use one tape for every day and I turn them weekly. Every monday I check one tape randomly recover some files saved on it. I know that when data is saved on tape, the driver and backup software check data integrity, but I wonder if a manual check of some data saved has a sense or not. I re-use these tapes many times and I would be sure data are safe.

Read the article
SQL SERVER – Weekly Series – Memory Lane – #039

- by Pinal Dave

Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 FQL – Facebook Query Language Facebook list following advantages of FQL: Condensed XML reduces bandwidth and parsing costs. More complex requests can reduce the number of requests necessary. Provides a single consistent, unified interface for all of your data. It’s fun! UDF – Get the Day of the Week Function The day of the week can be retrieved in SQL Server by using the DatePart function. The value returned by the function is between 1 (Sunday) and 7 (Saturday). To convert this to a string representing the day of the week, use a CASE statement. UDF – Function to Get Previous And Next Work Day – Exclude Saturday and Sunday While reading ColdFusion blog of Ben Nadel Getting the Previous Day In ColdFusion, Excluding Saturday And Sunday, I realize that I use similar function on my SQL Server Database. This function excludes the Weekends (Saturday and Sunday), and it gets previous as well as next work day. Complete Series of SQL Server Interview Questions and Answers Data Warehousing Interview Questions and Answers – Introduction Data Warehousing Interview Questions and Answers – Part 1 Data Warehousing Interview Questions and Answers – Part 2 Data Warehousing Interview Questions and Answers – Part 3 Data Warehousing Interview Questions and Answers Complete List Download 2008 Introduction to Log Viewer In SQL Server all the windows event logs can be seen along with SQL Server logs. Interface for all the logs is same and can be launched from the same place. This log can be exported and filtered as well. DBCC SHRINKFILE Takes Long Time to Run If you are DBA who are involved with Database Maintenance and file group maintenance, you must have experience that many times DBCC SHRINKFILE operations takes a long time but any other operations with Database are relatively quicker. mssqlsystemresource – Resource Database The purpose of resource database is to facilitates upgrading to the new version of SQL Server without any hassle. In previous versions whenever version of SQL Server was upgraded all the previous version system objects needs to be dropped and new version system objects to be created. 2009 Puzzle – Write Script to Generate Primary Key and Foreign Key In SQL Server Management Studio (SSMS), there is no option to script all the keys. If one is required to script keys they will have to manually script each key one at a time. If database has many tables, generating one key at a time can be a very intricate task. I want to throw a question to all of you if any of you have scripts for the same purpose. Maximizing View of SQL Server Management Studio – Full Screen – New Screen I had explained the following two different methods: 1) Open Results in Separate Tab - This is a very interesting method as result pan shows up in a different tab instead of the splitting screen horizontally. 2) Open SSMS in Full Screen - This works always and to its best. Not many people are aware of this method; hence, very few people use it to enhance performance. 2010 Find Queries using Parallelism from Cached Plan T-SQL script gets all the queries and their execution plan where parallelism operations are kicked up. Pay attention there is TOP 10 is used, if you have lots of transactional operations, I suggest that you change TOP 10 to TOP 50 This is the list of the all the articles in the series of computed columns. SQL SERVER – Computed Column – PERSISTED and Storage This article talks about how computed columns are created and why they take more storage space than before. SQL SERVER – Computed Column – PERSISTED and Performance This article talks about how PERSISTED columns give better performance than non-persisted columns. SQL SERVER – Computed Column – PERSISTED and Performance – Part 2 This article talks about how non-persisted columns give better performance than PERSISTED columns. SQL SERVER – Computed Column and Performance – Part 3 This article talks about how Index improves the performance of Computed Columns. SQL SERVER – Computed Column – PERSISTED and Storage – Part 2 This article talks about how creating index on computed column does not grow the row length of table. SQL SERVER – Computed Columns – Index and Performance This article summarized all the articles related to computed columns. 2011 SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Data Warehousing Concepts – Day 21 of 31 What is Data Warehousing? What is Business Intelligence (BI)? What is a Dimension Table? What is Dimensional Modeling? What is a Fact Table? What are the Fundamental Stages of Data Warehousing? What are the Different Methods of Loading Dimension tables? Describes the Foreign Key Columns in Fact Table and Dimension Table? What is Data Mining? What is the Difference between a View and a Materialized View? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Data Warehousing Concepts – Day 22 of 31 What is OLTP? What is OLAP? What is the Difference between OLTP and OLAP? What is ODS? What is ER Diagram? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Data Warehousing Concepts – Day 23 of 31 What is ETL? What is VLDB? Is OLTP Database is Design Optimal for Data Warehouse? If denormalizing improves Data Warehouse Processes, then why is the Fact Table is in the Normal Form? What are Lookup Tables? What are Aggregate Tables? What is Real-Time Data-Warehousing? What are Conformed Dimensions? What is a Conformed Fact? How do you Load the Time Dimension? What is a Level of Granularity of a Fact Table? What are Non-Additive Facts? What is a Factless Facts Table? What are Slowly Changing Dimensions (SCD)? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Data Warehousing Concepts – Day 24 of 31 What is Hybrid Slowly Changing Dimension? What is BUS Schema? What is a Star Schema? What Snow Flake Schema? Differences between the Star and Snowflake Schema? What is Difference between ER Modeling and Dimensional Modeling? What is Degenerate Dimension Table? Why is Data Modeling Important? What is a Surrogate Key? What is Junk Dimension? What is a Data Mart? What is the Difference between OLAP and Data Warehouse? What is a Cube and Linked Cube with Reference to Data Warehouse? What is Snapshot with Reference to Data Warehouse? What is Active Data Warehousing? What is the Difference between Data Warehousing and Business Intelligence? What is MDS? Explain the Paradigm of Bill Inmon and Ralph Kimball. SQL SERVER – Azure Interview Questions and Answers – Guest Post by Paras Doshi – Day 25 of 31 Paras Doshi has submitted 21 interesting question and answers for SQL Azure. 1.What is SQL Azure? 2.What is cloud computing? 3.How is SQL Azure different than SQL server? 4.How many replicas are maintained for each SQL Azure database? 5.How can we migrate from SQL server to SQL Azure? 6.Which tools are available to manage SQL Azure databases and servers? 7.Tell me something about security and SQL Azure. 8.What is SQL Azure Firewall? 9.What is the difference between web edition and business edition? 10.How do we synchronize On Premise SQL server with SQL Azure? 11.How do we Backup SQL Azure Data? 12.What is the current pricing model of SQL Azure? 13.What is the current limitation of the size of SQL Azure DB? 14.How do you handle datasets larger than 50 GB? 15.What happens when the SQL Azure database reaches Max Size? 16.How many databases can we create in a single server? 17.How many servers can we create in a single subscription? 18.How do you improve the performance of a SQL Azure Database? 19.What is code near application topology? 20.What were the latest updates to SQL Azure service? 21.When does a workload on SQL Azure get throttled? SQL SERVER – Interview Questions and Answers – Guest Post by Malathi Mahadevan – Day 26 of 31 Malachi had asked a simple question which has several answers. Each answer makes you think and ponder about the reality of the IT world. Look at the simple question – ‘What is the toughest challenge you have faced in your present job and how did you handle it’? and its various answers. Each answer has its own story. SQL SERVER – Interview Questions and Answers – Guest Post by Rick Morelan – Day 27 of 31 Rick Morelan of Joes2Pros has written an excellent blog post on the subject how to find top N values. Most people are fully aware of how the TOP keyword works with a SELECT statement. After years preparing so many students to pass the SQL Certification I noticed they were pretty well prepared for job interviews too. Yes, they would do well in the interview but not great. There seemed to be a few questions that would come up repeatedly for almost everyone. Rick addresses similar questions in his lucid writing skills. 2012 Observation of Top with Index and Order of Resultset SQL Server has lots of things to learn and share. It is amazing to see how people evaluate and understand different techniques and styles differently when implementing. The real reason may be absolutely different but we may blame something totally different for the incorrect results. Read the blog post to learn more. How do I Record Video and Webcast How to Convert Hex to Decimal or INT Earlier I asked regarding a question about how to convert Hex to Decimal. I promised that I will post an answer with Due Credit to the author but never got around to post a blog post around it. Read the original post over here SQL SERVER – Question – How to Convert Hex to Decimal. Query to Get Unique Distinct Data Based on Condition – Eliminate Duplicate Data from Resultset The natural reaction will be to suggest DISTINCT or GROUP BY. However, not all the questions can be solved by DISTINCT or GROUP BY. Let us see the following example, where a user wanted only latest records to be displayed. Let us see the example to understand further. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
Organizations & Architecture UNISA Studies – Chap 7

- by MarkPearl

Learning Outcomes Name different device categories Discuss the functions and structure of I/.O modules Describe the principles of Programmed I/O Describe the principles of Interrupt-driven I/O Describe the principles of DMA Discuss the evolution characteristic of I/O channels Describe different types of I/O interface Explain the principles of point-to-point and multipoint configurations Discuss the way in which a FireWire serial bus functions Discuss the principles of InfiniBand architecture External Devices An external device attaches to the computer by a link to an I/O module. The link is used to exchange control, status, and data between the I/O module and the external device. External devices can be classified into 3 categories… Human readable – e.g. video display Machine readable – e.g. magnetic disk Communications – e.g. wifi card I/O Modules An I/O module has two major functions… Interface to the processor and memory via the system bus or central switch Interface to one or more peripheral devices by tailored data links Module Functions The major functions or requirements for an I/O module fall into the following categories… Control and timing Processor communication Device communication Data buffering Error detection I/O function includes a control and timing requirement, to coordinate the flow of traffic between internal resources and external devices. Processor communication involves the following… Command decoding Data Status reporting Address recognition The I/O device must be able to perform device communication. This communication involves commands, status information, and data. An essential task of an I/O module is data buffering due to the relative slow speeds of most external devices. An I/O module is often responsible for error detection and for subsequently reporting errors to the processor. I/O Module Structure An I/O module functions to allow the processor to view a wide range of devices in a simple minded way. The I/O module may hide the details of timing, formats, and the electro mechanics of an external device so that the processor can function in terms of simple reads and write commands. An I/O channel/processor is an I/O module that takes on most of the detailed processing burden, presenting a high-level interface to the processor. There are 3 techniques are possible for I/O operations Programmed I/O Interrupt[t I/O DMA Access Programmed I/O When a processor is executing a program and encounters an instruction relating to I/O it executes that instruction by issuing a command to the appropriate I/O module. With programmed I/O, the I/O module will perform the requested action and then set the appropriate bits in the I/O status register. The I/O module takes no further actions to alert the processor. I/O Commands To execute an I/O related instruction, the processor issues an address, specifying the particular I/O module and external device, and an I/O command. There are four types of I/O commands that an I/O module may receive when it is addressed by a processor… Control – used to activate a peripheral and tell it what to do Test – Used to test various status conditions associated with an I/O module and its peripherals Read – Causes the I/O module to obtain an item of data from the peripheral and place it in an internal buffer Write – Causes the I/O module to take an item of data form the data bus and subsequently transmit that data item to the peripheral The main disadvantage of this technique is it is a time consuming process that keeps the processor busy needlessly I/O Instructions With programmed I/O there is a close correspondence between the I/O related instructions that the processor fetches from memory and the I/O commands that the processor issues to an I/O module to execute the instructions. Typically there will be many I/O devices connected through I/O modules to the system – each device is given a unique identifier or address – when the processor issues an I/O command, the command contains the address of the address of the desired device, thus each I/O module must interpret the address lines to determine if the command is for itself. When the processor, main memory and I/O share a common bus, two modes of addressing are possible… Memory mapped I/O Isolated I/O (for a detailed explanation read page 245 of book) The advantage of memory mapped I/O over isolated I/O is that it has a large repertoire of instructions that can be used, allowing more efficient programming. The disadvantage of memory mapped I/O over isolated I/O is that valuable memory address space is sued up. Interrupts driven I/O Interrupt driven I/O works as follows… The processor issues an I/O command to a module and then goes on to do some other useful work The I/O module will then interrupts the processor to request service when is is ready to exchange data with the processor The processor then executes the data transfer and then resumes its former processing Interrupt Processing The occurrence of an interrupt triggers a number of events, both in the processor hardware and in software. When an I/O device completes an I/O operations the following sequence of hardware events occurs… The device issues an interrupt signal to the processor The processor finishes execution of the current instruction before responding to the interrupt The processor tests for an interrupt – determines that there is one – and sends an acknowledgement signal to the device that issues the interrupt. The acknowledgement allows the device to remove its interrupt signal The processor now needs to prepare to transfer control to the interrupt routine. To begin, it needs to save information needed to resume the current program at the point of interrupt. The minimum information required is the status of the processor and the location of the next instruction to be executed. The processor now loads the program counter with the entry location of the interrupt-handling program that will respond to this interrupt. It also saves the values of the process registers because the Interrupt operation may modify these The interrupt handler processes the interrupt – this includes examination of status information relating to the I/O operation or other event that caused an interrupt When interrupt processing is complete, the saved register values are retrieved from the stack and restored to the registers Finally, the PSW and program counter values from the stack are restored. Design Issues Two design issues arise in implementing interrupt I/O Because there will be multiple I/O modules, how does the processor determine which device issued the interrupt? If multiple interrupts have occurred, how does the processor decide which one to process? Addressing device recognition, 4 general categories of techniques are in common use… Multiple interrupt lines Software poll Daisy chain Bus arbitration For a detailed explanation of these approaches read page 250 of the textbook. Interrupt driven I/O while more efficient than simple programmed I/O still requires the active intervention of the processor to transfer data between memory and an I/O module, and any data transfer must traverse a path through the processor. Thus is suffers from two inherent drawbacks… The I/O transfer rate is limited by the speed with which the processor can test and service a device The processor is tied up in managing an I/O transfer; a number of instructions must be executed for each I/O transfer Direct Memory Access When large volumes of data are to be moved, an efficient technique is direct memory access (DMA) DMA Function DMA involves an additional module on the system bus. The DMA module is capable of mimicking the processor and taking over control of the system from the processor. It needs to do this to transfer data to and from memory over the system bus. DMA must the bus only when the processor does not need it, or it must force the processor to suspend operation temporarily (most common – referred to as cycle stealing). When the processor wishes to read or write a block of data, it issues a command to the DMA module by sending to the DMA module the following information… Whether a read or write is requested using the read or write control line between the processor and the DMA module The address of the I/O device involved, communicated on the data lines The starting location in memory to read from or write to, communicated on the data lines and stored by the DMA module in its address register The number of words to be read or written, communicated via the data lines and stored in the data count register The processor then continues with other work, it delegates the I/O operation to the DMA module which transfers the entire block of data, one word at a time, directly to or from memory without going through the processor. When the transfer is complete, the DMA module sends an interrupt signal to the processor, this the processor is involved only at the beginning and end of the transfer. I/O Channels and Processors Characteristics of I/O Channels As one proceeds along the evolutionary path, more and more of the I/O function is performed without CPU involvement. The I/O channel represents an extension of the DMA concept. An I/O channel ahs the ability to execute I/O instructions, which gives it complete control over I/O operations. In a computer system with such devices, the CPU does not execute I/O instructions – such instructions are stored in main memory to be executed by a special purpose processor in the I/O channel itself. Two types of I/O channels are common A selector channel controls multiple high-speed devices. A multiplexor channel can handle I/O with multiple characters as fast as possible to multiple devices. The external interface: FireWire and InfiniBand Types of Interfaces One major characteristic of the interface is whether it is serial or parallel parallel interface – there are multiple lines connecting the I/O module and the peripheral, and multiple bits are transferred simultaneously serial interface – there is only one line used to transmit data, and bits must be transmitted one at a time With new generation serial interfaces, parallel interfaces are becoming less common. In either case, the I/O module must engage in a dialogue with the peripheral. In general terms the dialog may look as follows… The I/O module sends a control signal requesting permission to send data The peripheral acknowledges the request The I/O module transfers data The peripheral acknowledges receipt of data For a detailed explanation of FireWire and InfiniBand technology read page 264 – 270 of the textbook

Read the article
How can I read data from a generated pointcloud, but from an other perspective of the camera? [migrated]

- by Vlad Lata

Basically what I'm trying to do is as follows: I have a software that generates and shows a pointcloud by analyzing Time-of-Flight data (Z-Data). This software has a GUI that delivers this pointcloud on a grid, and you can watch it and adjust the camera to change the perspective, or apply filtering to it and so on. Since the Z-data was recorder through a stereoscopic system, I want to obtain a perspective transformation. My idea was to simply change the position of the camera in the GUI and than add a button that sais (ex. New Perspective) that calls a function that would measure the distances from the existing pointcloud to the camera I'm viewing it from. Of course this would generate some occluded areas, but I want this to happen. And now the main question is: How can I do that? Are there any functions in OpenGL that measure the distance from an object to a camera, or is it even possible to do something like his? Or has someone some other idea? P.S. The software uses the qt sdk and opengl

Read the article
Spooling in SQL execution plans

- by Rob Farley

Sewing has never been my thing. I barely even know the terminology, and when discussing this with American friends, I even found out that half the words that Americans use are different to the words that English and Australian people use. That said – let’s talk about spools! In particular, the Spool operators that you find in some SQL execution plans. This post is for T-SQL Tuesday, hosted this month by me! I’ve chosen to write about spools because they seem to get a bad rap (even in my song I used the line “There’s spooling from a CTE, they’ve got recursion needlessly”). I figured it was worth covering some of what spools are about, and hopefully explain why they are remarkably necessary, and generally very useful. If you have a look at the Books Online page about Plan Operators, at http://msdn.microsoft.com/en-us/library/ms191158.aspx, and do a search for the word ‘spool’, you’ll notice it says there are 46 matches. 46! Yeah, that’s what I thought too... Spooling is mentioned in several operators: Eager Spool, Lazy Spool, Index Spool (sometimes called a Nonclustered Index Spool), Row Count Spool, Spool, Table Spool, and Window Spool (oh, and Cache, which is a special kind of spool for a single row, but as it isn’t used in SQL 2012, I won’t describe it any further here). Spool, Table Spool, Index Spool, Window Spool and Row Count Spool are all physical operators, whereas Eager Spool and Lazy Spool are logical operators, describing the way that the other spools work. For example, you might see a Table Spool which is either Eager or Lazy. A Window Spool can actually act as both, as I’ll mention in a moment. In sewing, cotton is put onto a spool to make it more useful. You might buy it in bulk on a cone, but if you’re going to be using a sewing machine, then you quite probably want to have it on a spool or bobbin, which allows it to be used in a more effective way. This is the picture that I want you to think about in relation to your data. I’m sure you use spools every time you use your sewing machine. I know I do. I can’t think of a time when I’ve got out my sewing machine to do some sewing and haven’t used a spool. However, I often run SQL queries that don’t use spools. You see, the data that is consumed by my query is typically in a useful state without a spool. It’s like I can just sew with my cotton despite it not being on a spool! Many of my favourite features in T-SQL do like to use spools though. This looks like a very similar query to before, but includes an OVER clause to return a column telling me the number of rows in my data set. I’ll describe what’s going on in a few paragraphs’ time. So what does a Spool operator actually do? The spool operator consumes a set of data, and stores it in a temporary structure, in the tempdb database. This structure is typically either a Table (ie, a heap), or an Index (ie, a b-tree). If no data is actually needed from it, then it could also be a Row Count spool, which only stores the number of rows that the spool operator consumes. A Window Spool is another option if the data being consumed is tightly linked to windows of data, such as when the ROWS/RANGE clause of the OVER clause is being used. You could maybe think about the type of spool being like whether the cotton is going onto a small bobbin to fit in the base of the sewing machine, or whether it’s a larger spool for the top. A Table or Index Spool is either Eager or Lazy in nature. Eager and Lazy are Logical operators, which talk more about the behaviour, rather than the physical operation. If I’m sewing, I can either be all enthusiastic and get all my cotton onto the spool before I start, or I can do it as I need it. “Lazy” might not the be the best word to describe a person – in the SQL world it describes the idea of either fetching all the rows to build up the whole spool when the operator is called (Eager), or populating the spool only as it’s needed (Lazy). Window Spools are both physical and logical. They’re eager on a per-window basis, but lazy between windows. And when is it needed? The way I see it, spools are needed for two reasons. 1 – When data is going to be needed AGAIN. 2 – When data needs to be kept away from the original source. If you’re someone that writes long stored procedures, you are probably quite aware of the second scenario. I see plenty of stored procedures being written this way – where the query writer populates a temporary table, so that they can make updates to it without risking the original table. SQL does this too. Imagine I’m updating my contact list, and some of my changes move data to later in the book. If I’m not careful, I might update the same row a second time (or even enter an infinite loop, updating it over and over). A spool can make sure that I don’t, by using a copy of the data. This problem is known as the Halloween Effect (not because it’s spooky, but because it was discovered in late October one year). As I’m sure you can imagine, the kind of spool you’d need to protect against the Halloween Effect would be eager, because if you’re only handling one row at a time, then you’re not providing the protection... An eager spool will block the flow of data, waiting until it has fetched all the data before serving it up to the operator that called it. In the query below I’m forcing the Query Optimizer to use an index which would be upset if the Name column values got changed, and we see that before any data is fetched, a spool is created to load the data into. This doesn’t stop the index being maintained, but it does mean that the index is protected from the changes that are being done. There are plenty of times, though, when you need data repeatedly. Consider the query I put above. A simple join, but then counting the number of rows that came through. The way that this has executed (be it ideal or not), is to ask that a Table Spool be populated. That’s the Table Spool operator on the top row. That spool can produce the same set of rows repeatedly. This is the behaviour that we see in the bottom half of the plan. In the bottom half of the plan, we see that the a join is being done between the rows that are being sourced from the spool – one being aggregated and one not – producing the columns that we need for the query. Table v Index When considering whether to use a Table Spool or an Index Spool, the question that the Query Optimizer needs to answer is whether there is sufficient benefit to storing the data in a b-tree. The idea of having data in indexes is great, but of course there is a cost to maintaining them. Here we’re creating a temporary structure for data, and there is a cost associated with populating each row into its correct position according to a b-tree, as opposed to simply adding it to the end of the list of rows in a heap. Using a b-tree could even result in page-splits as the b-tree is populated, so there had better be a reason to use that kind of structure. That all depends on how the data is going to be used in other parts of the plan. If you’ve ever thought that you could use a temporary index for a particular query, well this is it – and the Query Optimizer can do that if it thinks it’s worthwhile. It’s worth noting that just because a Spool is populated using an Index Spool, it can still be fetched using a Table Spool. The details about whether or not a Spool used as a source shows as a Table Spool or an Index Spool is more about whether a Seek predicate is used, rather than on the underlying structure. Recursive CTE I’ve already shown you an example of spooling when the OVER clause is used. You might see them being used whenever you have data that is needed multiple times, and CTEs are quite common here. With the definition of a set of data described in a CTE, if the query writer is leveraging this by referring to the CTE multiple times, and there’s no simplification to be leveraged, a spool could theoretically be used to avoid reapplying the CTE’s logic. Annoyingly, this doesn’t happen. Consider this query, which really looks like it’s using the same data twice. I’m creating a set of data (which is completely deterministic, by the way), and then joining it back to itself. There seems to be no reason why it shouldn’t use a spool for the set described by the CTE, but it doesn’t. On the other hand, if we don’t pull as many columns back, we might see a very different plan. You see, CTEs, like all sub-queries, are simplified out to figure out the best way of executing the whole query. My example is somewhat contrived, and although there are plenty of cases when it’s nice to give the Query Optimizer hints about how to execute queries, it usually doesn’t do a bad job, even without spooling (and you can always use a temporary table). When recursion is used, though, spooling should be expected. Consider what we’re asking for in a recursive CTE. We’re telling the system to construct a set of data using an initial query, and then use set as a source for another query, piping this back into the same set and back around. It’s very much a spool. The analogy of cotton is long gone here, as the idea of having a continual loop of cotton feeding onto a spool and off again doesn’t quite fit, but that’s what we have here. Data is being fed onto the spool, and getting pulled out a second time when the spool is used as a source. (This query is running on AdventureWorks, which has a ManagerID column in HumanResources.Employee, not AdventureWorks2012) The Index Spool operator is sucking rows into it – lazily. It has to be lazy, because at the start, there’s only one row to be had. However, as rows get populated onto the spool, the Table Spool operator on the right can return rows when asked, ending up with more rows (potentially) getting back onto the spool, ready for the next round. (The Assert operator is merely checking to see if we’ve reached the MAXRECURSION point – it vanishes if you use OPTION (MAXRECURSION 0), which you can try yourself if you like). Spools are useful. Don’t lose sight of that. Every time you use temporary tables or table variables in a stored procedure, you’re essentially doing the same – don’t get upset at the Query Optimizer for doing so, even if you think the spool looks like an expensive part of the query. I hope you’re enjoying this T-SQL Tuesday. Why not head over to my post that is hosting it this month to read about some other plan operators? At some point I’ll write a summary post – once I have you should find a comment below pointing at it. @rob_farley

Read the article

Search Results

Search found 60391 results on 2416 pages for 'data generation'.

Page 441/2416 | < Previous Page | 437 438 439 440 441 442 443 444 445 446 447 448 | Next Page >

- by pygorex1

- by Mat Keep

- by Horace Loeb

- by crankharder

- by fatcat1111

- by janesconference

- by HerbalMart

- by Arda Eralp

- by user322787

- by Mahran Elneel

- by user123276

- by adoran

- by user315709

- by IT Helpdesk Team Manager

- by Kim Bastin

- by user12620111

- by Generation D Systems

- by Luke404

- by dwwilson66

- by Josh Glover

- by maxim

- by Pinal Dave

- by MarkPearl

- by Vlad Lata

- by Rob Farley

< Previous Page | 437 438 439 440 441 442 443 444 445 446 447 448 | Next Page >