Search Results

Search found 5666 results on 227 pages for 'cost analysis'.

Page 46/227 | < Previous Page | 42 43 44 45 46 47 48 49 50 51 52 53 | Next Page >

SQL analytical mash-ups deliver real-time WOW! for big data

- by KLaker

One of the overlooked capabilities of SQL as an analysis engine, because we all just take it for granted, is that you can mix and match analytical features to create some amazing mash-ups. As we move into the exciting world of big data these mash-ups can really deliver those "wow, I never knew that" moments. While Java is an incredibly flexible and powerful framework for managing big data there are some significant challenges in using Java and MapReduce to drive your analysis to create these "wow" discoveries. One of these "wow" moments was demonstrated at this year's OpenWorld during Andy Mendelsohn's general keynote session. Here is the scenario - we are looking for fraudulent activities in our big data stream and in this case we identifying potentially fraudulent activities by looking for specific patterns. We using geospatial tagging of each transaction so we can create a real-time fraud-map for our business users. Where we start to move towards a "wow" moment is to extend this basic use of spatial and pattern matching, as shown in the above dashboard screen, to incorporate spatial analytics within the SQL pattern matching clause. This will allow us to compute the distance between transactions. Apologies for the quality of this screenshot….hopefully below you see where we have extended our SQL pattern matching clause to use location of each transaction and to calculate the distance between each transaction: This allows us to compare the time of the last transaction with the time of the current transaction and see if the distance between the two points is possible given the time frame. Obviously if I buy something in Florida from my favourite bike store (may be a new carbon saddle for my Trek) and then 5 minutes later the system sees my credit card details being used in Arizona there is high probability that this transaction in Arizona is actually fraudulent (I am fast on my Trek but not that fast!) and we can flag this up in real-time on our dashboard: In this post I have used the term "real-time" a couple of times and this is an important point and one of the key reasons why SQL really is the only language to use if you want to analyse big data. One of the most important questions that comes up in every big data project is: how do we do analysis? Many enlightened customers are now realising that using Java-MapReduce to deliver analysis does not result in "wow" moments. These "wow" moments only come with SQL because it is offers a much richer environment, it is simpler to use and it is faster - which makes it possible to deliver real-time "Wow!". Below is a slide from Andy's session showing the results of a comparison of Java-MapReduce vs. SQL pattern matching to deliver our "wow" moment during our live demo. You can watch our analytical mash-up "Wow" demo that compares the power of 12c SQL pattern matching + spatial analytics vs. Java-MapReduce here: You can get more information about SQL Pattern Matching on our SQL Analytics home page on OTN, see here http://www.oracle.com/technetwork/database/bi-datawarehousing/sql-analytics-index-1984365.html. You can get more information about our spatial analytics here: http://www.oracle.com/technetwork/database-options/spatialandgraph/overview/index.html If you would like to watch the full Database 12c OOW presentation see here: http://medianetwork.oracle.com/video/player/2686974264001

Read the article
Exalytics and Oracle Business Intelligence Enterprise Edition (OBIEE) Partner Workshop

- by mseika

Workshop Description Oracle Fusion Middleware 11g is the #1 application infrastructure foundation. It enables enterprises to create and run agile and intelligent business applications and maximize IT efficiency by exploiting modern hardware and software architectures. Oracle Exalytics Business Intelligence Machine is the world’s first engineered system specifically designed to deliver high performance analysis, modeling and planning. Built using industry-standard hardware, market-leading business intelligence software and in-memory database technology, Oracle Exalytics is an optimized system that delivers unmatched speed, visualizations and scalability for Business Intelligence and Enterprise Performance Management applications. This FREE hands-on, partner workshop highlights both the hardware and software components that are engineered to work together to deliver Oracle Exalytics - an optimized version of the industry-leading Oracle TimesTen In-Memory Database with analytic extensions, a highly scalable Oracle server designed specifically for in-memory business intelligence, and Oracle’s proven Business Intelligence Foundation with enhanced visualization capabilities and performance optimizations. This workshop will provide hands-on experience with Oracle's latest engineered system. Topics covered will include TimesTen In-Memory Database and the new Summary Advisor for Exalytics, the technical details (including mobile features) of the latest release of visualization enhancements for OBI-EE, and technical updates on Essbase. After taking this course, you will be well prepared to architect, build, demo, and implement an end-to-end Exalytics solution. You will also be able to extend your current analytical and enterprise performance management application implementations with numerous Oracle technologies specifically enhanced to take advantage of the compute capacity and in-memory capabilities of Oracle Exalytics.If you are a BI or Data Warehouse Architect, developer or consultant, you don’t want to miss this 3-day workshop. Register Now! Presentations Exalytics Architectural Overview Upgrade and Lifecycle Management Times Ten for Exalytics Summary Advisor Utility Essbase and EPM System on Exalytics Dashboard and Analysis Interactions OBIEE 11.1.1.6 Features and Advanced Topics Lab OutlineThe labs showcase Oracle Exalytics core components and functionality and provide expertise of Oracle Business Intelligence 11.1.1.6 new features and updates from prior releases. The hands-on activities are based on an Oracle VirtualBox image with software and training samples pre-installed. Lab Environment Setup Creating and Working with Oracle TimesTen In-Memory Database Running Summary Advisor Utility Working with Exalytics Visualization Features – Dashboard and Analysis Interactions Audience Oracle Partners BI and EPM Application Developers and Implementers System Integrators and Solution Consultants Data Warehouse Developers Enterprise Architects Prerequisites Experience and understanding of OBIEE 11g is required Previous attendance of Oracle Business Intelligence Foundation Suite Workshop or BIEE 11gIntroduction Workshop is highly recommended Good understanding of data warehousing and data modeling for reporting and analysis purpose Strong experience with database technologies preferred Equipment RequirementsThis workshop requires attendees to provide their own laptops for this class.Attendee laptops must meet the following minimum hardware/software requirements: Hardware Minimum 8GB RAM 60 GB free space (includes staging) USB 2.0 port (at least one available) It is strongly recommended that you bring a mouse. You will be working in a development environment and using the mouse heavily. Software One of the following operating systems: 64-bit Windows host/laptop OS 64-bit host/laptop OS with a Windows VM (XP, Server, or Win 7, BIC2g, etc.) Internet Explorer 7.x/8.x or Firefox 3.5.x WINRAR or 7ziputility to unzip workshop files: Download-able from http://www.win-rar.com/download.html Download-able from http://www.7zip.com/ Oracle VirtualBox 4.0.2 or higher Downloadable from http://www.virtualbox.org/wiki/Downloads CPU virtualization mode needs to be enabled. We will provide guidance on the day of the workshop. Attendees will be given a VirtualBox image containing a pre-installed Oracle Exalytics environment. Schedule This workshop is 3 days. - Times vary by country!9:00am: Sign-in and technical setup 9:30am: Workshop starts 5:00pm: Workshop ends Oracle Exalytics and Business Intelligence (OBIEE) Workshop December 11-13, 2012: Oracle BVP, Birmingham, UK Register Here. Questions? Send email to: [email protected] Oracle Platform Technologies Enablement Services

Read the article
Oracle Spatial and Graph – A year in review

- by Mandy Ho

Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} What a great year for Oracle Spatial! Or shall I now say, Oracle Spatial and Graph, with our official name change this summer. There were so many exciting events and updates we had this year, and this blog will review and link to some of the events you may have missed over the year. We kicked off 2012 with our webinar: Situational Analysis at OnStar with Oracle Spatial and Graph. We collaborated with OnStar’s Emergency Strategy and Outreach expert, Jeff Joyner ,on how Onstar uses Google Earth Visualization, NAVTEQ data and Oracle Database to deliver fast, accurate emergency services to its customers. In the next webinar in our 2012 series, Oracle partner TARGUSinfo showcased how to build a robust, scalable and secure customer relationship management systems – with built-in mapping and spatial analysis, and deployed in the cloud. This is a very cool system using all Oracle technologies including Oracle Database and Fusion Middleware MapViewer. Attendees learned how to gather market insight, score prospects and customers and perform location analysis. The replay is available here. Our final webinar of the year focused on using Oracle Business Intelligence tools, along with Oracle Spatial and Graph to perform location-aware predictive analysis. Watch the webcast here: In June, we joined up with the Location Intelligence conference in Washington, DC, and had a very successful 2012 Oracle Spatial User Conference. Customers and partners from the US, as well as from EMEA and Asia, flew in to share experiences and ideas, and get technical updates from Oracle experts. Users were excited to hear about spatial-Exadata performance, and advances in MapViewer and BI. Peter Doolan of Oracle Public Sector kicked off the event with a great keynote, and US Census, NOAA, and Ordnance Survey Great Britain were just a few of the presenters. Presentation archive here. We recognized some of the most exceptional partners and customers for their contributions to advancing mainstream solutions using geospatial technologies. Planning for 2013’s conference has already started. Please contribute your papers for consideration here. http://www.locationintelligence.net/ We also launched a new Oracle PartnerNetwork Spatial Specialization – to enable partners to get validated in the marketplace for their expertise in taking solutions to market. Individuals can also get individual certifications. Learn more here. Oracle Open World was not to disappoint, with news regarding our next Oracle Spatial and Graph release, as well as the announcement of our new Oracle Spatial and Graph SIG board! Join the SIG today. One more exciting event as we look to 2013. Spatial and location technologies have a dedicated track at the January BIWA SIG Summit – on January 9-10 in Redwood Shores, CA. View the agenda and register here: www.biwasummit.org. We thank you for all your support during the year of 2012 and look towards an even more exciting 2013! Wishing you and your family a prosperous New Year and Happy Holidays!

Read the article
Same SELECT used in an INSERT has different execution plan

- by amacias

A customer complained that a query and its INSERT counterpart had different execution plans, and of course, the INSERT was slower. First lets look at the SELECT : SELECT ua_tr_rundatetime, ua_ch_treatmentcode, ua_tr_treatmentcode, ua_ch_cellid, ua_tr_cellid FROM (SELECT DISTINCT CH.treatmentcode AS UA_CH_TREATMENTCODE, CH.cellid AS UA_CH_CELLID FROM CH, DL WHERE CH.contactdatetime > SYSDATE - 5 AND CH.treatmentcode = DL.treatmentcode) CH_CELLS, (SELECT DISTINCT T.treatmentcode AS UA_TR_TREATMENTCODE, T.cellid AS UA_TR_CELLID, T.rundatetime AS UA_TR_RUNDATETIME FROM T, DL WHERE T.treatmentcode = DL.treatmentcode) TRT_CELLS WHERE CH_CELLS.ua_ch_treatmentcode(+) = TRT_CELLS.ua_tr_treatmentcode; The query has 2 DISTINCT subqueries. The execution plan shows one with DISTICT Placement transformation applied and not the other. The view in Step 5 has the prefix VW_DTP which means DISTINCT Placement. -------------------------------------------------------------------- | Id | Operation | Name | Cost (%CPU) -------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 272K(100) |* 1 | HASH JOIN OUTER | | 272K (1) | 2 | VIEW | | 4408 (1) | 3 | HASH UNIQUE | | 4408 (1) |* 4 | HASH JOIN | | 4407 (1) | 5 | VIEW | VW_DTP_48BAF62C | 1660 (2) | 6 | HASH UNIQUE | | 1660 (2) | 7 | TABLE ACCESS FULL | DL | 1644 (1) | 8 | TABLE ACCESS FULL | T | 2744 (1) | 9 | VIEW | | 267K (1) | 10 | HASH UNIQUE | | 267K (1) |* 11 | HASH JOIN | | 267K (1) | 12 | PARTITION RANGE ITERATOR| | 266K (1) |* 13 | TABLE ACCESS FULL | CH | 266K (1) | 14 | TABLE ACCESS FULL | DL | 1644 (1) -------------------------------------------------------------------- Query Block Name / Object Alias (identified by operation id): ------------------------------------------------------------- 1 - SEL$1 2 - SEL$AF418D5F / TRT_CELLS@SEL$1 3 - SEL$AF418D5F 5 - SEL$F6AECEDE / VW_DTP_48BAF62C@SEL$48BAF62C 6 - SEL$F6AECEDE 7 - SEL$F6AECEDE / DL@SEL$3 8 - SEL$AF418D5F / T@SEL$3 9 - SEL$2 / CH_CELLS@SEL$1 10 - SEL$2 13 - SEL$2 / CH@SEL$2 14 - SEL$2 / DL@SEL$2 Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("CH_CELLS"."UA_CH_TREATMENTCODE"="TRT_CELLS"."UA_TR_TREATMENTCODE") 4 - access("T"."TREATMENTCODE"="ITEM_1") 11 - access("CH"."TREATMENTCODE"="DL"."TREATMENTCODE") 13 - filter("CH"."CONTACTDATETIME">SYSDATE@!-5) The outline shows PLACE_DISTINCT(@"SEL$3" "DL"@"SEL$3") indicating that the QB3 is the one that got the transformation. Outline Data ------------- /*+ BEGIN_OUTLINE_DATA IGNORE_OPTIM_EMBEDDED_HINTS OPTIMIZER_FEATURES_ENABLE('11.2.0.3') DB_VERSION('11.2.0.3') ALL_ROWS OUTLINE_LEAF(@"SEL$2") OUTLINE_LEAF(@"SEL$F6AECEDE") OUTLINE_LEAF(@"SEL$AF418D5F") PLACE_DISTINCT(@"SEL$3" "DL"@"SEL$3") OUTLINE_LEAF(@"SEL$1") OUTLINE(@"SEL$48BAF62C") OUTLINE(@"SEL$3") NO_ACCESS(@"SEL$1" "TRT_CELLS"@"SEL$1") NO_ACCESS(@"SEL$1" "CH_CELLS"@"SEL$1") LEADING(@"SEL$1" "TRT_CELLS"@"SEL$1" "CH_CELLS"@"SEL$1") USE_HASH(@"SEL$1" "CH_CELLS"@"SEL$1") FULL(@"SEL$2" "CH"@"SEL$2") FULL(@"SEL$2" "DL"@"SEL$2") LEADING(@"SEL$2" "CH"@"SEL$2" "DL"@"SEL$2") USE_HASH(@"SEL$2" "DL"@"SEL$2") USE_HASH_AGGREGATION(@"SEL$2") NO_ACCESS(@"SEL$AF418D5F" "VW_DTP_48BAF62C"@"SEL$48BAF62C") FULL(@"SEL$AF418D5F" "T"@"SEL$3") LEADING(@"SEL$AF418D5F" "VW_DTP_48BAF62C"@"SEL$48BAF62C" "T"@"SEL$3") USE_HASH(@"SEL$AF418D5F" "T"@"SEL$3") USE_HASH_AGGREGATION(@"SEL$AF418D5F") FULL(@"SEL$F6AECEDE" "DL"@"SEL$3") USE_HASH_AGGREGATION(@"SEL$F6AECEDE") END_OUTLINE_DATA */ The 10053 shows there is a comparative of cost with and without the transformation. This means the transformation belongs to Cost-Based Query Transformations (CBQT). In SEL$3 the optimization of the query block without the transformation is 6659.73 and with the transformation is 4408.41 so the transformation is kept. GBP/DP: Checking validity of GBP/DP for query block SEL$3 (#3) DP: Checking validity of distinct placement for query block SEL$3 (#3) DP: Using search type: linear DP: Considering distinct placement on query block SEL$3 (#3) DP: Starting iteration 1, state space = (5) : (0) DP: Original query DP: Costing query block. DP: Updated best state, Cost = 6659.73 DP: Starting iteration 2, state space = (5) : (1) DP: Using DP transformation in this iteration. DP: Transformed query DP: Costing query block. DP: Updated best state, Cost = 4408.41 DP: Doing DP on the original QB. DP: Doing DP on the preserved QB. In SEL$2 the cost without the transformation is less than with it so it is not kept. GBP/DP: Checking validity of GBP/DP for query block SEL$2 (#2) DP: Checking validity of distinct placement for query block SEL$2 (#2) DP: Using search type: linear DP: Considering distinct placement on query block SEL$2 (#2) DP: Starting iteration 1, state space = (3) : (0) DP: Original query DP: Costing query block. DP: Updated best state, Cost = 267936.93 DP: Starting iteration 2, state space = (3) : (1) DP: Using DP transformation in this iteration. DP: Transformed query DP: Costing query block. DP: Not update best state, Cost = 267951.66 To the same query an INSERT INTO is added and the result is a very different execution plan. INSERT INTO cc (ua_tr_rundatetime, ua_ch_treatmentcode, ua_tr_treatmentcode, ua_ch_cellid, ua_tr_cellid)SELECT ua_tr_rundatetime, ua_ch_treatmentcode, ua_tr_treatmentcode, ua_ch_cellid, ua_tr_cellidFROM (SELECT DISTINCT CH.treatmentcode AS UA_CH_TREATMENTCODE, CH.cellid AS UA_CH_CELLID FROM CH, DL WHERE CH.contactdatetime > SYSDATE - 5 AND CH.treatmentcode = DL.treatmentcode) CH_CELLS, (SELECT DISTINCT T.treatmentcode AS UA_TR_TREATMENTCODE, T.cellid AS UA_TR_CELLID, T.rundatetime AS UA_TR_RUNDATETIME FROM T, DL WHERE T.treatmentcode = DL.treatmentcode) TRT_CELLSWHERE CH_CELLS.ua_ch_treatmentcode(+) = TRT_CELLS.ua_tr_treatmentcode;----------------------------------------------------------| Id | Operation | Name | Cost (%CPU)----------------------------------------------------------| 0 | INSERT STATEMENT | | 274K(100)| 1 | LOAD TABLE CONVENTIONAL | | |* 2 | HASH JOIN OUTER | | 274K (1)| 3 | VIEW | | 6660 (1)| 4 | SORT UNIQUE | | 6660 (1)|* 5 | HASH JOIN | | 6659 (1)| 6 | TABLE ACCESS FULL | DL | 1644 (1)| 7 | TABLE ACCESS FULL | T | 2744 (1)| 8 | VIEW | | 267K (1)| 9 | SORT UNIQUE | | 267K (1)|* 10 | HASH JOIN | | 267K (1)| 11 | PARTITION RANGE ITERATOR| | 266K (1)|* 12 | TABLE ACCESS FULL | CH | 266K (1)| 13 | TABLE ACCESS FULL | DL | 1644 (1)----------------------------------------------------------Query Block Name / Object Alias (identified by operation id):------------------------------------------------------------- 1 - SEL$1 3 - SEL$3 / TRT_CELLS@SEL$1 4 - SEL$3 6 - SEL$3 / DL@SEL$3 7 - SEL$3 / T@SEL$3 8 - SEL$2 / CH_CELLS@SEL$1 9 - SEL$2 12 - SEL$2 / CH@SEL$2 13 - SEL$2 / DL@SEL$2Predicate Information (identified by operation id):--------------------------------------------------- 2 - access("CH_CELLS"."UA_CH_TREATMENTCODE"="TRT_CELLS"."UA_TR_TREATMENTCODE") 5 - access("T"."TREATMENTCODE"="DL"."TREATMENTCODE") 10 - access("CH"."TREATMENTCODE"="DL"."TREATMENTCODE") 12 - filter("CH"."CONTACTDATETIME">SYSDATE@!-5)Outline Data------------- /*+ BEGIN_OUTLINE_DATA IGNORE_OPTIM_EMBEDDED_HINTS OPTIMIZER_FEATURES_ENABLE('11.2.0.3') DB_VERSION('11.2.0.3') ALL_ROWS OUTLINE_LEAF(@"SEL$2") OUTLINE_LEAF(@"SEL$3") OUTLINE_LEAF(@"SEL$1") OUTLINE_LEAF(@"INS$1") FULL(@"INS$1" "CC"@"INS$1") NO_ACCESS(@"SEL$1" "TRT_CELLS"@"SEL$1") NO_ACCESS(@"SEL$1" "CH_CELLS"@"SEL$1") LEADING(@"SEL$1" "TRT_CELLS"@"SEL$1" "CH_CELLS"@"SEL$1") USE_HASH(@"SEL$1" "CH_CELLS"@"SEL$1") FULL(@"SEL$2" "CH"@"SEL$2") FULL(@"SEL$2" "DL"@"SEL$2") LEADING(@"SEL$2" "CH"@"SEL$2" "DL"@"SEL$2") USE_HASH(@"SEL$2" "DL"@"SEL$2") USE_HASH_AGGREGATION(@"SEL$2") FULL(@"SEL$3" "DL"@"SEL$3") FULL(@"SEL$3" "T"@"SEL$3") LEADING(@"SEL$3" "DL"@"SEL$3" "T"@"SEL$3") USE_HASH(@"SEL$3" "T"@"SEL$3") USE_HASH_AGGREGATION(@"SEL$3") END_OUTLINE_DATA */ There is no DISTINCT Placement view and no hint.The 10053 trace shows a new legend "DP: Bypassed: Not SELECT"implying that this is a transformation that it is possible only for SELECTs. GBP/DP: Checking validity of GBP/DP for query block SEL$3 (#4) DP: Checking validity of distinct placement for query block SEL$3 (#4) DP: Bypassed: Not SELECT. GBP/DP: Checking validity of GBP/DP for query block SEL$2 (#3) DP: Checking validity of distinct placement for query block SEL$2 (#3) DP: Bypassed: Not SELECT. In 12.1 (and hopefully in 11.2.0.4 when released) the restriction on applying CBQT to some DMLs and DDLs (like CTAS) is lifted.This is documented in BugTag Note:10013899.8 Allow CBQT for some DML / DDLAnd interestingly enough, it is possible to have a one-off patch in 11.2.0.3. SQL> select DESCRIPTION,OPTIMIZER_FEATURE_ENABLE,IS_DEFAULT 2 from v$system_fix_control where BUGNO='10013899'; DESCRIPTION ---------------------------------------------------------------- OPTIMIZER_FEATURE_ENABLE IS_DEFAULT ------------------------- ---------- enable some transformations for DDL and DML statements 11.2.0.4 1

Read the article
SQL Server Express 2008 R2 Installation error at Windows 7

- by Shai Sherman

Hello, I created install script that will install SQL Server 2008 R2 on windows XP SP3, windows vista and windows 7. One of the command that i used in the installation is for silent installation of SQL Server 2008 R2. When i install it on windows XP everything works just fine but when i try to install it on Windows 7 i get an error. What am I doing wrong? Here is the command line that i use: "Setup.exe /ConfigurationFile=Mysetup.ini" Mysetup.ini file: -------------------------------------Start of ini file --------------------------------- ;SQL SERVER 2008 R2 Configuration File ;Version 1.0, 5 May 2010 ; [SQLSERVER2008] ; Specify the Instance ID for the SQL Server features you have specified. SQL Server directory structure, registry structure, and service names will reflect the instance ID of the SQL Server instance. INSTANCEID="MSSQLSERVER" ; Specifies a Setup work flow, like INSTALL, UNINSTALL, or UPGRADE. This is a required parameter. ACTION="Install" ; Specifies features to install, uninstall, or upgrade. The list of top-level features include SQL, AS, RS, IS, and Tools. The SQL feature will install the database engine, replication, and full-text. The Tools feature will install Management Tools, Books online, Business Intelligence Development Studio, and other shared components. FEATURES=SQLENGINE ; Displays the command line parameters usage HELP="False" ; Specifies that the detailed Setup log should be piped to the console. INDICATEPROGRESS="False" ; Setup will not display any user interface. QUIET="False" ; Setup will display progress only without any user interaction. QUIETSIMPLE="True" ; Specifies that Setup should install into WOW64. This command line argument is not supported on an IA64 or a 32-bit system. ;X86="False" ; Specifies the path to the installation media folder where setup.exe is located. ;MEDIASOURCE="z:\" ; Detailed help for command line argument ENU has not been defined yet. ENU="True" ; Parameter that controls the user interface behavior. Valid values are Normal for the full UI, and AutoAdvance for a simplied UI. ; UIMODE="Normal" ; Specify if errors can be reported to Microsoft to improve future SQL Server releases. Specify 1 or True to enable and 0 or False to disable this feature. ERRORREPORTING="False" ; Specify the root installation directory for native shared components. ;INSTALLSHAREDDIR="D:\Program Files\Microsoft SQL Server" ; Specify the root installation directory for the WOW64 shared components. ;INSTALLSHAREDWOWDIR="D:\Program Files (x86)\Microsoft SQL Server" ; Specify the installation directory. ;INSTANCEDIR="D:\Program Files\Microsoft SQL Server" ; Specify that SQL Server feature usage data can be collected and sent to Microsoft. Specify 1 or True to enable and 0 or False to disable this feature. SQMREPORTING="False" ; Specify a default or named instance. MSSQLSERVER is the default instance for non-Express editions and SQLExpress for Express editions. This parameter is required when installing the SQL Server Database Engine (SQL), Analysis Services (AS), or Reporting Services (RS). INSTANCENAME="SQLEXPRESS" SECURITYMODE=SQL SAPWD=SystemAdmin ; Agent account name AGTSVCACCOUNT="NT AUTHORITY\NETWORK SERVICE" ; Auto-start service after installation. AGTSVCSTARTUPTYPE="Manual" ; Startup type for Integration Services. ;ISSVCSTARTUPTYPE="Automatic" ; Account for Integration Services: Domain\User or system account. ;ISSVCACCOUNT="NT AUTHORITY\NetworkService" ; Controls the service startup type setting after the service has been created. ;ASSVCSTARTUPTYPE="Automatic" ; The collation to be used by Analysis Services. ;ASCOLLATION="Latin1_General_CI_AS" ; The location for the Analysis Services data files. ;ASDATADIR="Data" ; The location for the Analysis Services log files. ;ASLOGDIR="Log" ; The location for the Analysis Services backup files. ;ASBACKUPDIR="Backup" ; The location for the Analysis Services temporary files. ;ASTEMPDIR="Temp" ; The location for the Analysis Services configuration files. ;ASCONFIGDIR="Config" ; Specifies whether or not the MSOLAP provider is allowed to run in process. ;ASPROVIDERMSOLAP="1" ; A port number used to connect to the SharePoint Central Administration web application. ;FARMADMINPORT="0" ; Startup type for the SQL Server service. SQLSVCSTARTUPTYPE="Automatic" ; Level to enable FILESTREAM feature at (0, 1, 2 or 3). FILESTREAMLEVEL="0" ; Set to "1" to enable RANU for SQL Server Express. ENABLERANU="1" ; Specifies a Windows collation or an SQL collation to use for the Database Engine. SQLCOLLATION="SQL_Latin1_General_CP1_CI_AS" ; Account for SQL Server service: Domain\User or system account. SQLSVCACCOUNT="NT Authority\System" ; Default directory for the Database Engine user databases. ;SQLUSERDBDIR="K:\Microsoft SQL Server\MSSQL\Data" ; Default directory for the Database Engine user database logs. ;SQLUSERDBLOGDIR="L:\Microsoft SQL Server\MSSQL\Data\Logs" ; Directory for Database Engine TempDB files. ;SQLTEMPDBDIR="T:\Microsoft SQL Server\MSSQL\Data" ; Directory for the Database Engine TempDB log files. ;SQLTEMPDBLOGDIR="T:\Microsoft SQL Server\MSSQL\Data\Logs" ; Provision current user as a Database Engine system administrator for SQL Server 2008 R2 Express. ADDCURRENTUSERASSQLADMIN="True" ; Specify 0 to disable or 1 to enable the TCP/IP protocol. TCPENABLED="1" ; Specify 0 to disable or 1 to enable the Named Pipes protocol. NPENABLED="0" ; Startup type for Browser Service. BROWSERSVCSTARTUPTYPE="Automatic" ; Specifies how the startup mode of the report server NT service. When ; Manual - Service startup is manual mode (default). ; Automatic - Service startup is automatic mode. ; Disabled - Service is disabled ;RSSVCSTARTUPTYPE="Automatic" ; Specifies which mode report server is installed in. ; Default value: “FilesOnly” ;RSINSTALLMODE="FilesOnlyMode" ; Accept SQL Server 2008 R2 license terms IACCEPTSQLSERVERLICENSETERMS="TRUE" ;setup.exe /CONFIGURATIONFILE=Mysetup.ini /INDICATEPROGRESS --------------------------- End of ini file ------------------------------------- And i get this error: 2010-08-31 18:05:53 Slp: Error result: -2068119551 2010-08-31 18:05:53 Slp: Result facility code: 1211 2010-08-31 18:05:53 Slp: Result error code: 1 2010-08-31 18:05:53 Slp: Sco: Attempting to create base registry key HKEY_LOCAL_MACHINE, machine 2010-08-31 18:05:53 Slp: Sco: Attempting to open registry subkey 2010-08-31 18:05:53 Slp: Sco: Attempting to open registry subkey Software\Microsoft\PCHealth\ErrorReporting\DW\Installed 2010-08-31 18:05:53 Slp: Sco: Attempting to get registry value DW0200 2010-08-31 18:05:53 Slp: Submitted 1 of 1 failures to the Watson data repository What the meaning of this? What do i need to do to fix that problem? Here is the Summary file: Overall summary: Final result: SQL Server installation failed. To continue, investigate the reason for the failure, correct the problem, uninstall SQL Server, and then rerun SQL Server Setup. Exit code (Decimal): -2068119551 Exit facility code: 1211 Exit error code: 1 Exit message: SQL Server installation failed. To continue, investigate the reason for the failure, correct the problem, uninstall SQL Server, and then rerun SQL Server Setup. Start time: 2010-08-31 18:03:44 End time: 2010-08-31 18:05:51 Requested action: Install Log with failure: C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\20100831_180236\Detail.txt Exception help link: http%3a%2f%2fgo.microsoft.com%2ffwlink%3fLinkId%3d20476%26ProdName%3dMicrosoft%2bSQL%2bServer%26EvtSrc%3dsetup.rll%26EvtID%3d50000%26ProdVer%3d10.50.1600.1%26EvtType%3d0x6121810A%400xC24842DB Machine Properties: Machine name: NVR Machine processor count: 2 OS version: Windows 7 OS service pack: OS region: United States OS language: English (United States) OS architecture: x86 Process architecture: 32 Bit OS clustered: No Product features discovered: Product Instance Instance ID Feature Language Edition Version Clustered Package properties: Description: SQL Server Database Services 2008 R2 ProductName: SQL Server 2008 R2 Type: RTM Version: 10 SPLevel: 0 Installation location: C:\Disk1\setupsql\x86\setup\ Installation edition: EXPRESS User Input Settings: ACTION: Install ADDCURRENTUSERASSQLADMIN: True AGTSVCACCOUNT: NT AUTHORITY\NETWORK SERVICE AGTSVCPASSWORD: * AGTSVCSTARTUPTYPE: Disabled ASBACKUPDIR: Backup ASCOLLATION: Latin1_General_CI_AS ASCONFIGDIR: Config ASDATADIR: Data ASDOMAINGROUP: ASLOGDIR: Log ASPROVIDERMSOLAP: 1 ASSVCACCOUNT: ASSVCPASSWORD: * ASSVCSTARTUPTYPE: Automatic ASSYSADMINACCOUNTS: ASTEMPDIR: Temp BROWSERSVCSTARTUPTYPE: Automatic CONFIGURATIONFILE: C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\20100831_180236\ConfigurationFile.ini CUSOURCE: ENABLERANU: True ENU: True ERRORREPORTING: False FARMACCOUNT: FARMADMINPORT: 0 FARMPASSWORD: * FEATURES: SQLENGINE FILESTREAMLEVEL: 0 FILESTREAMSHARENAME: FTSVCACCOUNT: FTSVCPASSWORD: * HELP: False IACCEPTSQLSERVERLICENSETERMS: True INDICATEPROGRESS: False INSTALLSHAREDDIR: C:\Program Files\Microsoft SQL Server\ INSTALLSHAREDWOWDIR: C:\Program Files\Microsoft SQL Server\ INSTALLSQLDATADIR: INSTANCEDIR: C:\Program Files\Microsoft SQL Server\ INSTANCEID: MSSQLSERVER INSTANCENAME: SQLEXPRESS ISSVCACCOUNT: NT AUTHORITY\NetworkService ISSVCPASSWORD: * ISSVCSTARTUPTYPE: Automatic NPENABLED: 0 PASSPHRASE: * PCUSOURCE: PID: * QUIET: False QUIETSIMPLE: True ROLE: AllFeatures_WithDefaults RSINSTALLMODE: FilesOnlyMode RSSVCACCOUNT: NT AUTHORITY\NETWORK SERVICE RSSVCPASSWORD: * RSSVCSTARTUPTYPE: Automatic SAPWD: * SECURITYMODE: SQL SQLBACKUPDIR: SQLCOLLATION: SQL_Latin1_General_CP1_CI_AS SQLSVCACCOUNT: NT Authority\System SQLSVCPASSWORD: * SQLSVCSTARTUPTYPE: Automatic SQLSYSADMINACCOUNTS: SQLTEMPDBDIR: SQLTEMPDBLOGDIR: SQLUSERDBDIR: SQLUSERDBLOGDIR: SQMREPORTING: False TCPENABLED: 1 UIMODE: AutoAdvance X86: False Configuration file: C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\20100831_180236\ConfigurationFile.ini Detailed results: Feature: Database Engine Services Status: Failed: see logs for details MSI status: Passed Configuration status: Failed: see details below Configuration error code: 0x0A2FBD17@1211@1 Configuration error description: The process cannot access the file because it is being used by another process. Configuration log: C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\20100831_180236\Detail.txt Rules with failures: Global rules: Scenario specific rules: Rules report file: C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\20100831_180236\SystemConfigurationCheck_Report.htm What should I do and why does this problem occur? Thanks , Shai.

Read the article
MySQL Error: FUNCTION LEVENSHTEIN already exists

- by kgrote

I've got an ExpressionEngine database and I exported a couple of tables from it, then dropped those tables. When I try to re-import the tables in PHPMyAdmin, I get this error: SQL query: -- -- Database: `my_db` -- DELIMITER $$ -- -- Functions -- CREATE DEFINER=`my_username`@`%` FUNCTION `LEVENSHTEIN`(s1 VARCHAR(255), s2 VARCHAR(255)) RETURNS int(11) DETERMINISTIC BEGIN DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT; DECLARE s1_char CHAR; DECLARE cv0, cv1 VARBINARY(256); SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0; IF s1 = s2 THEN RETURN 0; ELSEIF s1_len = 0 THEN RETURN s2_len; ELSEIF s2_len = 0 THEN RETURN s1_len; ELSE WHILE j <= s2_len DO SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1; END WHILE; WHILE i <= s1_len DO SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1; WHILE j <= s2_len DO SET c = c + 1; IF s1_char = SUBSTRING(s2, j, 1) THEN SET cost = 0; ELSE SET cost = 1; END IF; SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost; IF c > c_temp THEN SET c = [...] MySQL said: Documentation #1304 - FUNCTION LEVENSHTEIN already exists I get this error even if I drop all tables from the DB and try to import anything. The only way I can get the error to go away is to totally delete the database and re-create it. What's causing that error and how can I stop it from happening?

Read the article
Scaling databases with cheap SSD hard drives

- by Dennis Kashkin

Hey guys! I hope that many of you are working with high traffic database-driven websites, and chances are that your main scalability issues are in the database. I noticed a couple of things lately: Most large databases require a team of DBAs in order to scale. They constantly struggle with limitations of hard drives and end up with very expensive solutions (SANs or large RAIDs, frequent maintenance windows for defragging and repartitioning, etc.) The actual annual cost of maintaining such databases is in $100K-$1M range which is too steep for me :) Finally, we got several companies like Intel, Samsung, FusionIO, etc. that just started selling extremely fast yet affordable SSD hard drives based on SLC Flash technology. These drives are 100 times faster in random read/writes than the best spinning hard drives on the market (up to 50,000 random writes per second). Their seek time is pretty much zero, so the cost of random I/O is the same as sequential I/O, which is awesome for databases. These SSD drives cost around $10-$20 per gigabyte, and they are relatively small (64GB). So, there seems to be an opportunity to avoid the HUGE costs of scaling databases the traditional way by simply building a big enough RAID 5 array of SSD drives (which would cost only a few thousand dollars). Then we don't care if the database file is fragmented, and we can afford 100 times more disk writes per second without having to spread the database across 100 spindles. . Is anybody else interested in this? I've been testing a few SSD drives and can share my results. If anybody on this site has already solved their I/O bottleneck with SSDs, I would love to hear your war stories! PS. I know that there are plenty of expensive solutions out there that help with scalability, for example the time proven RAM-based SANs. I want to be clear that even $50K is too expensive for my project. I have to find a solution that costs no more than $10K and does not take much time to implement.

Read the article
When is a SQL function not a function?

- by Rob Farley

Should SQL Server even have functions? (Oh yeah – this is a T-SQL Tuesday post, hosted this month by Brad Schulz) Functions serve an important part of programming, in almost any language. A function is a piece of code that is designed to return something, as opposed to a piece of code which isn’t designed to return anything (which is known as a procedure). SQL Server is no different. You can call stored procedures, even from within other stored procedures, and you can call functions and use these in other queries. Stored procedures might query something, and therefore ‘return data’, but a function in SQL is considered to have the type of the thing returned, and can be used accordingly in queries. Consider the internal GETDATE() function. SELECT GETDATE(), SomeDatetimeColumn FROM dbo.SomeTable; There’s no logical difference between the field that is being returned by the function and the field that’s being returned by the table column. Both are the datetime field – if you didn’t have inside knowledge, you wouldn’t necessarily be able to tell which was which. And so as developers, we find ourselves wanting to create functions that return all kinds of things – functions which look up values based on codes, functions which do string manipulation, and so on. But it’s rubbish. Ok, it’s not all rubbish, but it mostly is. And this isn’t even considering the SARGability impact. It’s far more significant than that. (When I say the SARGability aspect, I mean “because you’re unlikely to have an index on the result of some function that’s applied to a column, so try to invert the function and query the column in an unchanged manner”) I’m going to consider the three main types of user-defined functions in SQL Server: Scalar Inline Table-Valued Multi-statement Table-Valued I could also look at user-defined CLR functions, including aggregate functions, but not today. I figure that most people don’t tend to get around to doing CLR functions, and I’m going to focus on the T-SQL-based user-defined functions. Most people split these types of function up into two types. So do I. Except that most people pick them based on ‘scalar or table-valued’. I’d rather go with ‘inline or not’. If it’s not inline, it’s rubbish. It really is. Let’s start by considering the two kinds of table-valued function, and compare them. These functions are going to return the sales for a particular salesperson in a particular year, from the AdventureWorks database. CREATE FUNCTION dbo.FetchSales_inline(@salespersonid int, @orderyear int) RETURNS TABLE AS RETURN ( SELECT e.LoginID as EmployeeLogin, o.OrderDate, o.SalesOrderID FROM Sales.SalesOrderHeader AS o LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = o.SalesPersonID WHERE o.SalesPersonID = @salespersonid AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101') AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101') ) ; GO CREATE FUNCTION dbo.FetchSales_multi(@salespersonid int, @orderyear int) RETURNS @results TABLE ( EmployeeLogin nvarchar(512), OrderDate datetime, SalesOrderID int ) AS BEGIN INSERT @results (EmployeeLogin, OrderDate, SalesOrderID) SELECT e.LoginID, o.OrderDate, o.SalesOrderID FROM Sales.SalesOrderHeader AS o LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = o.SalesPersonID WHERE o.SalesPersonID = @salespersonid AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101') AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101') ; RETURN END ; GO You’ll notice that I’m being nice and responsible with the use of the DATEADD function, so that I have SARGability on the OrderDate filter. Regular readers will be hoping I’ll show what’s going on in the execution plans here. Here I’ve run two SELECT * queries with the “Show Actual Execution Plan” option turned on. Notice that the ‘Query cost’ of the multi-statement version is just 2% of the ‘Batch cost’. But also notice there’s trickery going on. And it’s nothing to do with that extra index that I have on the OrderDate column. Trickery. Look at it – clearly, the first plan is showing us what’s going on inside the function, but the second one isn’t. The second one is blindly running the function, and then scanning the results. There’s a Sequence operator which is calling the TVF operator, and then calling a Table Scan to get the results of that function for the SELECT operator. But surely it still has to do all the work that the first one is doing... To see what’s actually going on, let’s look at the Estimated plan. Now, we see the same plans (almost) that we saw in the Actuals, but we have an extra one – the one that was used for the TVF. Here’s where we see the inner workings of it. You’ll probably recognise the right-hand side of the TVF’s plan as looking very similar to the first plan – but it’s now being called by a stack of other operators, including an INSERT statement to be able to populate the table variable that the multi-statement TVF requires. And the cost of the TVF is 57% of the batch! But it gets worse. Let’s consider what happens if we don’t need all the columns. We’ll leave out the EmployeeLogin column. Here, we see that the inline function call has been simplified down. It doesn’t need the Employee table. The join is redundant and has been eliminated from the plan, making it even cheaper. But the multi-statement plan runs the whole thing as before, only removing the extra column when the Table Scan is performed. A multi-statement function is a lot more powerful than an inline one. An inline function can only be the result of a single sub-query. It’s essentially the same as a parameterised view, because views demonstrate this same behaviour of extracting the definition of the view and using it in the outer query. A multi-statement function is clearly more powerful because it can contain far more complex logic. But a multi-statement function isn’t really a function at all. It’s a stored procedure. It’s wrapped up like a function, but behaves like a stored procedure. It would be completely unreasonable to expect that a stored procedure could be simplified down to recognise that not all the columns might be needed, but yet this is part of the pain associated with this procedural function situation. The biggest clue that a multi-statement function is more like a stored procedure than a function is the “BEGIN” and “END” statements that surround the code. If you try to create a multi-statement function without these statements, you’ll get an error – they are very much required. When I used to present on this kind of thing, I even used to call it “The Dangers of BEGIN and END”, and yes, I’ve written about this type of thing before in a similarly-named post over at my old blog. Now how about scalar functions... Suppose we wanted a scalar function to return the count of these. CREATE FUNCTION dbo.FetchSales_scalar(@salespersonid int, @orderyear int) RETURNS int AS BEGIN RETURN ( SELECT COUNT(*) FROM Sales.SalesOrderHeader AS o LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = o.SalesPersonID WHERE o.SalesPersonID = @salespersonid AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101') AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101') ); END ; GO Notice the evil words? They’re required. Try to remove them, you just get an error. That’s right – any scalar function is procedural, despite the fact that you wrap up a sub-query inside that RETURN statement. It’s as ugly as anything. Hopefully this will change in future versions. Let’s have a look at how this is reflected in an execution plan. Here’s a query, its Actual plan, and its Estimated plan: SELECT e.LoginID, y.year, dbo.FetchSales_scalar(p.SalesPersonID, y.year) AS NumSales FROM (VALUES (2001),(2002),(2003),(2004)) AS y (year) CROSS JOIN Sales.SalesPerson AS p LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = p.SalesPersonID; We see here that the cost of the scalar function is about twice that of the outer query. Nicely, the query optimizer has worked out that it doesn’t need the Employee table, but that’s a bit of a red herring here. There’s actually something way more significant going on. If I look at the properties of that UDF operator, it tells me that the Estimated Subtree Cost is 0.337999. If I just run the query SELECT dbo.FetchSales_scalar(281,2003); we see that the UDF cost is still unchanged. You see, this 0.0337999 is the cost of running the scalar function ONCE. But when we ran that query with the CROSS JOIN in it, we returned quite a few rows. 68 in fact. Could’ve been a lot more, if we’d had more salespeople or more years. And so we come to the biggest problem. This procedure (I don’t want to call it a function) is getting called 68 times – each one between twice as expensive as the outer query. And because it’s calling it in a separate context, there is even more overhead that I haven’t considered here. The cheek of it, to say that the Compute Scalar operator here costs 0%! I know a number of IT projects that could’ve used that kind of costing method, but that’s another story that I’m not going to go into here. Let’s look at a better way. Suppose our scalar function had been implemented as an inline one. Then it could have been expanded out like a sub-query. It could’ve run something like this: SELECT e.LoginID, y.year, (SELECT COUNT(*) FROM Sales.SalesOrderHeader AS o LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = o.SalesPersonID WHERE o.SalesPersonID = p.SalesPersonID AND o.OrderDate >= DATEADD(year,y.year-2000,'20000101') AND o.OrderDate < DATEADD(year,y.year-2000+1,'20000101') ) AS NumSales FROM (VALUES (2001),(2002),(2003),(2004)) AS y (year) CROSS JOIN Sales.SalesPerson AS p LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = p.SalesPersonID; Don’t worry too much about the Scan of the SalesOrderHeader underneath a Nested Loop. If you remember from plenty of other posts on the matter, execution plans don’t push the data through. That Scan only runs once. The Index Spool sucks the data out of it and populates a structure that is used to feed the Stream Aggregate. The Index Spool operator gets called 68 times, but the Scan only once (the Number of Executions property demonstrates this). Here, the Query Optimizer has a full picture of what’s being asked, and can make the appropriate decision about how it accesses the data. It can simplify it down properly. To get this kind of behaviour from a function, we need it to be inline. But without inline scalar functions, we need to make our function be table-valued. Luckily, that’s ok. CREATE FUNCTION dbo.FetchSales_inline2(@salespersonid int, @orderyear int) RETURNS table AS RETURN (SELECT COUNT(*) as NumSales FROM Sales.SalesOrderHeader AS o LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = o.SalesPersonID WHERE o.SalesPersonID = @salespersonid AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101') AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101') ); GO But we can’t use this as a scalar. Instead, we need to use it with the APPLY operator. SELECT e.LoginID, y.year, n.NumSales FROM (VALUES (2001),(2002),(2003),(2004)) AS y (year) CROSS JOIN Sales.SalesPerson AS p LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = p.SalesPersonID OUTER APPLY dbo.FetchSales_inline2(p.SalesPersonID, y.year) AS n; And now, we get the plan that we want for this query. All we’ve done is tell the function that it’s returning a table instead of a single value, and removed the BEGIN and END statements. We’ve had to name the column being returned, but what we’ve gained is an actual inline simplifiable function. And if we wanted it to return multiple columns, it could do that too. I really consider this function to be superior to the scalar function in every way. It does need to be handled differently in the outer query, but in many ways it’s a more elegant method there too. The function calls can be put amongst the FROM clause, where they can then be used in the WHERE or GROUP BY clauses without fear of calling the function multiple times (another horrible side effect of functions). So please. If you see BEGIN and END in a function, remember it’s not really a function, it’s a procedure. And then fix it. @rob_farley

Read the article
Testing Workflows – Test-After

- by Timothy Klenke

Originally posted on: http://geekswithblogs.net/TimothyK/archive/2014/05/30/testing-workflows-ndash-test-after.aspxIn this post I’m going to outline a few common methods that can be used to increase the coverage of of your test suite. This won’t be yet another post on why you should be doing testing; there are plenty of those types of posts already out there. Assuming you know you should be testing, then comes the problem of how do I actual fit that into my day job. When the opportunity to automate testing comes do you take it, or do you even recognize it? There are a lot of ways (workflows) to go about creating automated tests, just like there are many workflows to writing a program. When writing a program you can do it from a top-down approach where you write the main skeleton of the algorithm and call out to dummy stub functions, or a bottom-up approach where the low level functionality is fully implement before it is quickly wired together at the end. Both approaches are perfectly valid under certain contexts. Each approach you are skilled at applying is another tool in your tool belt. The more vectors of attack you have on a problem – the better. So here is a short, incomplete list of some of the workflows that can be applied to increasing the amount of automation in your testing and level of quality in general. Think of each workflow as an opportunity that is available for you to take. Test workflows basically fall into 2 categories: test first or test after. Test first is the best approach. However, this post isn’t about the one and only best approach. I want to focus more on the lesser known, less ideal approaches that still provide an opportunity for adding tests. In this post I’ll enumerate some test-after workflows. In my next post I’ll cover test-first. Bug Reporting When someone calls you up or forwards you a email with a vague description of a bug its usually standard procedure to create or verify a reproduction plan for the bug via manual testing and log that in a bug tracking system. This can be problematic. Often reproduction plans when written down might skip a step that seemed obvious to the tester at the time or they might be missing some crucial environment setting. Instead of data entry into a bug tracking system, try opening up the test project and adding a failing unit test to prove the bug. The test project guarantees that all aspects of the environment are setup properly and no steps are missing. The language in the test project is much more precise than the English that goes into a bug tracking system. This workflow can easily be extended for Enhancement Requests as well as Bug Reporting. Exploratory Testing Exploratory testing comes in when you aren’t sure how the system will behave in a new scenario. The scenario wasn’t planned for in the initial system requirements and there isn’t an existing test for it. By definition the system behaviour is “undefined”. So write a new unit test to define that behaviour. Add assertions to the tests to confirm your assumptions. The new test becomes part of the living system specification that is kept up to date with the test suite. Examples This workflow is especially good when developing APIs. When you are finally done your production API then comes the job of writing documentation on how to consume the API. Good documentation will also include code examples. Don’t let these code examples merely exist in some accompanying manual; implement them in a test suite. Example tests and documentation do not have to be created after the production API is complete. It is best to write the example code (tests) as you go just before the production code. Smoke Tests Every system has a typical use case. This represents the basic, core functionality of the system. If this fails after an upgrade the end users will be hosed and they will be scratching their heads as to how it could be possible that an update got released with this core functionality broken. The tests for this core functionality are referred to as “smoke tests”. It is a good idea to have them automated and run with each build in order to avoid extreme embarrassment and angry customers. Coverage Analysis Code coverage analysis is a tool that reports how much of the production code base is exercised by the test suite. In Visual Studio this can be found under the Test main menu item. The tool will report a total number for the code coverage, which can be anywhere between 0 and 100%. Coverage Analysis shouldn’t be used strictly for numbers reporting. Companies shouldn’t set minimum coverage targets that mandate that all projects must have at least 80% or 100% test coverage. These arbitrary requirements just invite gaming of the coverage analysis, which makes the numbers useless. The analysis tool will break down the coverage by the various classes and methods in projects. Instead of focusing on the total number, drill down into this view and see which classes have high or low coverage. It you are surprised by a low number on a class this is an opportunity to add tests. When drilling through the classes there will be generally two types of reaction to a surprising low test coverage number. The first reaction type is a recognition that there is low hanging fruit to be picked. There may be some classes or methods that aren’t being tested, which could easy be. The other reaction type is “OMG”. This were you find a critical piece of code that isn’t under test. In both cases, go and add the missing tests. Test Refactoring The general theme of this post up to this point has been how to add more and more tests to a test suite. I’ll step back from that a bit and remind that every line of code is a liability. Each line of code has to be read and maintained, which costs money. This is true regardless whether the code is production code or test code. Remember that the primary goal of the test suite is that it be easy to read so that people can easily determine the specifications of the system. Make sure that adding more and more tests doesn’t interfere with this primary goal. Perform code reviews on the test suite as often as on production code. Hold the test code up to the same high readability standards as the production code. If the tests are hard to read then change them. Look to remove duplication. Duplicate setup code between two or more test methods that can be moved to a shared function. Entire test methods can be removed if it is found that the scenario it tests is covered by other tests. Its OK to delete a test that isn’t pulling its own weight anymore. Remember to only start refactoring when all the test are green. Don’t refactor the tests and the production code at the same time. An automated test suite can be thought of as a double entry book keeping system. The unchanging, passing production code serves as the tests for the test suite while refactoring the tests. As with all refactoring, it is best to fit this into your regular work rather than asking for time later to get it done. Fit this into the standard red-green-refactor cycle. The refactor step no only applies to production code but also the tests, but not at the same time. Perhaps the cycle should be called red-green-refactor production-refactor tests (not quite as catchy). That about covers most of the test-after workflows I can think of. In my next post I’ll get into test-first workflows.

Read the article
Big Data – Role of Cloud Computing in Big Data – Day 11 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the NewSQL. In this article we will understand the role of Cloud in Big Data Story What is Cloud? Cloud is the biggest buzzword around from last few years. Everyone knows about the Cloud and it is extremely well defined online. In this article we will discuss cloud in the context of the Big Data. Cloud computing is a method of providing a shared computing resources to the application which requires dynamic resources. These resources include applications, computing, storage, networking, development and various deployment platforms. The fundamentals of the cloud computing are that it shares pretty much share all the resources and deliver to end users as a service. Examples of the Cloud Computing and Big Data are Google and Amazon.com. Both have fantastic Big Data offering with the help of the cloud. We will discuss this later in this blog post. There are two different Cloud Deployment Models: 1) The Public Cloud and 2) The Private Cloud Public Cloud Public Cloud is the cloud infrastructure build by commercial providers (Amazon, Rackspace etc.) creates a highly scalable data center that hides the complex infrastructure from the consumer and provides various services. Private Cloud Private Cloud is the cloud infrastructure build by a single organization where they are managing highly scalable data center internally. Here is the quick comparison between Public Cloud and Private Cloud from Wikipedia: Public Cloud Private Cloud Initial cost Typically zero Typically high Running cost Unpredictable Unpredictable Customization Impossible Possible Privacy No (Host has access to the data Yes Single sign-on Impossible Possible Scaling up Easy while within defined limits Laborious but no limits Hybrid Cloud Hybrid Cloud is the cloud infrastructure build with the composition of two or more clouds like public and private cloud. Hybrid cloud gives best of the both the world as it combines multiple cloud deployment models together. Cloud and Big Data – Common Characteristics There are many characteristics of the Cloud Architecture and Cloud Computing which are also essentially important for Big Data as well. They highly overlap and at many places it just makes sense to use the power of both the architecture and build a highly scalable framework. Here is the list of all the characteristics of cloud computing important in Big Data Scalability Elasticity Ad-hoc Resource Pooling Low Cost to Setup Infastructure Pay on Use or Pay as you Go Highly Available Leading Big Data Cloud Providers There are many players in Big Data Cloud but we will list a few of the known players in this list. Amazon Amazon is arguably the most popular Infrastructure as a Service (IaaS) provider. The history of how Amazon started in this business is very interesting. They started out with a massive infrastructure to support their own business. Gradually they figured out that their own resources are underutilized most of the time. They decided to get the maximum out of the resources they have and hence they launched their Amazon Elastic Compute Cloud (Amazon EC2) service in 2006. Their products have evolved a lot recently and now it is one of their primary business besides their retail selling. Amazon also offers Big Data services understand Amazon Web Services. Here is the list of the included services: Amazon Elastic MapReduce – It processes very high volumes of data Amazon DynammoDB – It is fully managed NoSQL (Not Only SQL) database service Amazon Simple Storage Services (S3) – A web-scale service designed to store and accommodate any amount of data Amazon High Performance Computing – It provides low-tenancy tuned high performance computing cluster Amazon RedShift – It is petabyte scale data warehousing service Google Though Google is known for Search Engine, we all know that it is much more than that. Google Compute Engine – It offers secure, flexible computing from energy efficient data centers Google Big Query – It allows SQL-like queries to run against large datasets Google Prediction API – It is a cloud based machine learning tool Other Players Besides Amazon and Google we also have other players in the Big Data market as well. Microsoft is also attempting Big Data with the Cloud with Microsoft Azure. Additionally Rackspace and NASA together have initiated OpenStack. The goal of Openstack is to provide a massively scaled, multitenant cloud that can run on any hardware. Thing to Watch The cloud based solutions provides a great integration with the Big Data’s story as well it is very economical to implement as well. However, there are few things one should be very careful when deploying Big Data on cloud solutions. Here is a list of a few things to watch: Data Integrity Initial Cost Recurring Cost Performance Data Access Security Location Compliance Every company have different approaches to Big Data and have different rules and regulations. Based on various factors, one can implement their own custom Big Data solution on a cloud. Tomorrow In tomorrow’s blog post we will discuss about various Operational Databases supporting Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Announcing the New Windows Azure Web Sites Shared Scaling Tier

- by Clint Edmonson

Windows Azure Web Sites has added a new pricing tier that will solve the #1 blocker for the web development community. The shared tier now supports custom domain names mapped to shared-instance web sites. This post will outline the plan changes and elaborate on how the new pricing model makes Windows Azure Web Sites an even richer option for web development shops of all sizes. Free Shared Reserved # of Sites 10 100 100 Egress 165MB/Day 5GB/Month Included 5GB/Month Included Storage 1GB 1GB 10GB Throttling CPU/Memory/Egress CPU/Memory Unlimited Price Free $.02/hr per site, per instance $.08/hr per core Setting the Stage In June, we released the first public preview of Windows Azure Web Sites, which gave web developers a great platform on which to get web sites running using their web development framework of choice. PHP, Node.js, classic ASP, and ASP.NET developers can all utilize the Windows Azure platform to create and launch their web sites. Likewise, these developers have a series of data storage options using Windows Azure SQL Databases, MySQL, or Windows Azure Storage. The Windows Azure Web Sites free offer enabled startups to get their site up and running on Windows Azure with a minimal investment, and with multiple deployment and continuous integration features such as Git, Team Foundation Services, FTP, and Web Deploy. The response to the Windows Azure Web Sites offer has been overwhelmingly positive. Since the addition of the service on June 12th, tens of thousands of web sites have been deployed to Windows Azure and the volume of adoption is increasing every week. Preview Feedback In spite of the growth and success of the product, the community has had questions about features lacking in the free preview offer. The main question web developers asked regarding Windows Azure Web Sites relates to the lack of the free offer’s support for domain name mapping. During the preview launch period, customer feedback made it obvious that the lack of domain name mapping support was an area of concern. We’re happy to announce that this #1 request has been delivered as a feature of the new shared plan. New Shared Tier Portal Features In the screen shot below, the “Scale” tab in the portal shows the new tiers – Free, Shared, and Reserved – and gives the user the ability to quickly move any of their free web sites into the shared tier. With a single mouse-click, the user can move their site into the shared tier. Once a site has been moved into the shared tier, a new Manage Domains button appears in the bottom action bar of the Windows Azure Portal giving site owners the ability to manage their domain names for a shared site. This button brings up the domain-management dialog, which can be used to enter in a specific domain name that will be mapped to the Windows Azure Web Site. Shared Tier Benefits Startups and large web agencies will both benefit from this plan change. Here are a few examples of scenarios which fit the new pricing model: Startups no longer have to select the reserved plan to map domain names to their sites. Instead, they can use the free option to develop their sites and choose on a site-by-site basis which sites they elect to move into the shared plan, paying only for the sites that are finished and ready to be domain-mapped Agencies who manage dozens of sites will realize a lower cost of ownership over the long term by moving their sites into reserved mode. Once multi-site companies reach a certain price point in the shared tier, it is much more cost-effective to move sites to a reserved tier. Long-term, it’s easy to see how the new Windows Azure Web Sites shared pricing tier makes Windows Azure Web Sites it a great choice for both startups and agency customers, as it enables rapid growth and upgrades while keeping the cost to a minimum. Large agencies will be able to have all of their sites in their own instances, and startups will have the capability to scale up to multiple-shared instances for minimal cost and eventually move to reserved instances without worrying about the need to incur continually additional costs. Customers can feel confident they have the power of the Microsoft Windows Azure brand and our world-class support, at prices competitive in the market. Plus, in addition to realizing the cost savings, they’ll have the whole family of Windows Azure features available. Continuous Deployment from GitHub and CodePlex Along with this new announcement are two other exciting new features. I’m proud to announce that web developers can now publish their web sites directly from CodePlex or GitHub.com repositories. Once connections are established between these services and your web sites, Windows Azure will automatically be notified every time a check-in occurs. This will then trigger Windows Azure to pull the source and compile/deploy the new version of your app to your web site automatically. Walk-through videos on how to perform these functions are below: Publishing to an Azure Web Site from CodePlex Publishing to an Azure Web Site from GitHub.com These changes, as well as the enhancements to the reserved plan model, make Windows Azure Web Sites a truly competitive hosting option. It’s never been easier or cheaper for a web developer to get up and running. Check out the free Windows Azure web site offering and see for yourself. Stay tuned to my twitter feed for Windows Azure announcements, updates, and links: @clinted

Read the article
How many developers before continuous integration becomes effective for us?

- by Carnotaurus

There is an overhead associated with continuous integration, e.g., set up, re-training, awareness activities, stoppage to fix "bugs" that turn out to be data issues, enforced separation of concerns programming styles, etc. At what point does continuous integration pay for itself? EDIT: These were my findings The set-up was CruiseControl.Net with Nant, reading from VSS or TFS. Here are a few reasons for failure, which have nothing to do with the setup: Cost of investigation: The time spent investigating whether a red light is due a genuine logical inconsistency in the code, data quality, or another source such as an infrastructure problem (e.g., a network issue, a timeout reading from source control, third party server is down, etc., etc.) Political costs over infrastructure: I considered performing an "infrastructure" check for each method in the test run. I had no solution to the timeout except to replace the build server. Red tape got in the way and there was no server replacement. Cost of fixing unit tests: A red light due to a data quality issue could be an indicator of a badly written unit test. So, data dependent unit tests were re-written to reduce the likelihood of a red light due to bad data. In many cases, necessary data was inserted into the test environment to be able to accurately run its unit tests. It makes sense to say that by making the data more robust then the test becomes more robust if it is dependent on this data. Of course, this worked well! Cost of coverage, i.e., writing unit tests for already existing code: There was the problem of unit test coverage. There were thousands of methods that had no unit tests. So, a sizeable amount of man days would be needed to create those. As this would be too difficult to provide a business case, it was decided that unit tests would be used for any new public method going forward. Those that did not have a unit test were termed 'potentially infra red'. An intestesting point here is that static methods were a moot point in how it would be possible to uniquely determine how a specific static method had failed. Cost of bespoke releases: Nant scripts only go so far. They are not that useful for, say, CMS dependent builds for EPiServer, CMS, or any UI oriented database deployment. These are the types of issues that occured on the build server for hourly test runs and overnight QA builds. I entertain that these to be unnecessary as a build master can perform these tasks manually at the time of release, esp., with a one man band and a small build. So, single step builds have not justified use of CI in my experience. What about the more complex, multistep builds? These can be a pain to build, especially without a Nant script. So, even having created one, these were no more successful. The costs of fixing the red light issues outweighed the benefits. Eventually, developers lost interest and questioned the validity of the red light. Having given it a fair try, I believe that CI is expensive and there is a lot of working around the edges instead of just getting the job done. It's more cost effective to employ experienced developers who do not make a mess of large projects than introduce and maintain an alarm system. This is the case even if those developers leave. It doesn't matter if a good developer leaves because processes that he follows would ensure that he writes requirement specs, design specs, sticks to the coding guidelines, and comments his code so that it is readable. All this is reviewed. If this is not happening then his team leader is not doing his job, which should be picked up by his manager and so on. For CI to work, it is not enough to just write unit tests, attempt to maintain full coverage, and ensure a working infrastructure for sizable systems. The bottom line: One might question whether fixing as many bugs before release is even desirable from a business prespective. CI involves a lot of work to capture a handful of bugs that the customer could identify in UAT or the company could get paid for fixing as part of a client service agreement when the warranty period expires anyway.

Read the article
Most Innovative IDM Projects: Awards at OpenWorld

- by Tanu Sood

On Tuesday at Oracle OpenWorld 2012, Oracle recognized the winners of Innovation Awards 2012 at a ceremony presided over by Hasan Rizvi, Executive Vice President at Oracle. Oracle Fusion Middleware Innovation Awards recognize customers for achieving significant business value through innovative uses of Oracle Fusion Middleware offerings. Winners are selected based on the uniqueness of their business case, business benefits, level of impact relative to the size of the organization, complexity and magnitude of implementation, and the originality of architecture. This year’s Award honors customers for their cutting-edge solutions driving business innovation and IT modernization using Oracle Fusion Middleware. The program has grown over the past 6 years, receiving a record number of nominations from customers around the globe. The winners were selected by a panel of judges that ranked each nomination across multiple different scoring categories. Congratulations to both Avea and ETS for winning this year’s Innovation Award for Identity Management. Identity Management Innovation Award 2012 Winner – Avea Company: Founded in 2004, AveA is the sole GSM 1800 mobile operator of Turkey and has reached a nationwide customer base of 12.8 million as of the end of 2011 Region: Turkey (EMEA) Products: Oracle Identity Manager, Oracle Identity Analytics, Oracle Access Management Suite Business Drivers: · To manage the agility and scale required for GSM Operations and enable call center efficiency by enabling agents to change their identity profiles (accounts and entitlements) rapidly based on call load. · Enhance user productivity and call center efficiency with self service password resets · Enforce compliance and audit reporting · Seamless identity management between AveA and parent company Turk Telecom Innovation and Results: · One of the first Sun2Oracle identity management migrations designed for high performance provisioning and trusted reconciliation built with connectors developed on the ICF architecture that provides custom user interfaces for dynamic and rapid management of roles and entitlements along with entitlement level attestation using closed loop remediation between Oracle Identity Manager and Oracle Identity Analytics. · Dramatic reduction in identity administration and call center password reset tasks leading to 20% reduction in administration costs and 95% reduction in password related calls. · Enhanced user productivity by up to 25% to date · Enforced enterprise security and reduced risk · Cost-effective compliance management · Looking to seamlessly integrate with parent and sister companies’ infrastructure securely. Identity Management Innovation Award 2012 Winner – Education Testing Service (ETS) See last year's winners here --Company: ETS is a private nonprofit organization devoted to educational measurement and research, primarily through testing. Region: U.S.A (North America) Products: Oracle Access Manager, Oracle Identity Federation, Oracle Identity Manager Business Drivers: ETS develops and administers more than 50 million achievement and admissions tests each year in more than 180 countries, at more than 9,000 locations worldwide. As the business becomes more globally based, having a robust solution to security and user management issues becomes paramount. The organizations was looking for: · Simplified user experience for over 3000 company users and more than 6 million dynamic student and staff population · Infrastructure and administration cost reduction · Managing security risk by controlling 3rd party access to ETS systems · Enforce compliance and manage audit reporting · Automate on-boarding and decommissioning of user account to improve security, reduce administration costs and enhance user productivity · Improve user experience with simplified sign-on and user self service Innovation and Results: 1. Manage Risk · Centralized system to control user access · Provided secure way of accessing service providers' application using federated SSO. · Provides reporting capability for auditing, governance and compliance. 2. Improve efficiency · Real-Time provisioning to target systems · Centralized provisioning system for user management and access controls. · Enabling user self services. 3. Reduce cost · Re-using common shared services for provisioning, SSO, Access by application reducing development cost and time. · Reducing infrastructure and maintenance cost by decommissioning legacy/redundant IDM services. · Reducing time and effort to implement security functionality in business applications (“onboard” instead of new development). ETS was able to fold in new and evolving requirement in addition to the initial stated goals realizing quick ROI and successfully meeting business objectives. Congratulations to the winners once again. We will be sure to bring you more from these Innovation Award winners over the next few months.

Read the article
Load and Web Performance Testing using Visual Studio Ultimate 2010-Part 3

- by Tarun Arora

Welcome back once again, in Part 1 of Load and Web Performance Testing using Visual Studio 2010 I talked about why Performance Testing the application is important, the test tools available in Visual Studio Ultimate 2010 and various test rig topologies, in Part 2 of Load and Web Performance Testing using Visual Studio 2010 I discussed the details of web performance & load tests as well as why it’s important to follow a goal based pattern while performance testing your application. In part 3 I’ll be discussing Test Result Analysis, Test Result Drill through, Test Report Generation, Test Run Comparison, Asp.net Profiler and some closing thoughts. Test Results – I see some creepy worms! In Part 2 we put together a web performance test and a load test, lets run the test to see load test to see how the Web site responds to the load simulation. While the load test is running you will be able to see close to real time analysis in the Load Test Analyser window. You can use the Load Test Analyser to conduct load test analysis in three ways: Monitor a running load test - A condensed set of the performance counter data is maintained in memory. To prevent the results memory requirements from growing unbounded, up to 200 samples for each performance counter are maintained. This includes 100 evenly spaced samples that span the current elapsed time of the run and the most recent 100 samples. After the load test run is completed - The test controller spools all collected performance counter data to a database while the test is running. Additional data, such as timing details and error details, is loaded into the database when the test completes. The performance data for a completed test is loaded from the database and analysed by the Load Test Analyser. Below you can see a screen shot of the summary view, this provides key results in a format that is compact and easy to read. You can also print the load test summary, this is generated after the test has completed or been stopped. Analyse the load test results of a previously run load test – We’ll see this in the section where i discuss comparison between two test runs. The performance counters can be plotted on the graphs. You also have the option to highlight a selected part of the test and view details, drill down to the user activity chart where you can hover over to see more details of the test run. Generate Report => Test Run Comparisons The level of reports you can generate using the Load Test Analyser is astonishing. You have the option to create excel reports and conduct side by side analysis of two test results or to track trend analysis. The tools also allows you to export the graph data either to MS Excel or to a CSV file. You can view the ASP.NET profiler report to conduct further analysis as well. View Data and Diagnostic Attachments opens the Choose Diagnostic Data Adapter Attachment dialog box to select an adapter to analyse the result type. For example, you can select an IntelliTrace adapter, click OK and open the IntelliTrace summary for the test agent that was used in the load test. Compare results This creates a set of reports that compares the data from two load test results using tables and bar charts. I have taken these screen shots from the MSDN documentation, I would highly recommend exploring the wealth of knowledge available on MSDN. Leaving Thoughts While load testing the application with an excessive load for a longer duration of time, i managed to bring the IIS to its knees by piling up a huge queue of requests waiting to be processed. This clearly means that the IIS had run out of threads as all the threads were busy processing existing request, one easy way of fixing this is by increasing the default number of allocated threads, but this might escalate the problem. The better suggestion is to try and drill down to the actual root cause of the problem. When ever the garbage collection runs it stops processing any pages so all requests that come in during that period are queued up, but realistically the garbage collection completes in fraction of a a second. To understand this better lets look at the .net heap, it is divided into large heap and small heap, anything greater than 85kB in size will be allocated to the Large object heap, the Large object heap is non compacting and remember large objects are expensive to move around, so if you are allocating something in the large object heap, make sure that you really need it! The small object heap on the other hand is divided into generations, so all objects that are supposed to be short-lived are suppose to live in Gen-0 and the long living objects eventually move to Gen-2 as garbage collection goes through. As you can see in the picture below all < 85 KB size objects are first assigned to Gen-0, when Gen-0 fills up and a new object comes in and finds Gen-0 full, the garbage collection process is started, the process checks for all the dead objects and assigns them as the valid candidate for deletion to free up memory and promotes all the remaining objects in Gen-0 to Gen-1. So in the future when ever you clean up Gen-1 you have to clean up Gen-0 as well. When you fill up Gen – 0 again, all of Gen – 1 dead objects are drenched and rest are moved to Gen-2 and Gen-0 objects are moved to Gen-1 to free up Gen-0, but by this time your Garbage collection process has started to take much more time than it usually takes. Now as I mentioned earlier when garbage collection is being run all page requests that come in during that period are queued up. Does this explain why possibly page requests are getting queued up, apart from this it could also be the case that you are waiting for a long running database process to complete. Lets explore the heap a bit more… What is really a case of crisis is when the objects are living long enough to make it to Gen-2 and then dying, this is definitely a high cost operation. But sometimes you need objects in memory, for example when you cache data you hold on to the objects because you need to use them right across the user session, which is acceptable. But if you wanted to see what extreme caching can do to your server then write a simple application that chucks in a lot of data in cache, run a load test over it for about 10-15 minutes, forcing a lot of data in memory causing the heap to run out of memory. If you get to such a state where you start running out of memory the IIS as a mode of recovery restarts the worker process. It is great way to free up all your memory in the heap but this would clear the cache. The problem with this is if the customer had 10 items in their shopping basket and that data was stored in the application cache, the user basket will now be empty forcing them either to get frustrated and go to a competitor website or if the customer is really patient, give it another try! How can you address this, well two ways of addressing this; 1. Workaround – A x86 bit processor only allows a maximum of 4GB of RAM, this means the machine effectively has around 3.4 GB of RAM available, the OS needs about 1.5 GB of RAM to run efficiently, the IIS and .net framework also need their share of memory, leaving you a heap of around 800 MB to play with. Because Team builds by default build your application in ‘Compile as any mode’ it means the application is build such that it will run in x86 bit mode if run on a x86 bit processor and run in a x64 bit mode if run on a x64 but processor. The problem with this is not all applications are really x64 bit compatible specially if you are using com objects or external libraries. So, as a quick win if you compiled your application in x86 bit mode by changing the compile as any selection to compile as x86 in the team build, you will be able to run your application on a x64 bit machine in x86 bit mode (WOW – By running Windows on Windows) and what that means is, you could use 8GB+ worth of RAM, if you take away everything else your application will roughly get a heap size of at least 4 GB to play with, which is immense. If you need a heap size of more than 4 GB you have either build a software for NASA or there is something fundamentally wrong in your application. 2. Solution – Now that you have put a workaround in place the IIS will not restart the worker process that regularly, which means you can take a breather and start working to get to the root cause of this memory leak. But this begs a question “How do I Identify possible memory leaks in my application?” Well i won’t say that there is one single tool that can tell you where the memory leak is, but trust me, ‘Performance Profiling’ is a great start point, it definitely gets you started in the right direction, let’s have a look at how. Performance Wizard - Start the Performance Wizard and select Instrumentation, this lets you measure function call counts and timings. Before running the performance session right click the performance session settings and chose properties from the context menu to bring up the Performance session properties page and as shown in the screen shot below, check the check boxes in the group ‘.NET memory profiling collection’ namely ‘Collect .NET object allocation information’ and ‘Also collect the .NET Object lifetime information’. Now if you fire off the profiling session on your pages you will notice that the results allows you to view ‘Object Lifetime’ which shows you the number of objects that made it to Gen-0, Gen-1, Gen-2, Large heap, etc. Another great feature about the profile is that if your application has > 5% cases where objects die right after making to the Gen-2 storage a threshold alert is generated to alert you. Since you have the option to also view the most expensive methods and by capturing the IntelliTrace data you can drill in to narrow down to the line of code that is the root cause of the problem. Well now that we have seen how crucial memory management is and how easy Visual Studio Ultimate 2010 makes it for us to identify and reproduce the problem with the best of breed tools in the product. Caching One of the main ways to improve performance is Caching. Which basically means you tell the web server that instead of going to the database for each request you keep the data in the webserver and when the user asks for it you serve it from the webserver itself. BUT that can have consequences! Let’s look at some code, trust me caching code is not very intuitive, I define a cache key for almost all searches made through the common search page and cache the results. The approach works fine, first time i get the data from the database and second time data is served from the cache, significant performance improvement, EXCEPT when two users try to do the same operation and run into each other. But it is easy to handle this by adding the lock as you can see in the snippet below. So, as long as a user comes in and finds that the cache is empty, the user locks and starts to get the cache no more concurrency issues. But lets say you are processing 10 requests per second, by the time i have locked the operation to get the results from the database, 9 other users came in and found that the cache key is null so after i have come out and populated the cache they will still go in to get the results again. The application will still be faster because the next set of 10 users and so on would continue to get data from the cache. BUT if we added another null check after locking to build the cache and before actual call to the db then the 9 users who follow me would not make the extra trip to the database at all and that would really increase the performance, but didn’t i say that the code won’t be very intuitive, may be you should leave a comment you don’t want another developer to come in and think what a fresher why is he checking for the cache key null twice !!! The downside of caching is, you are storing the data outside of the database and the data could be wrong because the updates applied to the database would make the data cached at the web server out of sync. So, how do you invalidate the cache? Well if you only had one way of updating the data lets say only one entry point to the data update you can write some logic to say that every time new data is entered set the cache object to null. But this approach will not work as soon as you have several ways of feeding data to the system or your system is scaled out across a farm of web servers. The perfect solution to this is Micro Caching which means you cache the query for a set time duration and invalidate the cache after that set duration. The advantage is every time the user queries for that data with in the time span for which you have cached the results there are no calls made to the database and the data is served right from the server which makes the response immensely quick. Now figuring out the appropriate time span for which you micro cache the query results really depends on the application. Lets say your website gets 10 requests per second, if you retain the cache results for even 1 minute you will have immense performance gains. You would reduce 90% hits to the database for searching. Ever wondered why when you go to e-bookers.com or xpedia.com or yatra.com to book a flight and you click on the book button because the fare seems too exciting and you get an error message telling you that the fare is not valid any more. Yes, exactly => That is a cache failure! These travel sites or price compare engines are not going to hit the database every time you hit the compare button instead the results will be served from the cache, because the query results are micro cached, its a perfect trade-off, by micro caching the results the site gains 100% performance benefits but every once in a while annoys a customer because the fare has expired. But the trade off works in the favour of these sites as they are still able to process up to 30+ page requests per second which means cater to the site traffic by may be losing 1 customer every once in a while to a competitor who is also using a similar caching technique what are the odds that the user will not come back to their site sooner or later? Recap Resources Below are some Key resource you might like to review. I would highly recommend the documentation, walkthroughs and videos available on MSDN. You can always make use of Fiddler to debug Web Performance Tests. Some community test extensions and plug ins available on Codeplex might also be of interest to you. The Road Ahead Thank you for taking the time out and reading this blog post, you may also want to read Part I and Part II if you haven’t so far. If you enjoyed the post, remember to subscribe to http://feeds.feedburner.com/TarunArora. Questions/Feedback/Suggestions, etc please leave a comment. Next ‘Load Testing in the cloud’, I’ll be working on exploring the possibilities of running Test controller/Agents in the Cloud. See you on the other side! Thank You! Share this post : CodeProject

Read the article
CodePlex Daily Summary for Tuesday, November 01, 2011

CodePlex Daily Summary for Tuesday, November 01, 2011Popular ReleasesAgFx Windows Phone Application Data Caching Framework: AgFx November Update: This update is primarily a bug fix release, which provides a number of stability and performance enhancements to AgFx. Available as a NuGet package here. Release Notes Fix IsoStore exceptions during shutdown (Changelist 81729) Thread safety and sequencings issues (Changelists 78682,79252, 80707) Fix landscape layout issue in auth sample Fix async keyword collision Supporting overlapping/multiple notifications for Load requestsiTuner - The iTunes Companion: iTuner 1.4.4322: Added German (unverified, apologies if incorrect) Properly source invariant resources with correct resIDs Replaced obsolete lyric providers with working providers Fix Pseudolater to correctly morph every third char Fix null reference in CatalogBaseDevpad: 4.6: Whats new for Devpad 4.6: New Recent Files New Run in Safari Minor Bug Fix's, improvements and speed upsWindows Workflow Foundation on Codeplex: Microsoft.Activities v1.8.8: Microsoft.Activities Overview How do I install Microsoft.Activities? Updates in this release9318Technical Analysis Engine for .NET: Technical Analysis Engine 1.25: What's new in the 1.25 release (2011-11-01): - Added Williams %R indicator - Added Moving Average Envelopes indicatorBF3Rcon.NET: BF3Rcon 3.0: This release is targeted for RCON documentation based on R3. Everything should be beta stable, but it's alpha because I haven't been able to fully test it. When a stable release is ready, a proper changelog will be kept. Important Edit: There's one method that will keep this from working in Mono. GeneratePasswordHash uses void HashAlgorithm.Dispose(), which isn't in Mono. This will have to be changed to Clear() in the next release. If anyone needs a Mono version of this immediately, you can...BoxWorld: BoxWorld_2011.10.30: BoxWorld - 8.0.1110.30 This is the initial release of BoxWorld. I'd recommend downloading the installer as it contains the compiled code and everything all nicely contained. By default, you end up with this directory structure: C:\Program Files\ViperWorks\BoxWorld C:\Program Files\ViperWorks\BoxWorld\Data C:\Program Files\ViperWorks\BoxWorld\Interface C:\Program Files\ViperWorks\BoxWorld\Source In the root you have the compiled EXE files, one for the main release, one for the LITE release ...VidCoder: 1.2.1: Fixed a couple regressions: video encoder was blank in queue and crashes with the High Profile preset when opening the Settings window. Fixed problem with auto-update introduced in 1.2.0. If you have 1.2.0 you will need to update manually to get this.AssaultCube Reloaded: Release 2.3: THE RELEASE YOU'VE ALL BEEN WAITING FOR! IT CAN NOW BE CONSIDERED STABLE Linux has Debian 64-bit precompiled binaries, but you can compile your own as it also contains the source. If you are using Mac or other operating systems, download the Linux package. The server pack is ready for both Windows and Linux, but you might need to compile your own for linux (source included) If you are using Windows and require the source code, download the source package!Koober: Koober - The Ebook Creator 0.2: The official release of Koober as Open source. Koober is a ebook creator for Windows, and Koob Reader is the reader.A Microblog API (SINA weibo.com open API in C#, .Net???????API): AMicroblogAPI v1.0: AMicroBlogAPI is a C# implementation, a strong-typed deep encapsulation, a easy-to-use .net wrapper of SINA microblog API v1.0. App developers no longer need to parse various HTTP responses -- all responses are parsed into strong types; No longer need to construt the request query strings -- just simply gives the values of parameters; No longer need to implement the OAuth -- a single call could logs the user on. In this release (AMicroblogAPI v1.0), all basic data APIs are implemented. For ...patterns & practices: Enterprise Library Contrib: Enterprise Library Contrib - 5.0 (Oct 2011): This release of Enterprise Library Contrib is based on the Microsoft patterns & practices Enterprise Library 5.0 core and contains the following: Common extensionsTypeConfigurationElement<T> - A Polymorphic Configuration Element without having to be part of a PolymorphicConfigurationElementCollection. AnonymousConfigurationElement - A Configuration element that can be uniquely identified without having to define its name explicitly. Data Access Application Block extensionsMySql Provider - ...Network Monitor Open Source Parsers: Network Monitor Parsers 3.4.2748: The Network Monitor Parsers packages contain parsers for more than 400 network protocols, including RFC based public protocols and protocols for Microsoft products defined in the Microsoft Open Specifications for Windows and SQL Server. NetworkMonitor_Parsers.msi is the base parser package which defines parsers for commonly used public protocols and protocols for Microsoft Windows. In this release, NetowrkMonitor_Parsers.msi continues to improve quality and fix bugs. It has included the fo...Duckworth Lewis Professional Edition Calculator: DLcalc 3.0: DLcalc 3.0 can perform Duckworth/Lewis Professional Edition calculations 100% accurately. It also produces over-by-over and ball-by-ball PAR score tables.Media Companion: MC 3.420b Weekly: Ensure .NET 4.0 Full Framework is installed. (Available from http://www.microsoft.com/download/en/details.aspx?id=17718) Ensure the NFO ID fix is applied when transitioning from versions prior to 3.416b. (Details here) Movies Fixed: Fanart and poster scraping issues TV Shows (Re)Added: Rebuild single show Fixed: Issue when shows are moved from original location Ability to handle " for actor nicknames Crash when episode name contains "<" (does not scrape yet) Clears fanart when switch...patterns & practices - Unity: Unity 3.0 for .NET4.5 Preview: The Unity 3.0.1026.0 Preview enables Unity to work on .NET 4.5 with both the WinRT and desktop profiles. The major changes include: Unity projects updated to target .NET 4.5. Dynamic build plans modified to use compiled lambda expressions instead of Reflection.Emit Converting reflection to use the new TypeInfo for reflection. Projects updated to work with the Microsoft Visual Studio 2011 Preview Notes/Known Issues: The Microsoft.Practices.Unity.UnityServiceLocator class cannot be use...Managed Extensibility Framework: MEF 2 Preview 4: Detailed information on this release is available on the BCL team blog.AcDown????? - Anime&Comic Downloader: AcDown????? v3.6: ?? ● AcDown??????????、??????，??????????????????????，???????Acfun、Bilibili、???、???、???、Tucao.cc、SF???、?????80????，???????????、?????????。 ● AcDown???????????????????????????，???，???????????????????。 ● AcDown???????C#??，????.NET Framework 2.0??。?????"Acfun?????"。 ????32??64? Windows XP/Vista/7 ????????????? ??：????????Windows XP???,?????????.NET Framework 2.0???(x86)?.NET Framework 2.0???(x64)，?????"?????????"??? ??????????????，??????????： ??"AcDown?????"????????? ?? v3.6?? ??“????”...MVC Controls Toolkit: Mvc Controls Toolkit 1.5.0: Added: The new Client Blocks feaure of Views A new "move" js method for the TreeViews The NewHtmlCreated js event to the DataGrid Improved the ChoiceList structure that now allows also the selection list of a dropdown to be chosen with a lambda expression Improved the AcceptViewHintAttribute controller filter. Now a client can specify not only the name of a View or Partial View it prefers, but also to receive just the rough data in Json format. Now the the SMinimum and SMaximum par...Free SharePoint Master Pages: Buried Alive (Halloween) Theme: Release Notes *Created for Halloween, you will find theme file, custom css file and images. *Created by Al Roome @AlstarRoome Features: Custom styling for web part Custom background *Screenshot https://s3.amazonaws.com/kkhipple/post/sharepoint-showcase-halloween.pngNew ProjectsarlTH: "Airport Link Thailand" Application for Windows Phone 7.1Audio Tags Editor For Excel: The purpose of the project is to create a tool to edit tags of audio files through the excel application. The program is an Excel add-in.AW Load Tester: AW Load Tester is a simple and easy to use website load tester that very closely mimics a regular users browsing session on your website. Many popular load tester only download HTML. AW Load Tester downloads all the related items on a page simultaneously. AzMan - Be Good Do Right !: The AzMan is a collection of reusable software components designed to assist software developers with common enterprise development challenges. http://bbs.fengshupo.com/BestWayWP7UI: The User interface of best wat wp7 applicationCheckout Subsystem Optimizer: Checkout Subsystem OptimizerDBScripter - Library for scripting SQL Server database objects: This project is library that allows users to script SQL Server database objects. Library uses dynamic management views for extracting data about databases objects. This library could be used in various situations. The most interesting areas are comparing database objects and generation database documentation. For both cases examples have been prepared.Dependency Checker: Dependency Checker is a utility for verifying that a set of prerequisites is present on a machine. The set of dependencies is defined in a configuration file. For more information, see http://dependencychecker.codeplex.com/documentation DibTer: DibTer is a brand new CMS for ASP.NET Mainly features are NHibernate MVC JQuery HTML5Dynamics CRM Entity Based Stress Tool: Entity Based Stress Tool, is command line program that creates configurable server agents in order to stress a defined entity at a defined Message/Stage. This tool allows CRM Application Stressing, Plugin Stressing, and enlaborate an execution report.Edinger: Edinger is a simple text editor.FreeboxHDVideoPlayer: FreeboxHDVideoPlayer makes it easier for free.fr french ISP users to play and manage videos stored on the HDD of the Freebox HD. FreeboxHDVideoPlayer simplifie, pour les utilisateurs de free.fr, la lecture et la gestion des vidéos stockées sur le disque du boitier Freebox HD.guglex - google docs xtraz: my studying project for asp.net mvc 3. now displaying spreadsheet rows in card viewIP-Camera MJPEG HTTP Web-Server Emulator: Turns web-camera into IP-camera with mjpeg streaming and access it over network. Usage of i-Lids (Imagery Library for Intelligent Detection Systems) and other data for surveillance systems development. Sends WMV, folder with images, web-camera data over the wire.LIM: LIM is .net software for archive manage.Luminji.CorWeb: luminji's company web.Map Generator: Map Generator for your Civ/Col/Roguelike games in VB.NET.Microformat Parsers for .NET: Microformat's Parsers for .NET helps you to collect information you run into on the web, that is stored by means of microformats. It's written in C# 3.0.Mini Rx: Mini Rx provides LINQ like queries over events for C# and F# using extension methods, somewhat like the Reactive Extensions (Rx) but in a lightweight open source package with one non-strong named assembly.MVC Music Store by NNT: Demo Music Store Follow tut from Microsoft. Team Explorer VS 2010 .NET 4PHP MySQL Web Site Creator: PHP MySQL Web Site Creator helps you create a basic template for your web site using a UI form. I creates all PHP, CSS, HTML basic pages to connect to MySQL database.PQLRD: Paco KLk!Project Web And Application: Project MobileStore in ASP use ASP.NET MVC 3RoboZip: RoboZip combines MS Robocopy and Zip-functionality. Create a job and RoboZip will zip all files (from a list of filetypes) within a time span in the past. The zip file name can contain date and time values of the time period it covers. It's developed in C#. The zip component is the DotNetZip Library from http://dotnetzip.codeplex.com. The logging is done using Apache log4net from http://logging.apache.org/log4net. The idea here is not to only present the code, but also to document how the de...Scriptster Gearbox: Scriptster Gearbox is a full C# scripting system based on Scriptster (also on CodePlex). It is aimed at Visual Studio developers and power users who need to automate the sorts of things that DOS is generally good at, but which using C# can make more powerful and more familiar.Technical Analysis Engine for .NET: Technical Analysis Engine is a .NET based library for technical analysis. It is fast, free, easy to use and provides functionality for financial market analysis.ValidationDSL: A Fluent validation engine with ease to use and reusable around the multi-tenant applicationsWebFormsUtilities: WebFormsUtilities is a collection of tools that assist in using MVC/MVP functionality to webforms in an unobtrusive manner. This differs from a hybrid MVC/WebForms page in that you can use methods like UpdateModel() or TryValidateModel() in a WebForms Page class.Wen - WPF 3D Navigator: Control the camera and the lights of a WPF 3D application without any code modification.WindStudios: WindStudiosWrite My Expect Statement: Have Rhino Mocks automatically generate your expect() and return() statements for you.zion xtraz: some basic helpful classes for use in every project

Read the article
Using R to Analyze G1GC Log Files

- by user12620111

Using R to Analyze G1GC Log Files body, td { font-family: sans-serif; background-color: white; font-size: 12px; margin: 8px; } tt, code, pre { font-family: 'DejaVu Sans Mono', 'Droid Sans Mono', 'Lucida Console', Consolas, Monaco, monospace; } h1 { font-size:2.2em; } h2 { font-size:1.8em; } h3 { font-size:1.4em; } h4 { font-size:1.0em; } h5 { font-size:0.9em; } h6 { font-size:0.8em; } a:visited { color: rgb(50%, 0%, 50%); } pre { margin-top: 0; max-width: 95%; border: 1px solid #ccc; white-space: pre-wrap; } pre code { display: block; padding: 0.5em; } code.r, code.cpp { background-color: #F8F8F8; } table, td, th { border: none; } blockquote { color:#666666; margin:0; padding-left: 1em; border-left: 0.5em #EEE solid; } hr { height: 0px; border-bottom: none; border-top-width: thin; border-top-style: dotted; border-top-color: #999999; } @media print { * { background: transparent !important; color: black !important; filter:none !important; -ms-filter: none !important; } body { font-size:12pt; max-width:100%; } a, a:visited { text-decoration: underline; } hr { visibility: hidden; page-break-before: always; } pre, blockquote { padding-right: 1em; page-break-inside: avoid; } tr, img { page-break-inside: avoid; } img { max-width: 100% !important; } @page :left { margin: 15mm 20mm 15mm 10mm; } @page :right { margin: 15mm 10mm 15mm 20mm; } p, h2, h3 { orphans: 3; widows: 3; } h2, h3 { page-break-after: avoid; } } pre .operator, pre .paren { color: rgb(104, 118, 135) } pre .literal { color: rgb(88, 72, 246) } pre .number { color: rgb(0, 0, 205); } pre .comment { color: rgb(76, 136, 107); } pre .keyword { color: rgb(0, 0, 255); } pre .identifier { color: rgb(0, 0, 0); } pre .string { color: rgb(3, 106, 7); } var hljs=new function(){function m(p){return p.replace(/&/gm,"&").replace(/"}while(y.length||w.length){var v=u().splice(0,1)[0];z+=m(x.substr(q,v.offset-q));q=v.offset;if(v.event=="start"){z+=t(v.node);s.push(v.node)}else{if(v.event=="stop"){var p,r=s.length;do{r--;p=s[r];z+=("")}while(p!=v.node);s.splice(r,1);while(r'+M[0]+""}else{r+=M[0]}O=P.lR.lastIndex;M=P.lR.exec(L)}return r+L.substr(O,L.length-O)}function J(L,M){if(M.sL&&e[M.sL]){var r=d(M.sL,L);x+=r.keyword_count;return r.value}else{return F(L,M)}}function I(M,r){var L=M.cN?'':"";if(M.rB){y+=L;M.buffer=""}else{if(M.eB){y+=m(r)+L;M.buffer=""}else{y+=L;M.buffer=r}}D.push(M);A+=M.r}function G(N,M,Q){var R=D[D.length-1];if(Q){y+=J(R.buffer+N,R);return false}var P=q(M,R);if(P){y+=J(R.buffer+N,R);I(P,M);return P.rB}var L=v(D.length-1,M);if(L){var O=R.cN?"":"";if(R.rE){y+=J(R.buffer+N,R)+O}else{if(R.eE){y+=J(R.buffer+N,R)+O+m(M)}else{y+=J(R.buffer+N+M,R)+O}}while(L1){O=D[D.length-2].cN?"":"";y+=O;L--;D.length--}var r=D[D.length-1];D.length--;D[D.length-1].buffer="";if(r.starts){I(r.starts,"")}return R.rE}if(w(M,R)){throw"Illegal"}}var E=e[B];var D=[E.dM];var A=0;var x=0;var y="";try{var s,u=0;E.dM.buffer="";do{s=p(C,u);var t=G(s[0],s[1],s[2]);u+=s[0].length;if(!t){u+=s[1].length}}while(!s[2]);if(D.length1){throw"Illegal"}return{r:A,keyword_count:x,value:y}}catch(H){if(H=="Illegal"){return{r:0,keyword_count:0,value:m(C)}}else{throw H}}}function g(t){var p={keyword_count:0,r:0,value:m(t)};var r=p;for(var q in e){if(!e.hasOwnProperty(q)){continue}var s=d(q,t);s.language=q;if(s.keyword_count+s.rr.keyword_count+r.r){r=s}if(s.keyword_count+s.rp.keyword_count+p.r){r=p;p=s}}if(r.language){p.second_best=r}return p}function i(r,q,p){if(q){r=r.replace(/^((]+|\t)+)/gm,function(t,w,v,u){return w.replace(/\t/g,q)})}if(p){r=r.replace(/\n/g,"")}return r}function n(t,w,r){var x=h(t,r);var v=a(t);var y,s;if(v){y=d(v,x)}else{return}var q=c(t);if(q.length){s=document.createElement("pre");s.innerHTML=y.value;y.value=k(q,c(s),x)}y.value=i(y.value,w,r);var u=t.className;if(!u.match("(\\s|^)(language-)?"+v+"(\\s|$)")){u=u?(u+" "+v):v}if(/MSIE [678]/.test(navigator.userAgent)&&t.tagName=="CODE"&&t.parentNode.tagName=="PRE"){s=t.parentNode;var p=document.createElement("div");p.innerHTML=""+y.value+"";t=p.firstChild.firstChild;p.firstChild.cN=s.cN;s.parentNode.replaceChild(p.firstChild,s)}else{t.innerHTML=y.value}t.className=u;t.result={language:v,kw:y.keyword_count,re:y.r};if(y.second_best){t.second_best={language:y.second_best.language,kw:y.second_best.keyword_count,re:y.second_best.r}}}function o(){if(o.called){return}o.called=true;var r=document.getElementsByTagName("pre");for(var p=0;p|=||=||=|\\?|\\[|\\{|\\(|\\^|\\^=|\\||\\|=|\\|\\||~";this.ER="(?![\\s\\S])";this.BE={b:"\\\\.",r:0};this.ASM={cN:"string",b:"'",e:"'",i:"\\n",c:[this.BE],r:0};this.QSM={cN:"string",b:'"',e:'"',i:"\\n",c:[this.BE],r:0};this.CLCM={cN:"comment",b:"//",e:"$"};this.CBLCLM={cN:"comment",b:"/\\*",e:"\\*/"};this.HCM={cN:"comment",b:"#",e:"$"};this.NM={cN:"number",b:this.NR,r:0};this.CNM={cN:"number",b:this.CNR,r:0};this.BNM={cN:"number",b:this.BNR,r:0};this.inherit=function(r,s){var p={};for(var q in r){p[q]=r[q]}if(s){for(var q in s){p[q]=s[q]}}return p}}();hljs.LANGUAGES.cpp=function(){var a={keyword:{"false":1,"int":1,"float":1,"while":1,"private":1,"char":1,"catch":1,"export":1,virtual:1,operator:2,sizeof:2,dynamic_cast:2,typedef:2,const_cast:2,"const":1,struct:1,"for":1,static_cast:2,union:1,namespace:1,unsigned:1,"long":1,"throw":1,"volatile":2,"static":1,"protected":1,bool:1,template:1,mutable:1,"if":1,"public":1,friend:2,"do":1,"return":1,"goto":1,auto:1,"void":2,"enum":1,"else":1,"break":1,"new":1,extern:1,using:1,"true":1,"class":1,asm:1,"case":1,typeid:1,"short":1,reinterpret_cast:2,"default":1,"double":1,register:1,explicit:1,signed:1,typename:1,"try":1,"this":1,"switch":1,"continue":1,wchar_t:1,inline:1,"delete":1,alignof:1,char16_t:1,char32_t:1,constexpr:1,decltype:1,noexcept:1,nullptr:1,static_assert:1,thread_local:1,restrict:1,_Bool:1,complex:1},built_in:{std:1,string:1,cin:1,cout:1,cerr:1,clog:1,stringstream:1,istringstream:1,ostringstream:1,auto_ptr:1,deque:1,list:1,queue:1,stack:1,vector:1,map:1,set:1,bitset:1,multiset:1,multimap:1,unordered_set:1,unordered_map:1,unordered_multiset:1,unordered_multimap:1,array:1,shared_ptr:1}};return{dM:{k:a,i:"",k:a,r:10,c:["self"]}]}}}();hljs.LANGUAGES.r={dM:{c:[hljs.HCM,{cN:"number",b:"\\b0[xX][0-9a-fA-F]+[Li]?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+(?:[eE][+\\-]?\\d*)?L\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+\\.(?!\\d)(?:i\\b)?",e:hljs.IMMEDIATE_RE,r:1},{cN:"number",b:"\\b\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"keyword",b:"(?:tryCatch|library|setGeneric|setGroupGeneric)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\.",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\d+(?![\\w.])",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\b(?:function)",e:hljs.IMMEDIATE_RE,r:2},{cN:"keyword",b:"(?:if|in|break|next|repeat|else|for|return|switch|while|try|stop|warning|require|attach|detach|source|setMethod|setClass)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"literal",b:"(?:NA|NA_integer_|NA_real_|NA_character_|NA_complex_)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"literal",b:"(?:NULL|TRUE|FALSE|T|F|Inf|NaN)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"identifier",b:"[a-zA-Z.][a-zA-Z0-9._]*\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"operator",b:"|=|| Using R to Analyze G1GC Log Files Using R to Analyze G1GC Log Files Introduction Working in Oracle Platform Integration gives an engineer opportunities to work on a wide array of technologies. My team’s goal is to make Oracle applications run best on the Solaris/SPARC platform. When looking for bottlenecks in a modern applications, one needs to be aware of not only how the CPUs and operating system are executing, but also network, storage, and in some cases, the Java Virtual Machine. I was recently presented with about 1.5 GB of Java Garbage First Garbage Collector log file data. If you’re not familiar with the subject, you might want to review Garbage First Garbage Collector Tuning by Monica Beckwith. The customer had been running Java HotSpot 1.6.0_31 to host a web application server. I was told that the Solaris/SPARC server was running a Java process launched using a commmand line that included the following flags: -d64 -Xms9g -Xmx9g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=80 -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintFlagsFinal -XX:+DisableExplicitGC -XX:+UnlockExperimentalVMOptions -XX:ParallelGCThreads=8 Several sources on the internet indicate that if I were to print out the 1.5 GB of log files, it would require enough paper to fill the bed of a pick up truck. Of course, it would be fruitless to try to scan the log files by hand. Tools will be required to summarize the contents of the log files. Others have encountered large Java garbage collection log files. There are existing tools to analyze the log files: IBM’s GC toolkit The chewiebug GCViewer gchisto HPjmeter Instead of using one of the other tools listed, I decide to parse the log files with standard Unix tools, and analyze the data with R. Data Cleansing The log files arrived in two different formats. I guess that the difference is that one set of log files was generated using a more verbose option, maybe -XX:+PrintHeapAtGC, and the other set of log files was generated without that option. Format 1 In some of the log files, the log files with the less verbose format, a single trace, i.e. the report of a singe garbage collection event, looks like this: {Heap before GC invocations=12280 (full 61): garbage-first heap total 9437184K, used 7499918K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 1 young (4096K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. 2014-05-14T07:24:00.988-0700: 60586.353: [GC pause (young) 7324M->7320M(9216M), 0.1567265 secs] Heap after GC invocations=12281 (full 61): garbage-first heap total 9437184K, used 7496533K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 0 young (0K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. } A simple grep can be used to extract a summary: $ grep "\[ GC pause (young" g1gc.log 2014-05-13T13:24:35.091-0700: 3.109: [GC pause (young) 20M->5029K(9216M), 0.0146328 secs] 2014-05-13T13:24:35.440-0700: 3.459: [GC pause (young) 9125K->6077K(9216M), 0.0086723 secs] 2014-05-13T13:24:37.581-0700: 5.599: [GC pause (young) 25M->8470K(9216M), 0.0203820 secs] 2014-05-13T13:24:42.686-0700: 10.704: [GC pause (young) 44M->15M(9216M), 0.0288848 secs] 2014-05-13T13:24:48.941-0700: 16.958: [GC pause (young) 51M->20M(9216M), 0.0491244 secs] 2014-05-13T13:24:56.049-0700: 24.066: [GC pause (young) 92M->26M(9216M), 0.0525368 secs] 2014-05-13T13:25:34.368-0700: 62.383: [GC pause (young) 602M->68M(9216M), 0.1721173 secs] But that format wasn't easily read into R, so I needed to be a bit more tricky. I used the following Unix command to create a summary file that was easy for R to read. $ echo "SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime" $ grep "\[GC pause (young" g1gc.log | grep -v mark | sed -e 's/[A-SU-z,]/ /g' -e 's/->/ /' -e 's/: / /g' | more SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime 2014-05-13T13:24:35.091-0700 3.109 20 5029 9216 0.0146328 2014-05-13T13:24:35.440-0700 3.459 9125 6077 9216 0.0086723 2014-05-13T13:24:37.581-0700 5.599 25 8470 9216 0.0203820 2014-05-13T13:24:42.686-0700 10.704 44 15 9216 0.0288848 2014-05-13T13:24:48.941-0700 16.958 51 20 9216 0.0491244 2014-05-13T13:24:56.049-0700 24.066 92 26 9216 0.0525368 2014-05-13T13:25:34.368-0700 62.383 602 68 9216 0.1721173 Format 2 In some of the log files, the log files with the more verbose format, a single trace, i.e. the report of a singe garbage collection event, was more complicated than Format 1. Here is a text file with an example of a single G1GC trace in the second format. As you can see, it is quite complicated. It is nice that there is so much information available, but the level of detail can be overwhelming. I wrote this awk script (download) to summarize each trace on a single line. #!/usr/bin/env awk -f BEGIN { printf("SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize\n") } ###################### # Save count data from lines that are at the start of each G1GC trace. # Each trace starts out like this: # {Heap before GC invocations=14 (full 0): # garbage-first heap total 9437184K, used 325496K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) ###################### /{Heap.*full/{ gsub ( "\\)" , "" ); nf=split($0,a,"="); split(a[2],b," "); getline; if ( match($0, "first") ) { G1GC=1; IncrementalCount=b[1]; FullCount=substr( b[3], 1, length(b[3])-1 ); } else { G1GC=0; } } ###################### # Pull out time stamps that are in lines with this format: # 2014-05-12T14:02:06.025-0700: 94.312: [GC pause (young), 0.08870154 secs] ###################### /GC pause/ { DateTime=$1; SecondsSinceLaunch=substr($2, 1, length($2)-1); } ###################### # Heap sizes are in lines that look like this: # [ 4842M->4838M(9216M)] ###################### /\[ .*]$/ { gsub ( "\\[" , "" ); gsub ( "\ \]" , "" ); gsub ( "->" , " " ); gsub ( "\$ " , " " ); gsub ( "\ $" , " " ); split($0,a," "); if ( split(a[1],b,"M") > 1 ) {BeforeSize=b[1]*1024;} if ( split(a[1],b,"K") > 1 ) {BeforeSize=b[1];} if ( split(a[2],b,"M") > 1 ) {AfterSize=b[1]*1024;} if ( split(a[2],b,"K") > 1 ) {AfterSize=b[1];} if ( split(a[3],b,"M") > 1 ) {TotalSize=b[1]*1024;} if ( split(a[3],b,"K") > 1 ) {TotalSize=b[1];} } ###################### # Emit an output line when you find input that looks like this: # [Times: user=1.41 sys=0.08, real=0.24 secs] ###################### /\[Times/ { if (G1GC==1) { gsub ( "," , "" ); split($2,a,"="); UserTime=a[2]; split($3,a,"="); SysTime=a[2]; split($4,a,"="); RealTime=a[2]; print DateTime,SecondsSinceLaunch,IncrementalCount,FullCount,UserTime,SysTime,RealTime,BeforeSize,AfterSize,TotalSize; G1GC=0; } } The resulting summary is about 25X smaller that the original file, but still difficult for a human to digest. SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ... 2014-05-12T18:36:34.669-0700: 3985.744 561 0 0.57 0.06 0.16 1724416 1720320 9437184 2014-05-12T18:36:34.839-0700: 3985.914 562 0 0.51 0.06 0.19 1724416 1720320 9437184 2014-05-12T18:36:35.069-0700: 3986.144 563 0 0.60 0.04 0.27 1724416 1721344 9437184 2014-05-12T18:36:35.354-0700: 3986.429 564 0 0.33 0.04 0.09 1725440 1722368 9437184 2014-05-12T18:36:35.545-0700: 3986.620 565 0 0.58 0.04 0.17 1726464 1722368 9437184 2014-05-12T18:36:35.726-0700: 3986.801 566 0 0.43 0.05 0.12 1726464 1722368 9437184 2014-05-12T18:36:35.856-0700: 3986.930 567 0 0.30 0.04 0.07 1726464 1723392 9437184 2014-05-12T18:36:35.947-0700: 3987.023 568 0 0.61 0.04 0.26 1727488 1723392 9437184 2014-05-12T18:36:36.228-0700: 3987.302 569 0 0.46 0.04 0.16 1731584 1724416 9437184 Reading the Data into R Once the GC log data had been cleansed, either by processing the first format with the shell script, or by processing the second format with the awk script, it was easy to read the data into R. g1gc.df = read.csv("summary.txt", row.names = NULL, stringsAsFactors=FALSE,sep="") str(g1gc.df) ## 'data.frame': 8307 obs. of 10 variables: ## $ row.names : chr "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ... ## $ SecondsSinceLaunch: num 1.16 1.47 1.97 3.83 6.1 ... ## $ IncrementalCount : int 0 1 2 3 4 5 6 7 8 9 ... ## $ FullCount : int 0 0 0 0 0 0 0 0 0 0 ... ## $ UserTime : num 0.11 0.05 0.04 0.21 0.08 0.26 0.31 0.33 0.34 0.56 ... ## $ SysTime : num 0.04 0.01 0.01 0.05 0.01 0.06 0.07 0.06 0.07 0.09 ... ## $ RealTime : num 0.02 0.02 0.01 0.04 0.02 0.04 0.05 0.04 0.04 0.06 ... ## $ BeforeSize : int 8192 5496 5768 22528 24576 43008 34816 53248 55296 93184 ... ## $ AfterSize : int 1400 1672 2557 4907 7072 14336 16384 18432 19456 21504 ... ## $ TotalSize : int 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 ... head(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount ## 1 2014-05-12T14:00:32.868-0700: 1.161 0 ## 2 2014-05-12T14:00:33.179-0700: 1.472 1 ## 3 2014-05-12T14:00:33.677-0700: 1.969 2 ## 4 2014-05-12T14:00:35.538-0700: 3.830 3 ## 5 2014-05-12T14:00:37.811-0700: 6.103 4 ## 6 2014-05-12T14:00:41.428-0700: 9.720 5 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 1 0 0.11 0.04 0.02 8192 1400 9437184 ## 2 0 0.05 0.01 0.02 5496 1672 9437184 ## 3 0 0.04 0.01 0.01 5768 2557 9437184 ## 4 0 0.21 0.05 0.04 22528 4907 9437184 ## 5 0 0.08 0.01 0.02 24576 7072 9437184 ## 6 0 0.26 0.06 0.04 43008 14336 9437184 Basic Statistics Once the data has been read into R, simple statistics are very easy to generate. All of the numbers from high school statistics are available via simple commands. For example, generate a summary of every column: summary(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount FullCount ## Length:8307 Min. : 1 Min. : 0 Min. : 0.0 ## Class :character 1st Qu.: 9977 1st Qu.:2048 1st Qu.: 0.0 ## Mode :character Median :12855 Median :4136 Median : 12.0 ## Mean :12527 Mean :4156 Mean : 31.6 ## 3rd Qu.:15758 3rd Qu.:6262 3rd Qu.: 61.0 ## Max. :55484 Max. :8391 Max. :113.0 ## UserTime SysTime RealTime BeforeSize ## Min. :0.040 Min. :0.0000 Min. : 0.0 Min. : 5476 ## 1st Qu.:0.470 1st Qu.:0.0300 1st Qu.: 0.1 1st Qu.:5137920 ## Median :0.620 Median :0.0300 Median : 0.1 Median :6574080 ## Mean :0.751 Mean :0.0355 Mean : 0.3 Mean :5841855 ## 3rd Qu.:0.920 3rd Qu.:0.0400 3rd Qu.: 0.2 3rd Qu.:7084032 ## Max. :3.370 Max. :1.5600 Max. :488.1 Max. :8696832 ## AfterSize TotalSize ## Min. : 1380 Min. :9437184 ## 1st Qu.:5002752 1st Qu.:9437184 ## Median :6559744 Median :9437184 ## Mean :5785454 Mean :9437184 ## 3rd Qu.:7054336 3rd Qu.:9437184 ## Max. :8482816 Max. :9437184 Q: What is the total amount of User CPU time spent in garbage collection? sum(g1gc.df$UserTime) ## [1] 6236 As you can see, less than two hours of CPU time was spent in garbage collection. Is that too much? To find the percentage of time spent in garbage collection, divide the number above by total_elapsed_time*CPU_count. In this case, there are a lot of CPU’s and it turns out the the overall amount of CPU time spent in garbage collection isn’t a problem when viewed in isolation. When calculating rates, i.e. events per unit time, you need to ask yourself if the rate is homogenous across the time period in the log file. Does the log file include spikes of high activity that should be separately analyzed? Averaging in data from nights and weekends with data from business hours may alias problems. If you have a reason to suspect that the garbage collection rates include peaks and valleys that need independent analysis, see the “Time Series” section, below. Q: How much garbage is collected on each pass? The amount of heap space that is recovered per GC pass is surprisingly low: At least one collection didn’t recover any data. (“Min.=0”) 25% of the passes recovered 3MB or less. (“1st Qu.=3072”) Half of the GC passes recovered 4MB or less. (“Median=4096”) The average amount recovered was 56MB. (“Mean=56390”) 75% of the passes recovered 36MB or less. (“3rd Qu.=36860”) At least one pass recovered 2GB. (“Max.=2121000”) g1gc.df$Delta = g1gc.df$BeforeSize - g1gc.df$AfterSize summary(g1gc.df$Delta) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0 3070 4100 56400 36900 2120000 Q: What is the maximum User CPU time for a single collection? The worst garbage collection (“Max.”) is many standard deviations away from the mean. The data appears to be right skewed. summary(g1gc.df$UserTime) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.040 0.470 0.620 0.751 0.920 3.370 sd(g1gc.df$UserTime) ## [1] 0.3966 Basic Graphics Once the data is in R, it is trivial to plot the data with formats including dot plots, line charts, bar charts (simple, stacked, grouped), pie charts, boxplots, scatter plots histograms, and kernel density plots. Histogram of User CPU Time per Collection I don't think that this graph requires any explanation. hist(g1gc.df$UserTime, main="User CPU Time per Collection", xlab="Seconds", ylab="Frequency") Box plot to identify outliers When the initial data is viewed with a box plot, you can see the one crazy outlier in the real time per GC. Save this data point for future analysis and drop the outlier so that it’s not throwing off our statistics. Now the box plot shows many outliers, which will be examined later, using times series analysis. Notice that the scale of the x-axis changes drastically once the crazy outlier is removed. par(mfrow=c(2,1)) boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(dominated by a crazy outlier)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") crazy.outlier.df=g1gc.df[g1gc.df$RealTime > 400,] g1gc.df=g1gc.df[g1gc.df$RealTime < 400,] boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(crazy outlier excluded)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") box(which = "outer", lty = "solid") Here is the crazy outlier for future analysis: crazy.outlier.df ## row.names SecondsSinceLaunch IncrementalCount ## 8233 2014-05-12T23:15:43.903-0700: 20741 8316 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 8233 112 0.55 0.42 488.1 8381440 8235008 9437184 ## Delta ## 8233 146432 R Time Series Data To analyze the garbage collection as a time series, I’ll use Z’s Ordered Observations (zoo). “zoo is the creator for an S3 class of indexed totally ordered observations which includes irregular time series.” require(zoo) ## Loading required package: zoo ## ## Attaching package: 'zoo' ## ## The following objects are masked from 'package:base': ## ## as.Date, as.Date.numeric head(g1gc.df[,1]) ## [1] "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" ## [3] "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ## [5] "2014-05-12T14:00:37.811-0700:" "2014-05-12T14:00:41.428-0700:" options("digits.secs"=3) times=as.POSIXct( g1gc.df[,1], format="%Y-%m-%dT%H:%M:%OS%z:") g1gc.z = zoo(g1gc.df[,-c(1)], order.by=times) head(g1gc.z) ## SecondsSinceLaunch IncrementalCount FullCount ## 2014-05-12 17:00:32.868 1.161 0 0 ## 2014-05-12 17:00:33.178 1.472 1 0 ## 2014-05-12 17:00:33.677 1.969 2 0 ## 2014-05-12 17:00:35.538 3.830 3 0 ## 2014-05-12 17:00:37.811 6.103 4 0 ## 2014-05-12 17:00:41.427 9.720 5 0 ## UserTime SysTime RealTime BeforeSize AfterSize ## 2014-05-12 17:00:32.868 0.11 0.04 0.02 8192 1400 ## 2014-05-12 17:00:33.178 0.05 0.01 0.02 5496 1672 ## 2014-05-12 17:00:33.677 0.04 0.01 0.01 5768 2557 ## 2014-05-12 17:00:35.538 0.21 0.05 0.04 22528 4907 ## 2014-05-12 17:00:37.811 0.08 0.01 0.02 24576 7072 ## 2014-05-12 17:00:41.427 0.26 0.06 0.04 43008 14336 ## TotalSize Delta ## 2014-05-12 17:00:32.868 9437184 6792 ## 2014-05-12 17:00:33.178 9437184 3824 ## 2014-05-12 17:00:33.677 9437184 3211 ## 2014-05-12 17:00:35.538 9437184 17621 ## 2014-05-12 17:00:37.811 9437184 17504 ## 2014-05-12 17:00:41.427 9437184 28672 Example of Two Benchmark Runs in One Log File The data in the following graph is from a different log file, not the one of primary interest to this article. I’m including this image because it is an example of idle periods followed by busy periods. It would be uninteresting to average the rate of garbage collection over the entire log file period. More interesting would be the rate of garbage collect in the two busy periods. Are they the same or different? Your production data may be similar, for example, bursts when employees return from lunch and idle times on weekend evenings, etc. Once the data is in an R Time Series, you can analyze isolated time windows. Clipping the Time Series data Flashing back to our test case… Viewing the data as a time series is interesting. You can see that the work intensive time period is between 9:00 PM and 3:00 AM. Lets clip the data to the interesting period: par(mfrow=c(2,1)) plot(g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Complete Log File", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") clipped.g1gc.z=window(g1gc.z, start=as.POSIXct("2014-05-12 21:00:00"), end=as.POSIXct("2014-05-13 03:00:00")) plot(clipped.g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Limited to Benchmark Execution", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") box(which = "outer", lty = "solid") Cumulative Incremental and Full GC count Here is the cumulative incremental and full GC count. When the line is very steep, it indicates that the GCs are repeating very quickly. Notice that the scale on the Y axis is different for full vs. incremental. plot(clipped.g1gc.z[,c(2:3)], main="Cumulative Incremental and Full GC count", xlab="Time of Day", col="#1b9e77") GC Analysis of Benchmark Execution using Time Series data In the following series of 3 graphs: The “After Size” show the amount of heap space in use after each garbage collection. Many Java objects are still referenced, i.e. alive, during each garbage collection. This may indicate that the application has a memory leak, or may indicate that the application has a very large memory footprint. Typically, an application's memory footprint plateau's in the early stage of execution. One would expect this graph to have a flat top. The steep decline in the heap space may indicate that the application crashed after 2:00. The second graph shows that the outliers in real execution time, discussed above, occur near 2:00. when the Java heap seems to be quite full. The third graph shows that Full GCs are infrequent during the first few hours of execution. The rate of Full GC's, (the slope of the cummulative Full GC line), changes near midnight. plot(clipped.g1gc.z[,c("AfterSize","RealTime","FullCount")], xlab="Time of Day", col=c("#1b9e77","red","#1b9e77")) GC Analysis of heap recovered Each GC trace includes the amount of heap space in use before and after the individual GC event. During garbage coolection, unreferenced objects are identified, the space holding the unreferenced objects is freed, and thus, the difference in before and after usage indicates how much space has been freed. The following box plot and bar chart both demonstrate the same point - the amount of heap space freed per garbage colloection is surprisingly low. par(mfrow=c(2,1)) boxplot(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", horizontal = TRUE, col="red") hist(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", breaks=100, col="red") box(which = "outer", lty = "solid") This graph is the most interesting. The dark blue area shows how much heap is occupied by referenced Java objects. This represents memory that holds live data. The red fringe at the top shows how much data was recovered after each garbage collection. barplot(clipped.g1gc.z[,c("AfterSize","Delta")], col=c("#7570b3","#e7298a"), xlab="Time of Day", border=NA) legend("topleft", c("Live Objects","Heap Recovered on GC"), fill=c("#7570b3","#e7298a")) box(which = "outer", lty = "solid") When I discuss the data in the log files with the customer, I will ask for an explaination for the large amount of referenced data resident in the Java heap. There are two are posibilities: There is a memory leak and the amount of space required to hold referenced objects will continue to grow, limited only by the maximum heap size. After the maximum heap size is reached, the JVM will throw an “Out of Memory” exception every time that the application tries to allocate a new object. If this is the case, the aplication needs to be debugged to identify why old objects are referenced when they are no longer needed. The application has a legitimate requirement to keep a large amount of data in memory. The customer may want to further increase the maximum heap size. Another possible solution would be to partition the application across multiple cluster nodes, where each node has responsibility for managing a unique subset of the data. Conclusion In conclusion, R is a very powerful tool for the analysis of Java garbage collection log files. The primary difficulty is data cleansing so that information can be read into an R data frame. Once the data has been read into R, a rich set of tools may be used for thorough evaluation.

Read the article
Exchange 2007 ignoring Send Connectors (again)

- by gravyface

Wow, I'm at a loss here -- I posted this exact same question a while back and it's doing the exact same thing: my Send Connector I've created for "Microsoft Domains" (hotmail.com cost 1) is being ignored again and routed through the "Default" Send Connector (* cost 10). Last time, I had the same cost for both Send Connectors, but I've tried setting the Default connector to 5, 10, 100, etc. and regardless, all mail gets routed through that connector (which smarthosts through Postini). Besides calling an air strike on Redmond, what else can I do? MS is blocking Postini again, need to get this working permanently.

Read the article
Cheapest way to go for somebody who wants to accept payments, but won't be accepting hundreds of ord

- by blockhead

I have a client who lectures, and wants to sell spots to his lecture online. I would preferably like to set him up with a solution that allows me to collect billing information on his site. My experience with e-commerce is in using solutions like Authorize.net, however this does not seem cost effective since I can't imagine he's making a huge profit off of this. I'm afraid he would lose money in the cost of using Authorize.net (or any payment gateway for the matter). I could use google checkout or paypal express, but this would require me to leave his site (although with google checkout, it looks like, from a glance, that I could just submit to their form from my server, and likely with paypal as well, but I don't know if this is against their TOS). What is the most cost-effective solution for accepting credit card payments in this situation?

Read the article
Linux servers vs Windows IIS sense of usage "free" solutions

- by Rob

I wonder what is the sense of using "free" open source solutions for serious webstie applications? Crawled and read many testing of servers performance and there is one conclusion: IIS seems to be the best choice for high load applicatiom. I mean cost effective. Especially this concers to Nginx PLUS and LiteSpeed Users where subscriptions paid for e.g. LoadBalacer and extra support cost a lot in fact. I'm asking then where it's "free" then or "cheap" in this case? Assuming even little higher cost of dedicated servers with Windows still seems like Windows looks cheaper. At it's basic setup Windows 2012 with IIS offer much more than std LAMP, or other NGINX config.... Maybe am I missing sth ? I mean only general case for someone who did not already started his app. I know exactly that the cheapest solution is the one someone is skilled. Has anyone done already such real costs calculation for example scenarios?

Read the article
how to speed up the code??

- by kaushik

i have very huge code about 600 lines plus. cant post the whole thing here. but a particular code snippet is taking so much time,leading to problems. here i post that part of code please tell me what to do speed up the processing.. please suggest the part which may be the reason and measure to improve them if this small part of code is understandable. using_data={} def join_cost(a , b): global using_data #print a #print b save_a=[] save_b=[] print 1 #for i in range(len(m)): #if str(m[i][0])==str(a): save_a=database_index[a] #for i in range(len(m)): # if str(m[i][0])==str(b): #print 'save_a',save_a #print 'save_b',save_b print 2 save_b=database_index[b] using_data[save_a[0]]=save_a s=str(save_a[1]).replace('phone','text') s=str(s)+'.pm' p=os.path.join("c:/begpython/wavnk/",s) x=open(p , 'r') print 3 for i in range(6): x.readline() k2='a' j=0 o=[] while k2 is not '': k2=x.readline() k2=k2.rstrip('\n') oj=k2.split(' ') o=o+[oj] #print o[j] j=j+1 #print j #print o[2][0] temp=long(1232332) end_time=save_a[4] #print end_time k=(j-1) for i in range(k): diff=float(o[i][0])-float(end_time) if diff<0: diff=diff*(-1) if temp>diff: temp=diff pm_row=i #print pm_row #print temp #print o[pm_row] #pm_row=3 q=[] print 4 l=str(p).replace('.pm','.mcep') z=open(l ,'r') for i in range(pm_row): z.readline() k3=z.readline() k3=k3.rstrip('\n') q=k3.split(' ') #print q print 5 s=str(save_b[1]).replace('phone','text') s=str(s)+'.pm' p=os.path.join("c:/begpython/wavnk/",s) x=open(p , 'r') for i in range(6): x.readline() k2='a' j=0 o=[] while k2 is not '': k2=x.readline() k2=k2.rstrip('\n') oj=k2.split(' ') o=o+[oj] #print o[j] j=j+1 #print j #print o[2][0] temp=long(1232332) strt_time=save_b[3] #print strt_time k=(j-1) for i in range(k): diff=float(o[i][0])-float(strt_time) if diff<0: diff=diff*(-1) if temp>diff: temp=diff pm_row=i #print pm_row #print temp #print o[pm_row] #pm_row=3 w=[] l=str(p).replace('.pm','.mcep') z=open(l ,'r') for i in range(pm_row): z.readline() k3=z.readline() k3=k3.rstrip('\n') w=k3.split(' ') #print w cost=0 for i in range(12): #print q[i] #print w[i] h=float(q[i])-float(w[i]) cost=cost+math.pow(h,2) j_cost=math.sqrt(cost) #print cost return j_cost def target_cost(a , b): a=(b+1)*3 b=(a+1)*2 t_cost=(a+b)*5/2 return t_cost r1='shht:ra_77' r2='grx_18' g=[] nodes=[] nodes=nodes+[[r1]] for i in range(len(y_in_db_format)): g=y_in_db_format[i] #print g #print g[0] g.remove(str(g[0])) nodes=nodes+[g] nodes=nodes+[[r2]] print nodes print "lenght of nodes",len(nodes) lists=[] #lists=lists+[r1] for i in range(len(nodes)): for j in range(len(nodes[i])): lists=lists+[nodes[i][j]] #lists=lists+[r2] print lists distance={} for i in range(len(lists)): if i==0: distance[str(lists[i])]=0 else: distance[str(lists[i])]=long(123231223) #print distance group_dist=[] infinity=long(123232323) for i in range(len(nodes)): distances=[] for j in range(len(nodes[i])): #distances=[] if i==0: distances=distances+[[nodes[i][j], 0]] else: distances=distances+[[nodes[i][j],infinity]] group_dist=group_dist+[distances] #print distances print "group_distances",group_dist #print "check",group_dist[0][0][1] #costs={} #for i in range(len(lists)): #if i==0: # costs[str(lists[i])]=1 #else: # costs[str(lists[i])]=get_selfcost(lists[i]) path=[] for i in range(len(nodes)): mini=[] if i!=(len(nodes)-1): #temp=long(123234324) #Now calculate the cost between the current node and each of its neighbour for k in range(len(nodes[(i+1)])): for j in range(len(nodes[i])): current=nodes[i][j] #print "current_node",current j_distance=join_cost( current , nodes[i+1][k]) #t_distance=target_cost( current , nodes[i+1][k]) t_distance=34 #print distance #print "distance between current and neighbours",distance total_distance=(.5*(float(group_dist[i][j][1])+float(j_distance))+.5*(float(t_distance))) #print "total distance between the intial_nodes and current neighbour",total_distance if int(group_dist[i+1][k][1]) > int(total_distance): group_dist[i+1][k][1]=total_distance #print "updated distance",group_dist[i+1][k][1] a=current #print "the neighbour",nodes[i+1][k],"updated the value",a mini=mini+[[str(nodes[i+1][k]),a]] print mini

Read the article
Open an Application .exe window inside another application

- by jpnavarini

I have an application in WPF running. I would like that, when a button is clicked inside this application, another application opens, with its window maximized. However, I don't want my first application to stop and wait. I want both to be open and running independently. When the button is clicked again, in case the application is minimized, the application is maximized. In case it is not, it is open again. How is it possible using C#? I have tried the following: Process process = Process.GetProcesses().FirstOrDefault(f => f.ProcessName.Contains("Analysis")); ShowWindow((process ?? Process.Start("..\\..\\..\\MS Analysis\\bin\\Debug\\Chemtech.RT.MS.Analysis.exe")).MainWindowHandle.ToInt32(), SW_MAXIMIZE); But the window does not open, even though the process does start.

Read the article
Is there a way of getting a file name and inserting into Matlab script?

- by torr

In a folder, I have both my .m file that contains the script and an imaging .dcm file that needs to be analyzed. Folder structure: Folder1/analysis.m Folder1/meas_dynamic_123.dcm My script (analysis.m) begins as follows: target =''; <== here should go the full path to the file + filename example: /Volumes/Data/Folder1/meas_dynamic_123.m txt = dir(target); // etc So I'm wondering if there is a way of when running analysis.m it will: automatically search the folder it's in, grab the full path + filename of file containing string dynamic in the name, insert its full path + name into target variable continue running the script Does anyone have any pointers on how to achieve this? Using ffpath?

Read the article
Not Playing Nice Together

- by David Douglass

One of the things I’ve noticed is that two industry trends are not playing nice together, those trends being multi-core CPUs and massive hard drives. It’s not a problem if you keep your cores busy with compute intensive work, but for software developers the beauty of multi-core CPUs (along with gobs of RAM and a 64 bit OS) is virtualization. But when you have only one hard drive (who needs another when it holds 2 TB of data?) you wind up with a serious hard drive bottleneck. A solid state drive would definitely help, and might even be a complete solution, but the cost is ridiculous. Two TB of solid state storage will set you back around $7,000! A spinning 2 TB drive is only $150. I see a couple of solutions for this. One is the mainframe concept of near and far storage: put the stuff that will be heavily access on a solid state drive and the rest on a spinning drive. Another solution is multiple spinning drives. Instead of a single 2 TB drive, get four 500 GB drives. In total, the four 500 GB drives will cost about $100 more than the single 2 TB drive. You’ll need to be smart about what drive you place things on so that the load is spread evenly. Another option, for better performance, would be four 10,000 RPM 300 GB drives, but that would cost about $800 more than the singe 2 TB drive and would deliver only 1.2 TB of space. All pricing based on Microcenter as of March 14, 2010.

Read the article
Oracle VM Blade Cluster Reference Configuration

- by Ferhat Hatay

Today we are happy to announce the availability of the Oracle VM blade cluster reference configuration for Sun Blade 6000 modular systems. The new Oracle VM blade cluster reference configuration can help reduce the time to deploy virtual infrastructure by up to 98 percent when compared to multi-vendor configurations. Oracle's virtualization strategy is to simplify the deployment, management, and support of the enterprise stack from application to disk. The Oracle VM blade cluster reference configuration is a single-vendor solution that addresses every layer of the virtualization stack with Oracle hardware and software components. It enables quick and easy deployment of the virtualized infrastructure using components that have been tested together and are all supported together by one vendor — Oracle. All components listed in the reference configuration have been tested together by Oracle, reducing the need for customer testing and the time-consuming and complex effort of designing and deploying a stable configuration. Benefitting from pre-installed Oracle VM Server for x86 software on Oracle’s highly scalable and reliable Sun Blade servers with built-in networking and Oracle’s Sun ZFS Storage Appliance product line, the configuration provides high availability via the blade cluster as well as a documented best practice guide that helps reduce deployment time and cost for customers implementing highly virtualized applications or private cloud Infrastructure as a Service (IaaS) architectures. To further support easier, faster and lower-cost deployments, Oracle Linux, Oracle Solaris and Oracle VM are available for pre-install on select Sun x86 systems, and Oracle VM Templates are available for download for Oracle Applications, Oracle Fusion Middleware, Oracle Database, Oracle Real Application Clusters, and many other Oracle products. Key benefits of the Oracle VM blade cluster reference configuration include: Faster time to value – Begin deploying applications immediately because the optimized software stack is pre-configured for best practices and is ready-to-run on the recommended hardware platforms. Reduced deployment cost and risk – The entire hardware and software stack has been tested and is supported together by Oracle. Elastic scalability – As capacity needs grow, the system can be easily scaled in multiple dimensions with the ability to add compute, storage, and networking resources independently. For more information, see: Oracle white paper: Accelerating deployment of virtualized infrastructures with the Oracle VM blade cluster reference configuration Oracle technical white paper: Best Practices and Guidelines for Deploying the Oracle VM Blade Cluster Reference Configuration

Read the article
Distributed Development Tools -- (Version control and Project Management)

- by Macy Abbey

Hello, I've recently become responsible for choosing which source control and project management software to use for a company that employs me. Currently it uses Jira (project management) and Subversion (version control). I know there are many other options out there -- the ones I know about are all in this article http://mashable.com/2010/07/14/distributed-developer-teams/ . I'm leaning towards recommending they just stay with what they have as it seems workable and any change would have to be worth the cost of switching to say github/basecamp or some other solution. Some details on the team: It's a distributed development shop. Meetings of the whole team in one room are rare. It's currently a very small development team (three developers). The project management software is used by developers and a product manager or two. What are you experiences with version control and project management web applications? Are there any you would recommend and you think are worth the switching cost of time to learn new services / implementing the change? Edit: After educating myself further on the options it appears DVCS offer powerful benefits that may be worth investing in now as opposed to later in the company's lifetime when the switching cost is higher: I'm a Subversion geek, why I should consider or not consider Mercurial or Git or any other DVCS?

Read the article

Search Results

Search found 5666 results on 227 pages for 'cost analysis'.

Page 46/227 | < Previous Page | 42 43 44 45 46 47 48 49 50 51 52 53 | Next Page >

- by KLaker

- by mseika

- by Mandy Ho

- by amacias

- by Shai Sherman

- by kgrote

- by Dennis Kashkin

- by Rob Farley

- by Timothy Klenke

- by Pinal Dave

- by Clint Edmonson

- by Carnotaurus

- by Tanu Sood

- by Tarun Arora

- by user12620111

- by gravyface

- by blockhead

- by Rob

- by kaushik

- by jpnavarini

- by torr

- by David Douglass

- by Ferhat Hatay

- by Macy Abbey

< Previous Page | 42 43 44 45 46 47 48 49 50 51 52 53 | Next Page >