Search Results

Search found 21155 results on 847 pages for 'free space'.

Page 162/847 | < Previous Page | 158 159 160 161 162 163 164 165 166 167 168 169  | Next Page >

  • Advanced TSQL Tuning: Why Internals Knowledge Matters

    - by Paul White
    There is much more to query tuning than reducing logical reads and adding covering nonclustered indexes.  Query tuning is not complete as soon as the query returns results quickly in the development or test environments.  In production, your query will compete for memory, CPU, locks, I/O and other resources on the server.  Today’s entry looks at some tuning considerations that are often overlooked, and shows how deep internals knowledge can help you write better TSQL. As always, we’ll need some example data.  In fact, we are going to use three tables today, each of which is structured like this: Each table has 50,000 rows made up of an INTEGER id column and a padding column containing 3,999 characters in every row.  The only difference between the three tables is in the type of the padding column: the first table uses CHAR(3999), the second uses VARCHAR(MAX), and the third uses the deprecated TEXT type.  A script to create a database with the three tables and load the sample data follows: USE master; GO IF DB_ID('SortTest') IS NOT NULL DROP DATABASE SortTest; GO CREATE DATABASE SortTest COLLATE LATIN1_GENERAL_BIN; GO ALTER DATABASE SortTest MODIFY FILE ( NAME = 'SortTest', SIZE = 3GB, MAXSIZE = 3GB ); GO ALTER DATABASE SortTest MODIFY FILE ( NAME = 'SortTest_log', SIZE = 256MB, MAXSIZE = 1GB, FILEGROWTH = 128MB ); GO ALTER DATABASE SortTest SET ALLOW_SNAPSHOT_ISOLATION OFF ; ALTER DATABASE SortTest SET AUTO_CLOSE OFF ; ALTER DATABASE SortTest SET AUTO_CREATE_STATISTICS ON ; ALTER DATABASE SortTest SET AUTO_SHRINK OFF ; ALTER DATABASE SortTest SET AUTO_UPDATE_STATISTICS ON ; ALTER DATABASE SortTest SET AUTO_UPDATE_STATISTICS_ASYNC ON ; ALTER DATABASE SortTest SET PARAMETERIZATION SIMPLE ; ALTER DATABASE SortTest SET READ_COMMITTED_SNAPSHOT OFF ; ALTER DATABASE SortTest SET MULTI_USER ; ALTER DATABASE SortTest SET RECOVERY SIMPLE ; USE SortTest; GO CREATE TABLE dbo.TestCHAR ( id INTEGER IDENTITY (1,1) NOT NULL, padding CHAR(3999) NOT NULL,   CONSTRAINT [PK dbo.TestCHAR (id)] PRIMARY KEY CLUSTERED (id), ) ; CREATE TABLE dbo.TestMAX ( id INTEGER IDENTITY (1,1) NOT NULL, padding VARCHAR(MAX) NOT NULL,   CONSTRAINT [PK dbo.TestMAX (id)] PRIMARY KEY CLUSTERED (id), ) ; CREATE TABLE dbo.TestTEXT ( id INTEGER IDENTITY (1,1) NOT NULL, padding TEXT NOT NULL,   CONSTRAINT [PK dbo.TestTEXT (id)] PRIMARY KEY CLUSTERED (id), ) ; -- ============= -- Load TestCHAR (about 3s) -- ============= INSERT INTO dbo.TestCHAR WITH (TABLOCKX) ( padding ) SELECT padding = REPLICATE(CHAR(65 + (Data.n % 26)), 3999) FROM ( SELECT TOP (50000) n = ROW_NUMBER() OVER (ORDER BY (SELECT 0)) - 1 FROM master.sys.columns C1, master.sys.columns C2, master.sys.columns C3 ORDER BY n ASC ) AS Data ORDER BY Data.n ASC ; -- ============ -- Load TestMAX (about 3s) -- ============ INSERT INTO dbo.TestMAX WITH (TABLOCKX) ( padding ) SELECT CONVERT(VARCHAR(MAX), padding) FROM dbo.TestCHAR ORDER BY id ; -- ============= -- Load TestTEXT (about 5s) -- ============= INSERT INTO dbo.TestTEXT WITH (TABLOCKX) ( padding ) SELECT CONVERT(TEXT, padding) FROM dbo.TestCHAR ORDER BY id ; -- ========== -- Space used -- ========== -- EXECUTE sys.sp_spaceused @objname = 'dbo.TestCHAR'; EXECUTE sys.sp_spaceused @objname = 'dbo.TestMAX'; EXECUTE sys.sp_spaceused @objname = 'dbo.TestTEXT'; ; CHECKPOINT ; That takes around 15 seconds to run, and shows the space allocated to each table in its output: To illustrate the points I want to make today, the example task we are going to set ourselves is to return a random set of 150 rows from each table.  The basic shape of the test query is the same for each of the three test tables: SELECT TOP (150) T.id, T.padding FROM dbo.Test AS T ORDER BY NEWID() OPTION (MAXDOP 1) ; Test 1 – CHAR(3999) Running the template query shown above using the TestCHAR table as the target, we find that the query takes around 5 seconds to return its results.  This seems slow, considering that the table only has 50,000 rows.  Working on the assumption that generating a GUID for each row is a CPU-intensive operation, we might try enabling parallelism to see if that speeds up the response time.  Running the query again (but without the MAXDOP 1 hint) on a machine with eight logical processors, the query now takes 10 seconds to execute – twice as long as when run serially. Rather than attempting further guesses at the cause of the slowness, let’s go back to serial execution and add some monitoring.  The script below monitors STATISTICS IO output and the amount of tempdb used by the test query.  We will also run a Profiler trace to capture any warnings generated during query execution. DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TC.id, TC.padding FROM dbo.TestCHAR AS TC ORDER BY NEWID() OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; Let’s take a closer look at the statistics and query plan generated from this: Following the flow of the data from right to left, we see the expected 50,000 rows emerging from the Clustered Index Scan, with a total estimated size of around 191MB.  The Compute Scalar adds a column containing a random GUID (generated from the NEWID() function call) for each row.  With this extra column in place, the size of the data arriving at the Sort operator is estimated to be 192MB. Sort is a blocking operator – it has to examine all of the rows on its input before it can produce its first row of output (the last row received might sort first).  This characteristic means that Sort requires a memory grant – memory allocated for the query’s use by SQL Server just before execution starts.  In this case, the Sort is the only memory-consuming operator in the plan, so it has access to the full 243MB (248,696KB) of memory reserved by SQL Server for this query execution. Notice that the memory grant is significantly larger than the expected size of the data to be sorted.  SQL Server uses a number of techniques to speed up sorting, some of which sacrifice size for comparison speed.  Sorts typically require a very large number of comparisons, so this is usually a very effective optimization.  One of the drawbacks is that it is not possible to exactly predict the sort space needed, as it depends on the data itself.  SQL Server takes an educated guess based on data types, sizes, and the number of rows expected, but the algorithm is not perfect. In spite of the large memory grant, the Profiler trace shows a Sort Warning event (indicating that the sort ran out of memory), and the tempdb usage monitor shows that 195MB of tempdb space was used – all of that for system use.  The 195MB represents physical write activity on tempdb, because SQL Server strictly enforces memory grants – a query cannot ‘cheat’ and effectively gain extra memory by spilling to tempdb pages that reside in memory.  Anyway, the key point here is that it takes a while to write 195MB to disk, and this is the main reason that the query takes 5 seconds overall. If you are wondering why using parallelism made the problem worse, consider that eight threads of execution result in eight concurrent partial sorts, each receiving one eighth of the memory grant.  The eight sorts all spilled to tempdb, resulting in inefficiencies as the spilled sorts competed for disk resources.  More importantly, there are specific problems at the point where the eight partial results are combined, but I’ll cover that in a future post. CHAR(3999) Performance Summary: 5 seconds elapsed time 243MB memory grant 195MB tempdb usage 192MB estimated sort set 25,043 logical reads Sort Warning Test 2 – VARCHAR(MAX) We’ll now run exactly the same test (with the additional monitoring) on the table using a VARCHAR(MAX) padding column: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TM.id, TM.padding FROM dbo.TestMAX AS TM ORDER BY NEWID() OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; This time the query takes around 8 seconds to complete (3 seconds longer than Test 1).  Notice that the estimated row and data sizes are very slightly larger, and the overall memory grant has also increased very slightly to 245MB.  The most marked difference is in the amount of tempdb space used – this query wrote almost 391MB of sort run data to the physical tempdb file.  Don’t draw any general conclusions about VARCHAR(MAX) versus CHAR from this – I chose the length of the data specifically to expose this edge case.  In most cases, VARCHAR(MAX) performs very similarly to CHAR – I just wanted to make test 2 a bit more exciting. MAX Performance Summary: 8 seconds elapsed time 245MB memory grant 391MB tempdb usage 193MB estimated sort set 25,043 logical reads Sort warning Test 3 – TEXT The same test again, but using the deprecated TEXT data type for the padding column: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TT.id, TT.padding FROM dbo.TestTEXT AS TT ORDER BY NEWID() OPTION (MAXDOP 1, RECOMPILE) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; This time the query runs in 500ms.  If you look at the metrics we have been checking so far, it’s not hard to understand why: TEXT Performance Summary: 0.5 seconds elapsed time 9MB memory grant 5MB tempdb usage 5MB estimated sort set 207 logical reads 596 LOB logical reads Sort warning SQL Server’s memory grant algorithm still underestimates the memory needed to perform the sorting operation, but the size of the data to sort is so much smaller (5MB versus 193MB previously) that the spilled sort doesn’t matter very much.  Why is the data size so much smaller?  The query still produces the correct results – including the large amount of data held in the padding column – so what magic is being performed here? TEXT versus MAX Storage The answer lies in how columns of the TEXT data type are stored.  By default, TEXT data is stored off-row in separate LOB pages – which explains why this is the first query we have seen that records LOB logical reads in its STATISTICS IO output.  You may recall from my last post that LOB data leaves an in-row pointer to the separate storage structure holding the LOB data. SQL Server can see that the full LOB value is not required by the query plan until results are returned, so instead of passing the full LOB value down the plan from the Clustered Index Scan, it passes the small in-row structure instead.  SQL Server estimates that each row coming from the scan will be 79 bytes long – 11 bytes for row overhead, 4 bytes for the integer id column, and 64 bytes for the LOB pointer (in fact the pointer is rather smaller – usually 16 bytes – but the details of that don’t really matter right now). OK, so this query is much more efficient because it is sorting a very much smaller data set – SQL Server delays retrieving the LOB data itself until after the Sort starts producing its 150 rows.  The question that normally arises at this point is: Why doesn’t SQL Server use the same trick when the padding column is defined as VARCHAR(MAX)? The answer is connected with the fact that if the actual size of the VARCHAR(MAX) data is 8000 bytes or less, it is usually stored in-row in exactly the same way as for a VARCHAR(8000) column – MAX data only moves off-row into LOB storage when it exceeds 8000 bytes.  The default behaviour of the TEXT type is to be stored off-row by default, unless the ‘text in row’ table option is set suitably and there is room on the page.  There is an analogous (but opposite) setting to control the storage of MAX data – the ‘large value types out of row’ table option.  By enabling this option for a table, MAX data will be stored off-row (in a LOB structure) instead of in-row.  SQL Server Books Online has good coverage of both options in the topic In Row Data. The MAXOOR Table The essential difference, then, is that MAX defaults to in-row storage, and TEXT defaults to off-row (LOB) storage.  You might be thinking that we could get the same benefits seen for the TEXT data type by storing the VARCHAR(MAX) values off row – so let’s look at that option now.  This script creates a fourth table, with the VARCHAR(MAX) data stored off-row in LOB pages: CREATE TABLE dbo.TestMAXOOR ( id INTEGER IDENTITY (1,1) NOT NULL, padding VARCHAR(MAX) NOT NULL,   CONSTRAINT [PK dbo.TestMAXOOR (id)] PRIMARY KEY CLUSTERED (id), ) ; EXECUTE sys.sp_tableoption @TableNamePattern = N'dbo.TestMAXOOR', @OptionName = 'large value types out of row', @OptionValue = 'true' ; SELECT large_value_types_out_of_row FROM sys.tables WHERE [schema_id] = SCHEMA_ID(N'dbo') AND name = N'TestMAXOOR' ; INSERT INTO dbo.TestMAXOOR WITH (TABLOCKX) ( padding ) SELECT SPACE(0) FROM dbo.TestCHAR ORDER BY id ; UPDATE TM WITH (TABLOCK) SET padding.WRITE (TC.padding, NULL, NULL) FROM dbo.TestMAXOOR AS TM JOIN dbo.TestCHAR AS TC ON TC.id = TM.id ; EXECUTE sys.sp_spaceused @objname = 'dbo.TestMAXOOR' ; CHECKPOINT ; Test 4 – MAXOOR We can now re-run our test on the MAXOOR (MAX out of row) table: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) MO.id, MO.padding FROM dbo.TestMAXOOR AS MO ORDER BY NEWID() OPTION (MAXDOP 1, RECOMPILE) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; TEXT Performance Summary: 0.3 seconds elapsed time 245MB memory grant 0MB tempdb usage 193MB estimated sort set 207 logical reads 446 LOB logical reads No sort warning The query runs very quickly – slightly faster than Test 3, and without spilling the sort to tempdb (there is no sort warning in the trace, and the monitoring query shows zero tempdb usage by this query).  SQL Server is passing the in-row pointer structure down the plan and only looking up the LOB value on the output side of the sort. The Hidden Problem There is still a huge problem with this query though – it requires a 245MB memory grant.  No wonder the sort doesn’t spill to tempdb now – 245MB is about 20 times more memory than this query actually requires to sort 50,000 records containing LOB data pointers.  Notice that the estimated row and data sizes in the plan are the same as in test 2 (where the MAX data was stored in-row). The optimizer assumes that MAX data is stored in-row, regardless of the sp_tableoption setting ‘large value types out of row’.  Why?  Because this option is dynamic – changing it does not immediately force all MAX data in the table in-row or off-row, only when data is added or actually changed.  SQL Server does not keep statistics to show how much MAX or TEXT data is currently in-row, and how much is stored in LOB pages.  This is an annoying limitation, and one which I hope will be addressed in a future version of the product. So why should we worry about this?  Excessive memory grants reduce concurrency and may result in queries waiting on the RESOURCE_SEMAPHORE wait type while they wait for memory they do not need.  245MB is an awful lot of memory, especially on 32-bit versions where memory grants cannot use AWE-mapped memory.  Even on a 64-bit server with plenty of memory, do you really want a single query to consume 0.25GB of memory unnecessarily?  That’s 32,000 8KB pages that might be put to much better use. The Solution The answer is not to use the TEXT data type for the padding column.  That solution happens to have better performance characteristics for this specific query, but it still results in a spilled sort, and it is hard to recommend the use of a data type which is scheduled for removal.  I hope it is clear to you that the fundamental problem here is that SQL Server sorts the whole set arriving at a Sort operator.  Clearly, it is not efficient to sort the whole table in memory just to return 150 rows in a random order. The TEXT example was more efficient because it dramatically reduced the size of the set that needed to be sorted.  We can do the same thing by selecting 150 unique keys from the table at random (sorting by NEWID() for example) and only then retrieving the large padding column values for just the 150 rows we need.  The following script implements that idea for all four tables: SET STATISTICS IO ON ; WITH TestTable AS ( SELECT * FROM dbo.TestCHAR ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id = ANY (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestMAX ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestTEXT ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestMAXOOR ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; All four queries now return results in much less than a second, with memory grants between 6 and 12MB, and without spilling to tempdb.  The small remaining inefficiency is in reading the id column values from the clustered primary key index.  As a clustered index, it contains all the in-row data at its leaf.  The CHAR and VARCHAR(MAX) tables store the padding column in-row, so id values are separated by a 3999-character column, plus row overhead.  The TEXT and MAXOOR tables store the padding values off-row, so id values in the clustered index leaf are separated by the much-smaller off-row pointer structure.  This difference is reflected in the number of logical page reads performed by the four queries: Table 'TestCHAR' logical reads 25511 lob logical reads 000 Table 'TestMAX'. logical reads 25511 lob logical reads 000 Table 'TestTEXT' logical reads 00412 lob logical reads 597 Table 'TestMAXOOR' logical reads 00413 lob logical reads 446 We can increase the density of the id values by creating a separate nonclustered index on the id column only.  This is the same key as the clustered index, of course, but the nonclustered index will not include the rest of the in-row column data. CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestCHAR (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestMAX (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestTEXT (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestMAXOOR (id); The four queries can now use the very dense nonclustered index to quickly scan the id values, sort them by NEWID(), select the 150 ids we want, and then look up the padding data.  The logical reads with the new indexes in place are: Table 'TestCHAR' logical reads 835 lob logical reads 0 Table 'TestMAX' logical reads 835 lob logical reads 0 Table 'TestTEXT' logical reads 686 lob logical reads 597 Table 'TestMAXOOR' logical reads 686 lob logical reads 448 With the new index, all four queries use the same query plan (click to enlarge): Performance Summary: 0.3 seconds elapsed time 6MB memory grant 0MB tempdb usage 1MB sort set 835 logical reads (CHAR, MAX) 686 logical reads (TEXT, MAXOOR) 597 LOB logical reads (TEXT) 448 LOB logical reads (MAXOOR) No sort warning I’ll leave it as an exercise for the reader to work out why trying to eliminate the Key Lookup by adding the padding column to the new nonclustered indexes would be a daft idea Conclusion This post is not about tuning queries that access columns containing big strings.  It isn’t about the internal differences between TEXT and MAX data types either.  It isn’t even about the cool use of UPDATE .WRITE used in the MAXOOR table load.  No, this post is about something else: Many developers might not have tuned our starting example query at all – 5 seconds isn’t that bad, and the original query plan looks reasonable at first glance.  Perhaps the NEWID() function would have been blamed for ‘just being slow’ – who knows.  5 seconds isn’t awful – unless your users expect sub-second responses – but using 250MB of memory and writing 200MB to tempdb certainly is!  If ten sessions ran that query at the same time in production that’s 2.5GB of memory usage and 2GB hitting tempdb.  Of course, not all queries can be rewritten to avoid large memory grants and sort spills using the key-lookup technique in this post, but that’s not the point either. The point of this post is that a basic understanding of execution plans is not enough.  Tuning for logical reads and adding covering indexes is not enough.  If you want to produce high-quality, scalable TSQL that won’t get you paged as soon as it hits production, you need a deep understanding of execution plans, and as much accurate, deep knowledge about SQL Server as you can lay your hands on.  The advanced database developer has a wide range of tools to use in writing queries that perform well in a range of circumstances. By the way, the examples in this post were written for SQL Server 2008.  They will run on 2005 and demonstrate the same principles, but you won’t get the same figures I did because 2005 had a rather nasty bug in the Top N Sort operator.  Fair warning: if you do decide to run the scripts on a 2005 instance (particularly the parallel query) do it before you head out for lunch… This post is dedicated to the people of Christchurch, New Zealand. © 2011 Paul White email: @[email protected] twitter: @SQL_Kiwi

    Read the article

  • .NET 4.5 now supported with Windows Azure Web Sites

    - by ScottGu
    This week we finished rolling out .NET 4.5 to all of our Windows Azure Web Site clusters.  This means that you can now publish and run ASP.NET 4.5 based apps, and use  .NET 4.5 libraries and features (for example: async and the new spatial data-type support in EF), with Windows Azure Web Sites.  This enables a ton of really great capabilities - check out Scott Hanselman’s great post of videos that highlight a few of them. Visual Studio 2012 includes built-in publishing support to Windows Azure, which makes it really easy to publish and deploy .NET 4.5 based sites within Visual Studio (you can deploy both apps + databases).  With the Migrations feature of EF Code First you can also do incremental database schema updates as part of publishing (which enables a really slick automated deployment workflow). Each Windows Azure account is eligible to host 10 free web-sites using our free-tier.  If you don’t already have a Windows Azure account, you can sign-up for a free trial and start using them today. In the next few days we’ll also be releasing support for .NET 4.5 and Windows Server 2012 with Windows Azure Cloud Services (Web and Worker Roles) – together with some great new Azure SDK enhancements.  Keep an eye out on my blog for details about these soon. Hope this helps, Scott P.S. In addition to blogging, I am also now using Twitter for quick updates and to share links. Follow me at: twitter.com/scottgu

    Read the article

  • SNMP: how to get combined output of memBuffer, memCached and memAvailReal?

    - by user492160
    Below is the output of 'free -m' on my system: total used free shared buffers cached Mem: 2026 1936 90 0 212 649 -/+ buffers/cache: 1074 952 Swap: 3359 0 3359 I'd like to retrieve the value 952 of -/+ buffers/cache using 'snmpwalk'. This is for integrating 'free memory' availability with Cacti-poller. But currently the only values available are: # snmpwalk -v 1 -c public localhost .1.3.6.1.4.1.2021.4 UCD-SNMP-MIB::memIndex.0 = INTEGER: 0 UCD-SNMP-MIB::memErrorName.0 = STRING: swap UCD-SNMP-MIB::memTotalSwap.0 = INTEGER: 3440632 UCD-SNMP-MIB::memAvailSwap.0 = INTEGER: 3440576 UCD-SNMP-MIB::memTotalReal.0 = INTEGER: 2075556 UCD-SNMP-MIB::memAvailReal.0 = INTEGER: 92552 UCD-SNMP-MIB::memTotalFree.0 = INTEGER: 3533128 UCD-SNMP-MIB::memMinimumSwap.0 = INTEGER: 16000 UCD-SNMP-MIB::memShared.0 = INTEGER: 0 UCD-SNMP-MIB::memBuffer.0 = INTEGER: 217388 UCD-SNMP-MIB::memCached.0 = INTEGER: 664904 UCD-SNMP-MIB::memSwapError.0 = INTEGER: 0 UCD-SNMP-MIB::memSwapErrorMsg.0 = STRING: Is it possible to retrieve the combined value of memBuffer+memCached+memAvailReal using snmpwalk for graphing with Cacti and RRDTool? If not what options do I possibly have? I'm using net-snmp 5.3.2 on my agent host. Thanks in advance.

    Read the article

  • SharePoint 2010 Hosting :: How to Customize SharePoint 2010 Global Navigation

    - by mbridge
    Requirements - SharePoint Foundation or SharePoint Server 2010 site - SharePoint Designer 2010 Steps 1. The first step in my process was to download from codeplex a starter masterpage http://startermasterpages.codeplex.com/ . 2. Once you downloaded the starter master page, open up your SharePoint site in SharePoint Designer 2010 and on the left in the “Site Objects “ area click on the folder “All Files” and drill down to catalogs >> masterpages . Once you are in the Masterpage folder copy and paste the _starter.master into this folder. 3. The first step in the customization process is to create your custom style sheet. To create your custom style sheet, click on the “all Files” folder and click on “Style Library.” Right click in the style library section and choose Style sheet. Once the style sheet is created, rename it style.css. Now open the style sheet you created in SharePoint Designer. 4. In this next step you will copy and paste the SharePoint core styles for the global navigation into your custom style sheet. Copy and paste the css below into the style sheet and save file .s4-tn{ padding:0px; margin:0px; } .s4-tn ul.static{ white-space:nowrap; } .s4-tn li.static > .menu-item{ /* [ReplaceColor(themeColor:"Dark2")] */ color:#3b4f65; white-space:nowrap; border:1px solid transparent; padding:4px 10px; display:inline-block; height:15px; vertical-align:middle; } .s4-tn ul.dynamic{ /* [ReplaceColor(themeColor:"Light2")] */ background-color:white; /* [ReplaceColor(themeColor:"Dark2-Lighter")] */ border:1px solid #D9D9D9; } .s4-tn li.dynamic > .menu-item{ display:block; padding:3px 10px; white-space:nowrap; font-weight:normal; } .s4-tn li.dynamic > a:hover{ font-weight:normal; /* [ReplaceColor(themeColor:"Light2-Lighter")] */ background-color:#D9D9D9; } .s4-tn li.static > a:hover { /* [ReplaceColor(themeColor:"Accent1")] */ color:#44aff6; text-decoration:underline; } 5. Once you created the style sheet, go back to the masterpage folder and open the _starter.master file and in the Customization category click edit file. 6. Next, when the edit file opens make sure you view it in split view. Now you are going to search for the reference to our custom masterpage in the code. Make sure you are scrolled to the top in the code section and press “ctrl f” on the key board. This will pop up the find and replace tool. In the” find what field”, copy and paste and then click find next. 7. Now, in the code replace You have now referenced your custom style sheet in your masterpage. 8. The next step is to locate your Global Navigation control, make sure you are scrolled to the top in the code section and press “ctrl f” on the key board. This will pop up the find and replace tool. In the” find what field”, copy and paste ID="TopNavigationMenuV4” and then click find next. Once you find ID="TopNavigationMenuV4” , you should see the following block of code which is the global navigation control: ID="TopNavigationMenuV4" Runat="server" EnableViewState="false" DataSourceID="topSiteMap" AccessKey="" UseSimpleRendering="true" UseSeparateCss="false" Orientation="Horizontal" StaticDisplayLevels="1" MaximumDynamicDisplayLevels="1" SkipLinkText="" CssClass="s4-tn" 9. In the global navigation code above you should see CssClass="s4-tn" . As an additional step you can replace "s4-tn" your own custom name like CssClass="MyNav" . If you can the name of the CSS class make sure you update your custom style sheet with the new name, example below: .MyNav{ padding:0px; margin:0px; } .MyNav ul.static{ white-space:nowrap; } 10. At this point you are ready to brand your global navigation. The next step is to modify your style.css with your customizations to the default SharePoint styles. Have fun styling and make sure you save your work often. Hope it helps!!

    Read the article

  • EM12c Release 4: Cloud Control to Major Tom...

    - by abulloch
    With the latest release of Enterprise Manager 12c, Release 4 (12.1.0.4) the EM development team has added new functionality to assist the EM Administrator to monitor the health of the EM infrastructure.   Taking feedback delivered from customers directly and through customer advisory boards some nice enhancements have been made to the “Manage Cloud Control” sections of the UI, commonly known in the EM community as “the MTM pages” (MTM stands for Monitor the Monitor).  This part of the EM Cloud Control UI is viewed by many as the mission control for EM Administrators. In this post we’ll highlight some of the new information that’s on display in these redesigned pages and explain how the information they present can help EM administrators identify potential bottlenecks or issues with the EM infrastructure. The first page we’ll take a look at is the newly designed Repository information page.  You can get to this from the main Setup menu, through Manage Cloud Control, then Repository.  Once this page loads you’ll see the new layout that includes 3 tabs containing more drill-down information. The Repository Tab The first tab, Repository, gives you a series of 6 panels or regions on screen that display key information that the EM Administrator needs to review from time to time to ensure that their infrastructure is in good health. Rather than go through every panel let’s call out a few and let you explore the others later yourself on your own EM site.  Firstly, we have the Repository Details panel. At a glance the EM Administrator can see the current version of the EM repository database and more critically, three important elements of information relating to availability and reliability :- Is the database in Archive Log mode ? Is the database using Flashback ? When was the last database backup taken ? In this test environment above the answers are not too worrying, however, Production environments should have at least Archivelog mode enabled, Flashback is a nice feature to enable prior to upgrades (for fast rollback) and all Production sites should have a backup.  In this case the backup information in the Control file indicates there’s been no recorded backups taken. The next region of interest to note on this page shows key information around the Repository configuration, specifically, the initialisation parameters (from the spfile). If you’re storing your EM Repository in a Cluster Database you can view the parameters on each individual instance using the Instance Name drop-down selector in the top right of the region. Additionally, you’ll note there is now a check performed on the active configuration to ensure that you’re using, at the very least, Oracle minimum recommended values.  Should the values in your EM Repository not meet these requirements it will be flagged in this table with a red X for non-compliance.  You can of-course change these values within EM by selecting the Database target and modifying the parameters in the spfile (and optionally, the run-time values if the parameter allows dynamic changes). The last region to call out on this page before moving on is the new look Repository Scheduler Job Status region. This region is an update of a similar region seen on previous releases of the MTM pages in Cloud Control but there’s some important new functionality that’s been added that customers have requested. First-up - Restarting Repository Jobs.  As you can see from the graphic, you can now optionally select a job (by selecting the row in the UI table element) and click on the Restart Job button to take care of any jobs which have stopped or stalled for any reason.  Previously this needed to be done at the command line using EMDIAG or through a PL/SQL package invocation.  You can now take care of this directly from within the UI. Next, you’ll see that a feature has been added to allow the EM administrator to customise the run-time for some of the background jobs that run in the Repository.  We heard from some customers that ensuring these jobs don’t clash with Production backups, etc is a key requirement.  This new functionality allows you to select the pencil icon to edit the schedule time for these more resource intensive background jobs and modify the schedule to avoid clashes like this. Moving onto the next tab, let’s select the Metrics tab. The Metrics Tab There’s some big changes here, this page contains new information regions that help the Administrator understand the direct impact the in-bound metric flows are having on the EM Repository.  Many customers have provided feedback that they are in the dark about the impact of adding new targets or large numbers of new hosts or new target types into EM and the impact this has on the Repository.  This page helps the EM Administrator get to grips with this.  Let’s take a quick look at two regions on this page. First-up there’s a bubble chart showing a comprehensive view of the top resource consumers of metric data, over the last 30 days, charted as the number of rows loaded against the number of collections for the metric.  The size of the bubble indicates a relative volume.  You can see from this example above that a quick glance shows that Host metrics are the largest inbound flow into the repository when measured by number of rows.  Closely following behind this though are a large number of collections for Oracle Weblogic Server and Application Deployment.  Taken together the Host Collections is around 0.7Mb of data.  The total information collection for Weblogic Server and Application Deployments is 0.38Mb and 0.37Mb respectively. If you want to get this information breakdown on the volume of data collected simply hover over the bubble in the chart and you’ll get a floating tooltip showing the information. Clicking on any bubble in the chart takes you one level deeper into a drill-down of the Metric collection. Doing this reveals the individual metric elements for these target types and again shows a representation of the relative cost - in terms of Number of Rows, Number of Collections and Storage cost of data for each Metric type. Looking at another panel on this page we can see a different view on this data. This view shows a view of the Top N metrics (the drop down allows you to select 10, 15 or 20) and sort them by volume of data.  In the case above we can see the largest metric collection (by volume) in this case (over the last 30 days) is the information about OS Registered Software on a Host target. Taken together, these two regions provide a powerful tool for the EM Administrator to understand the potential impact of any new targets that have been discovered and promoted into management by EM12c.  It’s a great tool for identifying the cause of a sudden increase in Repository storage consumption or Redo log and Archive log generation. Using the information on this page EM Administrators can take action to mitigate any load impact by deploying monitoring templates to the targets causing most load if appropriate.   The last tab we’ll look at on this page is the Schema tab. The Schema Tab Selecting this tab brings up a window onto the SYSMAN schema with a focus on Space usage in the EM Repository.  Understanding what tablespaces are growing, at what rate, is essential information for the EM Administrator to stay on top of managing space allocations for the EM Repository so that it works as efficiently as possible and performs well for the users.  Not least because ensuring storage is managed well ensures continued availability of EM for monitoring purposes. The first region to highlight here shows the trend of space usage for the tablespaces in the EM Repository over time.  You can see the upward trend here showing that storage in the EM Repository is being consumed on an upward trend over the last few days here. This is normal as this EM being used here is brand new with Agents being added daily to bring targets into monitoring.  If your Enterprise Manager configuration has reached a steady state over a period of time where the number of new inbound targets is relatively small, the metric collection settings are fairly uniform and standardised (using Templates and Template Collections) you’re likely to see a trend of space allocation that plateau’s. The table below the trend chart shows the Top 20 Tables/Indexes sorted descending by order of space consumed.  You can switch the trend view chart and corresponding detail table by choosing a different tablespace in the EM Repository using the drop-down picker on the top right of this region. The last region to highlight on this page is the region showing information about the Purge policies in effect in the EM Repository. This information is useful to illustrate to EM Administrators the default purge policies in effect for the different categories of information available in the EM Repository.  Of course, it’s also been a long requested feature to have the ability to modify these default retention periods.  You can also do this using this screen.  As there are interdependencies between some data elements you can’t modify retention policies on a feature by feature basis.  Instead, retention policies take categories of information and bundles them together in Groups.  Retention policies are modified at the Group Level.  Understanding the impact of this really deserves a blog post all on it’s own as modifying these can have a significant impact on both the EM Repository’s storage footprint and it’s performance.  For now, we’re just highlighting the features visibility on these new pages. As a user of EM12c we hope the new features you see here address some of the feedback that’s been given on these pages over the past few releases.  We’ll look out for any comments or feedback you have on these pages ! 

    Read the article

  • How to Switch Mac OS X to Use OpenDNS or Google DNS

    - by The Geek
    Are you still using your service provider’s DNS servers? If you’re on Comcast, you probably noticed their DNS servers completely died recently, taking down the internet—but anybody using the more reliable OpenDNS or Google DNS had no problems. Here’s how to set it up on your Mac OS X computer. There’s lots of other reasons to use OpenDNS or Google DNS other than just their rock-solid reliability—they are often much faster than your ISP’s DNS server, and in the case of OpenDNS, there’s loads of extra features like content filtering, typo correction, anti-phishing, and child protection controls. If you’re using Windows, be sure and check out some of our other articles on the subject: Speed Up Your Web Browsing with Google Public DNS Easily Add OpenDNS To Your Router Protect Your Kids Online Using Open DNS Otherwise, keep reading for how to set it up on your Mac. Latest Features How-To Geek ETC The Complete List of iPad Tips, Tricks, and Tutorials The 50 Best Registry Hacks that Make Windows Better The How-To Geek Holiday Gift Guide (Geeky Stuff We Like) LCD? LED? Plasma? The How-To Geek Guide to HDTV Technology The How-To Geek Guide to Learning Photoshop, Part 8: Filters Improve Digital Photography by Calibrating Your Monitor Exploring the Jungle Ruins Wallpaper Protect Your Privacy When Browsing with Chrome and Iron Browser Free Shipping Day is Friday, December 17, 2010 – National Free Shipping Day Find an Applicable Quote for Any Programming Situation Winter Theme for Windows 7 from Microsoft Score Free In-Flight Wi-Fi Courtesy of Google Chrome

    Read the article

  • Community Events and Workshops in November 2012 #ssas #tabular #powerpivot

    - by Marco Russo (SQLBI)
    I and Alberto have a busy agenda until the end of the month, but if you are based in Northern Europe there are many chance to meet one of us in the next couple of weeks! Belgium, 20 November 2012 – SQL Server Days 2012 with Marco Russo I will present two sessions in this conference, “Data Modeling for Tabular” and “Querying and Optimizing DAX” Copenhagen, 21-22 November, 2012 – SSAS Tabular Workshop with Alberto Ferrari Alberto will be the speaker for 2 days – you can still register if you want a full immersion! Copenhagen, 21 November 2012 – Free Community Event with Alberto Ferrari (hosted in Microsoft Hellerup) In the evening Alberto will present “Excel 2013 PowerPivot in Action” Munich, 27-28 November 2012 - SSAS Tabular Workshop with Alberto Ferrari The SSAS workshop will run also in Germany, this time in Munich. Also here there is still some seat still available. Munich, 27 November 2012 - Free Community Event with Alberto Ferrari (hosted in Microsoft ) In the evening Alberto will present “Excel 2013 PowerPivot in Action” Moscow, 27-28 November 2012 – TechEd Russia 2012 with Marco Russo I will speak during the keynote on November 27 and I will present two session the day after, “Developing an Analysis Services Tabular Project BI Semantic Model” and “Excel 2013 PowerPivot in Action” Stockholm, 29-30 November 2012 - SSAS Tabular Workshop with Marco Russo I will run this workshop in Stockholm – if you want to register here, hurry up! Few seats still available! Stockholm, 29 November 2012 - Free Community Event (sold-out!) with Marco Russo In the evening I will present “Excel 2013 PowerPivot in Action” If you want to attend a SSAS Tabular Workshop online, you can also register to the Online edition of December 5-6, 2012, which is still in early bird and is scheduled with a friendly time zone for America’s countries (which could be good for Europe too, in case you don’t mind attending a workshop until midnight!).

    Read the article

  • Load balanced asp.net websites and required memory usage

    - by Matt
    Each of my servers has 8Gb RAM and the memory usage hovers around 7Gb. I have a load balancer available to me but at the moment I'm worried that putting my sites through it will cause the platform to fall over. The load balancer would be configured with a sticky round-robin where a new connection is round robin but subsequent connections for the same source ip will remain on the same server (until a limit is reached). Thats all standard stuff. How do I know what memory usage my sites will need across the platform when I put them through the load balancer? Rather than knowing that a site is using 150mb on a particular server I could face a situation where the 150mb is taken up on each of the servers. I know that with only 1 gb free I could have a serious problem on my hands. If I free up some memory then how can I work out what I need to have free to prevent this from happening? Thanks Matt

    Read the article

  • Changing the default installation path to a newly installed hard disk

    - by mgj
    Hi, I am currently working on a dual-booted PC. I am using Windows XP and Ubuntu 10.04 Lucid Lynx released in April 2010. The allocated partition to Ubuntu that I am making use of has almost exhausted. Current memory allocations on the PC wrt Ubuntu OS looks like this: bodhgaya@pc146724-desktop:~$ df -h Filesystem Size Used Avail Use% Mounted on /dev/sda2 8.6G 8.0G 113M 99% / none 998M 268K 998M 1% /dev none 1002M 580K 1002M 1% /dev/shm none 1002M 100K 1002M 1% /var/run none 1002M 0 1002M 0% /var/lock none 1002M 0 1002M 0% /lib/init/rw /dev/sda1 25G 16G 9.8G 62% /media/C /dev/sdb1 37G 214M 35G 1% /media/ubuntulinuxstore bodhgaya@pc146724-desktop:~$ cd /tmp I am trying to mount a 40GB(/dev/sdb1 - given below) new hard disk along with my existing Ubuntu system to overcome with hard disk space related issues. I referred to the following tutorial to mount a new hard disk onto the system:- http://www.smorgasbord.net/how-to-in...untu-linux%20/ I was able to successfully mount this hard disk for Ubuntu 0S. I have this new hard disk setup in /media/ubuntulinuxstore directory. The current partition in my system looks like this: bodhgaya@pc146724-desktop:/media/ubuntulinuxstore$ sudo fdisk -l [sudo] password for bodhgaya: Disk /dev/sda: 40.0 GB, 40000000000 bytes 255 heads, 63 sectors/track, 4863 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x446eceb5 Device Boot Start End Blocks Id System /dev/sda1 * 2 3264 26210047+ 7 HPFS/NTFS /dev/sda2 3265 4385 9004432+ 83 Linux /dev/sda3 4386 4863 3839535 82 Linux swap / Solaris Disk /dev/sdb: 40.0 GB, 40000000000 bytes 255 heads, 63 sectors/track, 4863 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0xfa8afa8a Device Boot Start End Blocks Id System /dev/sdb1 1 4862 39053983+ 7 HPFS/NTFS bodhgaya@pc146724-desktop:/media/ubuntulinuxstore$ Now, I have a concern wrt the "location" where the new softwares will be installed. Generally softwares are installed via the terminal and by default a fixed path is used to where the post installation set up files can be found (I am talking in context of the drive). This is like the typical case of Windows, where softwares by default are installed in the C: drive. These days people customize their installations to a drive which they find apt to serve their purpose (generally based on availability of hard disk space). I am trying to figure out how to customize the same for Ubuntu. As we all know the most softwares are installed via commands given from the Terminal. My road block is how do I redirect the default path set on the terminal where files get installed to this new hard disk. This if done will help me overcome space constraints I am currently facing wrt the partition on which my Ubuntu is initially installed. I would also by this, save time on not formatting my system and reinstalling Ubuntu and other softwares all over again. Please help me with this, your suggestions are much appreciated.

    Read the article

  • CentOS only detecting 50% of ram

    - by Devator
    I have 16GB ram in my machine. Before, free -m outputted the normal 16 GB ram, however now (after a reboot) it only detects 8 GB ram. Is one ram module damaged? grep -i memory /var/log/dmesg outputs Memory: 15621184k/16017200k available (2535k kernel code, 387120k reserved, 1748k data, 196k init). (Which looks like 16 GB to me). Free -m outputs: total used free shared buffers cached Mem: 7484 7415 68 0 6104 524 -/+ buffers/cache: 786 6697 Swap: 2055 0 2054 Anything I might be missing? Thanks in advance.

    Read the article

  • MSCC: Purpose and benefits of Version Control Systems (VCS)

    Unfortunately, there was no monthly meetup during May. Which means that it was even more important and interesting to go forward with a great topic for this month. Earlier this year I already spoke to Nayar Joolfoo about doing a presentation on version control systems (VCS), and he gladly agreed since then. It was just about finding the right date for the action. Furthermore, it was also a great coincidence that Avinash Meetoo announced on social media networks that Knowledge 7 is about to have a new training on "Effective git" - which correlates to a book title Avinash is currently working on - all the best with your approach on this and reach out to our MSCC craftsmen for recessions. Once again a big Thank you to Orange Ebene Accelerator on providing the venue for us, and the MSCC members involved on securing the time slot for our event. Unfortunately, it's kind of tough to get an early confirmation for our meetups these days. I'll keep you posted on that one as there are some interesting and exciting options coming up soon. Okay, let's talk about the meeting and version control systems again. As usual, I'm going to put my first impression of the meetup: "Absolutely great topic, questions and discussions on version control systems, like git or VSO. I was also highly pleased by the number of first timers and female IT geeks. Hopefully, we will be able to keep this trend for future get-togethers." And I really have to emphasise the amount of fresh blood coming to our gathering. Also, during the initial phase it was surprising to see that exactly those first-timers, most of them students at various campuses here on the island, had absolutely no idea about version control systems. More about further down... Reactions of other attendees If I counted correctly, we had a total of 17 attendees this month, and I'd like to give you feedback from some of them: "Inspiring. Helped me understand more about GIT." -- Sean on event comments "Joined the meetup today with literally no idea what is a version control system. I have several reasons why I should be starting to use VCS as from NOW in my projects. Thanks Nayar, Jochen and other participants :)" -- Yudish on event comments "Was present today and I'm very satisfied.I was not aware if there was a such tool like git available. Thanks to those who contributed for this meetup.It was great. Learned a lot from this meetup!!" -- Leonardo on event comments "Seriously, I can see how it’s going to ease my task and help me save time. Gone are the issues with files backups.  And since I’ll be doing my dissertation this year, using Git would help me a lot for my backups and I’m grateful to Nayar for the great explanation." -- Swan-Iyah on MSCC meetup : Version Controls Hopefully, I'll be able to get some other sources - personal blogs preferred - on our meeting. Geeks, thank you so much for those encouraging comments. It's really great to experience that we, all members of the MSCC, are doing the right thing to get more IT information out, and to help each other to improve and evolve in our professional careers. Our agenda of the day Honestly, we had a bumpy start... First, I was battling a little bit with the movable room divider in order to maximize the space. I mean, we had 24 RSVPs and usually there might additional people coming along. Then, for what ever reason, we were facing power outages - actually twice in short periods. Not too good for the projector after all, but hey it went smooth for the rest of the time being. And last but not least... our first speaker Nayar got stuck somewhere on the road. ;-) Anyway, not a real show-stopper and we used the time until Nayar's arrival to introduce ourselves a little bit. It is always important for me to get to know the "newbies" a little bit, and as a result we had lots of students of university - first year, second year and recent graduates - among them. Surprisingly, none of them was ever in contact with version control systems at all. I mean, this is a shocking discovery! Similar to the ability of touch-typing I'd say that being able to use (and master) any kind of version control system is compulsory in any job in the IT industry. Seriously, I'm wondering what is being taught during the classes on the campus. All of them have to work on semester assessments or final projects, even in small teams of 2-4 people. That's the perfect occasion to get started with VCS. Already in this phase, we had great input from more experienced VCS users, like Sean, Avinash and myself. git - a modern approach to VCS - Nayar What a tour! Nayar gave us the full round of git from start to finish, even touching some more advanced techniques. First, he started to explain about the importance of version control systems as an essential tool for software developers, even working alone on a project, and the ability to have a kind of "time machine" that allows you to inspect and revert to a previous version of source code at any time. Then he showed how easy it is to install git on an Ubuntu based system but also mentioned that git is literally available for any operating system, like Windows, Mac OS X and of course other Linux distributions. Next, he showed us how to set the initial configuration values of user name and email address which simplifies the daily usage of the git client while working with your repositories. Then he initialised and added a new repository for some local development of a blogging software. All commands were done using the command line interface (CLI) so that they can be repeated on any system as reference. The syntax and the procedure is always the same, and Nayar clearly mentioned this to the attendees. Now, having a git repository in place it was about time to work on some "important" changes on the blogging software - just for the sake of demonstrating the ease of use and power of git. One interesting question came very early: "How many commands do we have to learn? It looks quite difficult at the moment" - Well, rest assured that during daily development circles you will need less than 10 git commands on a regular base: git add, commit, push, pull, checkout, and merge And Nayar demo'd all of them. Much to the delight of everyone he also showed gitk which is the git repository browser. It's an UI tool to display changes in a repository or a selected set of commits. This includes visualizing the commit graph, showing information related to each commit, and the files in the trees of each revision. Using gitk to display and browse information of a local git repository And last but not least, we took advantage of the internet connectivity and reached out to various online portals offering git hosting for free. Nayar showed us how to push the local repository into a remote system on github. Showing the web-based git browser and history handling, and then also explained and demo'd on how to connect to existing online repositories in order to get access to either your own source code or other people's open source projects. Next to github, we also spoke about bitbucket and gitlab as potential online platforms for your projects. Have a look at the conditions and details about their free service packages and what you can get additionally as a paying customer. Usually, you already get a lot of services for up to five users for free but there might be other important aspects that might have an impact on your decision. Anyways, moving git-based repositories between systems is a piece of cake, and changing online platforms is possible at any stage of your development. Visual Studio Online (VSO) - Jochen Well, Nayar literally covered all elements of working with git during his session, including the use of external online platforms. So, what would be the advantage of talking about Visual Studio Online (VSO)? First of all, VSO is "just another" online platform for hosting and managing git repositories on remote systems, equivalent to github, bitbucket, or any other web site. At the moment (of writing), Microsoft also provides a free package of up to five users / developers on a git repository but there is more in that package. Of course, it is related to software development on the Windows systems and the bonds are tightened towards the use of Visual Studio but out of experience you are absolutely not restricted to that. Connecting a Linux or Mac OS X machine with a git client or an integrated development environment (IDE) like Eclipse or Xcode works as smooth as expected. So, why should one opt in for VSO? Well, one of the main aspects that I would like to mention here is that VSO integrates the Application Life Cycle Methodology (ALM) of Microsoft in their platform. Meaning that you get agile project management with Backlogs, Sprints, Burn-down charts as well as the ability to track tasks, bug reports and work items next to collaborative team chats. It's the whole package of agile development you'll get. And, something I mentioned briefly during the begin of our meeting, VSO gives you the possibility of an automated continuous integrated (CI) process which builds and can run tests of your source code after each commit of changes. Having a proper CI strategy is also part of the Clean Code Developer practices - on Level Green actually -, and not only simplifies your life as a software developer but also reduces the sources of potential errors. Seamless integration and automated deployment between Microsoft Azure Web Sites and git repository But my favourite feature is the seamless continuous deployment to Microsoft Azure. Especially, while working on web projects it's absolutely astounishing that as soon as you commit your chances it just takes a couple of seconds until your modifications are deployed and available on your Azure-hosted web sites. Upcoming Events and networking Due to the adjusted times, everybody was kind of hungry and we didn't follow up on networking or upcoming events - very unfortunate to my opinion and this will have an impact on future planning of our meetups. Because I rather would like to see more conversations during and at the end of our meetings than everyone just packing their laptops, bags and accessories and rush off to grab some food. I was hoping to get some information regarding this year's Code Challenge - supposedly to be organised during July? Maybe someone could leave a comment on that - but I couldn't get any updates. Well, I'll keep digging... In case that you would like to get more into git and how to use it effectively, please check out Knowledge 7's upcoming course on "Effective git". Thanks Avinash for your vital input into today's conversation and I'm looking forward to get a grip on your book title very soon. My resume of the day Do not work in IT without any kind of version control system! Seriously, without a VCS in place you're doing it wrong. It's like driving a car without seat belts attached or riding your bike without safety helmet. You don't do that! End of discussion. ;-) Nowadays, having access to free (as in cost) tools to install on your machine and numerous online platforms to host your source code for free for up to five users it's a no-brainer to get yourself familiar with VCS. Today's sessions gave a good overview on how to start using git and how to connect to various remote services like github or VSO.

    Read the article

  • iPad and User Assistance

    - by ultan o'broin
    What possibilities does the iPad over for user assistance in the enterprise space? We will research the possibilities but I can see a number of possibilities already for remote workers who need access to trouble-shooting information on-site, implementers who need reference information and diagrams, business analysts or technical users accessing reports and dashboards for metrics or issues, functional users who need org charts and other data visualizations, and so on. It could also open up more possibilities for collaborative problem solving. User assistance content can take advantage of the device's superb display, graphics capability, connectivity, and long battery life. The possibility of opening up more innovative user assistance solutions (such as comics) is an exciting one for everyone in the UX space. Aligned to this possibility we need to research how users would use the device as they work.

    Read the article

  • Understanding G1 GC Logs

    - by poonam
    The purpose of this post is to explain the meaning of GC logs generated with some tracing and diagnostic options for G1 GC. We will take a look at the output generated with PrintGCDetails which is a product flag and provides the most detailed level of information. Along with that, we will also look at the output of two diagnostic flags that get enabled with -XX:+UnlockDiagnosticVMOptions option - G1PrintRegionLivenessInfo that prints the occupancy and the amount of space used by live objects in each region at the end of the marking cycle and G1PrintHeapRegions that provides detailed information on the heap regions being allocated and reclaimed. We will be looking at the logs generated with JDK 1.7.0_04 using these options. Option -XX:+PrintGCDetails Here's a sample log of G1 collection generated with PrintGCDetails. 0.522: [GC pause (young), 0.15877971 secs] [Parallel Time: 157.1 ms] [GC Worker Start (ms): 522.1 522.2 522.2 522.2 Avg: 522.2, Min: 522.1, Max: 522.2, Diff: 0.1] [Ext Root Scanning (ms): 1.6 1.5 1.6 1.9 Avg: 1.7, Min: 1.5, Max: 1.9, Diff: 0.4] [Update RS (ms): 38.7 38.8 50.6 37.3 Avg: 41.3, Min: 37.3, Max: 50.6, Diff: 13.3] [Processed Buffers : 2 2 3 2 Sum: 9, Avg: 2, Min: 2, Max: 3, Diff: 1] [Scan RS (ms): 9.9 9.7 0.0 9.7 Avg: 7.3, Min: 0.0, Max: 9.9, Diff: 9.9] [Object Copy (ms): 106.7 106.8 104.6 107.9 Avg: 106.5, Min: 104.6, Max: 107.9, Diff: 3.3] [Termination (ms): 0.0 0.0 0.0 0.0 Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] [Termination Attempts : 1 4 4 6 Sum: 15, Avg: 3, Min: 1, Max: 6, Diff: 5] [GC Worker End (ms): 679.1 679.1 679.1 679.1 Avg: 679.1, Min: 679.1, Max: 679.1, Diff: 0.1] [GC Worker (ms): 156.9 157.0 156.9 156.9 Avg: 156.9, Min: 156.9, Max: 157.0, Diff: 0.1] [GC Worker Other (ms): 0.3 0.3 0.3 0.3 Avg: 0.3, Min: 0.3, Max: 0.3, Diff: 0.0] [Clear CT: 0.1 ms] [Other: 1.5 ms] [Choose CSet: 0.0 ms] [Ref Proc: 0.3 ms] [Ref Enq: 0.0 ms] [Free CSet: 0.3 ms] [Eden: 12M(12M)->0B(10M) Survivors: 0B->2048K Heap: 13M(64M)->9739K(64M)] [Times: user=0.59 sys=0.02, real=0.16 secs] This is the typical log of an Evacuation Pause (G1 collection) in which live objects are copied from one set of regions (young OR young+old) to another set. It is a stop-the-world activity and all the application threads are stopped at a safepoint during this time. This pause is made up of several sub-tasks indicated by the indentation in the log entries. Here's is the top most line that gets printed for the Evacuation Pause. 0.522: [GC pause (young), 0.15877971 secs] This is the highest level information telling us that it is an Evacuation Pause that started at 0.522 secs from the start of the process, in which all the regions being evacuated are Young i.e. Eden and Survivor regions. This collection took 0.15877971 secs to finish. Evacuation Pauses can be mixed as well. In which case the set of regions selected include all of the young regions as well as some old regions. 1.730: [GC pause (mixed), 0.32714353 secs] Let's take a look at all the sub-tasks performed in this Evacuation Pause. [Parallel Time: 157.1 ms] Parallel Time is the total elapsed time spent by all the parallel GC worker threads. The following lines correspond to the parallel tasks performed by these worker threads in this total parallel time, which in this case is 157.1 ms. [GC Worker Start (ms): 522.1 522.2 522.2 522.2Avg: 522.2, Min: 522.1, Max: 522.2, Diff: 0.1] The first line tells us the start time of each of the worker thread in milliseconds. The start times are ordered with respect to the worker thread ids – thread 0 started at 522.1ms and thread 1 started at 522.2ms from the start of the process. The second line tells the Avg, Min, Max and Diff of the start times of all of the worker threads. [Ext Root Scanning (ms): 1.6 1.5 1.6 1.9 Avg: 1.7, Min: 1.5, Max: 1.9, Diff: 0.4] This gives us the time spent by each worker thread scanning the roots (globals, registers, thread stacks and VM data structures). Here, thread 0 took 1.6ms to perform the root scanning task and thread 1 took 1.5 ms. The second line clearly shows the Avg, Min, Max and Diff of the times spent by all the worker threads. [Update RS (ms): 38.7 38.8 50.6 37.3 Avg: 41.3, Min: 37.3, Max: 50.6, Diff: 13.3] Update RS gives us the time each thread spent in updating the Remembered Sets. Remembered Sets are the data structures that keep track of the references that point into a heap region. Mutator threads keep changing the object graph and thus the references that point into a particular region. We keep track of these changes in buffers called Update Buffers. The Update RS sub-task processes the update buffers that were not able to be processed concurrently, and updates the corresponding remembered sets of all regions. [Processed Buffers : 2 2 3 2Sum: 9, Avg: 2, Min: 2, Max: 3, Diff: 1] This tells us the number of Update Buffers (mentioned above) processed by each worker thread. [Scan RS (ms): 9.9 9.7 0.0 9.7 Avg: 7.3, Min: 0.0, Max: 9.9, Diff: 9.9] These are the times each worker thread had spent in scanning the Remembered Sets. Remembered Set of a region contains cards that correspond to the references pointing into that region. This phase scans those cards looking for the references pointing into all the regions of the collection set. [Object Copy (ms): 106.7 106.8 104.6 107.9 Avg: 106.5, Min: 104.6, Max: 107.9, Diff: 3.3] These are the times spent by each worker thread copying live objects from the regions in the Collection Set to the other regions. [Termination (ms): 0.0 0.0 0.0 0.0 Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] Termination time is the time spent by the worker thread offering to terminate. But before terminating, it checks the work queues of other threads and if there are still object references in other work queues, it tries to steal object references, and if it succeeds in stealing a reference, it processes that and offers to terminate again. [Termination Attempts : 1 4 4 6 Sum: 15, Avg: 3, Min: 1, Max: 6, Diff: 5] This gives the number of times each thread has offered to terminate. [GC Worker End (ms): 679.1 679.1 679.1 679.1 Avg: 679.1, Min: 679.1, Max: 679.1, Diff: 0.1] These are the times in milliseconds at which each worker thread stopped. [GC Worker (ms): 156.9 157.0 156.9 156.9 Avg: 156.9, Min: 156.9, Max: 157.0, Diff: 0.1] These are the total lifetimes of each worker thread. [GC Worker Other (ms): 0.3 0.3 0.3 0.3Avg: 0.3, Min: 0.3, Max: 0.3, Diff: 0.0] These are the times that each worker thread spent in performing some other tasks that we have not accounted above for the total Parallel Time. [Clear CT: 0.1 ms] This is the time spent in clearing the Card Table. This task is performed in serial mode. [Other: 1.5 ms] Time spent in the some other tasks listed below. The following sub-tasks (which individually may be parallelized) are performed serially. [Choose CSet: 0.0 ms] Time spent in selecting the regions for the Collection Set. [Ref Proc: 0.3 ms] Total time spent in processing Reference objects. [Ref Enq: 0.0 ms] Time spent in enqueuing references to the ReferenceQueues. [Free CSet: 0.3 ms] Time spent in freeing the collection set data structure. [Eden: 12M(12M)->0B(13M) Survivors: 0B->2048K Heap: 14M(64M)->9739K(64M)] This line gives the details on the heap size changes with the Evacuation Pause. This shows that Eden had the occupancy of 12M and its capacity was also 12M before the collection. After the collection, its occupancy got reduced to 0 since everything is evacuated/promoted from Eden during a collection, and its target size grew to 13M. The new Eden capacity of 13M is not reserved at this point. This value is the target size of the Eden. Regions are added to Eden as the demand is made and when the added regions reach to the target size, we start the next collection. Similarly, Survivors had the occupancy of 0 bytes and it grew to 2048K after the collection. The total heap occupancy and capacity was 14M and 64M receptively before the collection and it became 9739K and 64M after the collection. Apart from the evacuation pauses, G1 also performs concurrent-marking to build the live data information of regions. 1.416: [GC pause (young) (initial-mark), 0.62417980 secs] ….... 2.042: [GC concurrent-root-region-scan-start] 2.067: [GC concurrent-root-region-scan-end, 0.0251507] 2.068: [GC concurrent-mark-start] 3.198: [GC concurrent-mark-reset-for-overflow] 4.053: [GC concurrent-mark-end, 1.9849672 sec] 4.055: [GC remark 4.055: [GC ref-proc, 0.0000254 secs], 0.0030184 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 4.088: [GC cleanup 117M->106M(138M), 0.0015198 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 4.090: [GC concurrent-cleanup-start] 4.091: [GC concurrent-cleanup-end, 0.0002721] The first phase of a marking cycle is Initial Marking where all the objects directly reachable from the roots are marked and this phase is piggy-backed on a fully young Evacuation Pause. 2.042: [GC concurrent-root-region-scan-start] This marks the start of a concurrent phase that scans the set of root-regions which are directly reachable from the survivors of the initial marking phase. 2.067: [GC concurrent-root-region-scan-end, 0.0251507] End of the concurrent root region scan phase and it lasted for 0.0251507 seconds. 2.068: [GC concurrent-mark-start] Start of the concurrent marking at 2.068 secs from the start of the process. 3.198: [GC concurrent-mark-reset-for-overflow] This indicates that the global marking stack had became full and there was an overflow of the stack. Concurrent marking detected this overflow and had to reset the data structures to start the marking again. 4.053: [GC concurrent-mark-end, 1.9849672 sec] End of the concurrent marking phase and it lasted for 1.9849672 seconds. 4.055: [GC remark 4.055: [GC ref-proc, 0.0000254 secs], 0.0030184 secs] This corresponds to the remark phase which is a stop-the-world phase. It completes the left over marking work (SATB buffers processing) from the previous phase. In this case, this phase took 0.0030184 secs and out of which 0.0000254 secs were spent on Reference processing. 4.088: [GC cleanup 117M->106M(138M), 0.0015198 secs] Cleanup phase which is again a stop-the-world phase. It goes through the marking information of all the regions, computes the live data information of each region, resets the marking data structures and sorts the regions according to their gc-efficiency. In this example, the total heap size is 138M and after the live data counting it was found that the total live data size dropped down from 117M to 106M. 4.090: [GC concurrent-cleanup-start] This concurrent cleanup phase frees up the regions that were found to be empty (didn't contain any live data) during the previous stop-the-world phase. 4.091: [GC concurrent-cleanup-end, 0.0002721] Concurrent cleanup phase took 0.0002721 secs to free up the empty regions. Option -XX:G1PrintRegionLivenessInfo Now, let's look at the output generated with the flag G1PrintRegionLivenessInfo. This is a diagnostic option and gets enabled with -XX:+UnlockDiagnosticVMOptions. G1PrintRegionLivenessInfo prints the live data information of each region during the Cleanup phase of the concurrent-marking cycle. 26.896: [GC cleanup ### PHASE Post-Marking @ 26.896### HEAP committed: 0x02e00000-0x0fe00000 reserved: 0x02e00000-0x12e00000 region-size: 1048576 Cleanup phase of the concurrent-marking cycle started at 26.896 secs from the start of the process and this live data information is being printed after the marking phase. Committed G1 heap ranges from 0x02e00000 to 0x0fe00000 and the total G1 heap reserved by JVM is from 0x02e00000 to 0x12e00000. Each region in the G1 heap is of size 1048576 bytes. ### type address-range used prev-live next-live gc-eff### (bytes) (bytes) (bytes) (bytes/ms) This is the header of the output that tells us about the type of the region, address-range of the region, used space in the region, live bytes in the region with respect to the previous marking cycle, live bytes in the region with respect to the current marking cycle and the GC efficiency of that region. ### FREE 0x02e00000-0x02f00000 0 0 0 0.0 This is a Free region. ### OLD 0x02f00000-0x03000000 1048576 1038592 1038592 0.0 Old region with address-range from 0x02f00000 to 0x03000000. Total used space in the region is 1048576 bytes, live bytes as per the previous marking cycle are 1038592 and live bytes with respect to the current marking cycle are also 1038592. The GC efficiency has been computed as 0. ### EDEN 0x03400000-0x03500000 20992 20992 20992 0.0 This is an Eden region. ### HUMS 0x0ae00000-0x0af00000 1048576 1048576 1048576 0.0### HUMC 0x0af00000-0x0b000000 1048576 1048576 1048576 0.0### HUMC 0x0b000000-0x0b100000 1048576 1048576 1048576 0.0### HUMC 0x0b100000-0x0b200000 1048576 1048576 1048576 0.0### HUMC 0x0b200000-0x0b300000 1048576 1048576 1048576 0.0### HUMC 0x0b300000-0x0b400000 1048576 1048576 1048576 0.0### HUMC 0x0b400000-0x0b500000 1001480 1001480 1001480 0.0 These are the continuous set of regions called Humongous regions for storing a large object. HUMS (Humongous starts) marks the start of the set of humongous regions and HUMC (Humongous continues) tags the subsequent regions of the humongous regions set. ### SURV 0x09300000-0x09400000 16384 16384 16384 0.0 This is a Survivor region. ### SUMMARY capacity: 208.00 MB used: 150.16 MB / 72.19 % prev-live: 149.78 MB / 72.01 % next-live: 142.82 MB / 68.66 % At the end, a summary is printed listing the capacity, the used space and the change in the liveness after the completion of concurrent marking. In this case, G1 heap capacity is 208MB, total used space is 150.16MB which is 72.19% of the total heap size, live data in the previous marking was 149.78MB which was 72.01% of the total heap size and the live data as per the current marking is 142.82MB which is 68.66% of the total heap size. Option -XX:+G1PrintHeapRegions G1PrintHeapRegions option logs the regions related events when regions are committed, allocated into or are reclaimed. COMMIT/UNCOMMIT events G1HR COMMIT [0x6e900000,0x6ea00000]G1HR COMMIT [0x6ea00000,0x6eb00000] Here, the heap is being initialized or expanded and the region (with bottom: 0x6eb00000 and end: 0x6ec00000) is being freshly committed. COMMIT events are always generated in order i.e. the next COMMIT event will always be for the uncommitted region with the lowest address. G1HR UNCOMMIT [0x72700000,0x72800000]G1HR UNCOMMIT [0x72600000,0x72700000] Opposite to COMMIT. The heap got shrunk at the end of a Full GC and the regions are being uncommitted. Like COMMIT, UNCOMMIT events are also generated in order i.e. the next UNCOMMIT event will always be for the committed region with the highest address. GC Cycle events G1HR #StartGC 7G1HR CSET 0x6e900000G1HR REUSE 0x70500000G1HR ALLOC(Old) 0x6f800000G1HR RETIRE 0x6f800000 0x6f821b20G1HR #EndGC 7 This shows start and end of an Evacuation pause. This event is followed by a GC counter tracking both evacuation pauses and Full GCs. Here, this is the 7th GC since the start of the process. G1HR #StartFullGC 17G1HR UNCOMMIT [0x6ed00000,0x6ee00000]G1HR POST-COMPACTION(Old) 0x6e800000 0x6e854f58G1HR #EndFullGC 17 Shows start and end of a Full GC. This event is also followed by the same GC counter as above. This is the 17th GC since the start of the process. ALLOC events G1HR ALLOC(Eden) 0x6e800000 The region with bottom 0x6e800000 just started being used for allocation. In this case it is an Eden region and allocated into by a mutator thread. G1HR ALLOC(StartsH) 0x6ec00000 0x6ed00000G1HR ALLOC(ContinuesH) 0x6ed00000 0x6e000000 Regions being used for the allocation of Humongous object. The object spans over two regions. G1HR ALLOC(SingleH) 0x6f900000 0x6f9eb010 Single region being used for the allocation of Humongous object. G1HR COMMIT [0x6ee00000,0x6ef00000]G1HR COMMIT [0x6ef00000,0x6f000000]G1HR COMMIT [0x6f000000,0x6f100000]G1HR COMMIT [0x6f100000,0x6f200000]G1HR ALLOC(StartsH) 0x6ee00000 0x6ef00000G1HR ALLOC(ContinuesH) 0x6ef00000 0x6f000000G1HR ALLOC(ContinuesH) 0x6f000000 0x6f100000G1HR ALLOC(ContinuesH) 0x6f100000 0x6f102010 Here, Humongous object allocation request could not be satisfied by the free committed regions that existed in the heap, so the heap needed to be expanded. Thus new regions are committed and then allocated into for the Humongous object. G1HR ALLOC(Old) 0x6f800000 Old region started being used for allocation during GC. G1HR ALLOC(Survivor) 0x6fa00000 Region being used for copying old objects into during a GC. Note that Eden and Humongous ALLOC events are generated outside the GC boundaries and Old and Survivor ALLOC events are generated inside the GC boundaries. Other Events G1HR RETIRE 0x6e800000 0x6e87bd98 Retire and stop using the region having bottom 0x6e800000 and top 0x6e87bd98 for allocation. Note that most regions are full when they are retired and we omit those events to reduce the output volume. A region is retired when another region of the same type is allocated or we reach the start or end of a GC(depending on the region). So for Eden regions: For example: 1. ALLOC(Eden) Foo2. ALLOC(Eden) Bar3. StartGC At point 2, Foo has just been retired and it was full. At point 3, Bar was retired and it was full. If they were not full when they were retired, we will have a RETIRE event: 1. ALLOC(Eden) Foo2. RETIRE Foo top3. ALLOC(Eden) Bar4. StartGC G1HR CSET 0x6e900000 Region (bottom: 0x6e900000) is selected for the Collection Set. The region might have been selected for the collection set earlier (i.e. when it was allocated). However, we generate the CSET events for all regions in the CSet at the start of a GC to make sure there's no confusion about which regions are part of the CSet. G1HR POST-COMPACTION(Old) 0x6e800000 0x6e839858 POST-COMPACTION event is generated for each non-empty region in the heap after a full compaction. A full compaction moves objects around, so we don't know what the resulting shape of the heap is (which regions were written to, which were emptied, etc.). To deal with this, we generate a POST-COMPACTION event for each non-empty region with its type (old/humongous) and the heap boundaries. At this point we should only have Old and Humongous regions, as we have collapsed the young generation, so we should not have eden and survivors. POST-COMPACTION events are generated within the Full GC boundary. G1HR CLEANUP 0x6f400000G1HR CLEANUP 0x6f300000G1HR CLEANUP 0x6f200000 These regions were found empty after remark phase of Concurrent Marking and are reclaimed shortly afterwards. G1HR #StartGC 5G1HR CSET 0x6f400000G1HR CSET 0x6e900000G1HR REUSE 0x6f800000 At the end of a GC we retire the old region we are allocating into. Given that its not full, we will carry on allocating into it during the next GC. This is what REUSE means. In the above case 0x6f800000 should have been the last region with an ALLOC(Old) event during the previous GC and should have been retired before the end of the previous GC. G1HR ALLOC-FORCE(Eden) 0x6f800000 A specialization of ALLOC which indicates that we have reached the max desired number of the particular region type (in this case: Eden), but we decided to allocate one more. Currently it's only used for Eden regions when we extend the young generation because we cannot do a GC as the GC-Locker is active. G1HR EVAC-FAILURE 0x6f800000 During a GC, we have failed to evacuate an object from the given region as the heap is full and there is no space left to copy the object. This event is generated within GC boundaries and exactly once for each region from which we failed to evacuate objects. When Heap Regions are reclaimed ? It is also worth mentioning when the heap regions in the G1 heap are reclaimed. All regions that are in the CSet (the ones that appear in CSET events) are reclaimed at the end of a GC. The exception to that are regions with EVAC-FAILURE events. All regions with CLEANUP events are reclaimed. After a Full GC some regions get reclaimed (the ones from which we moved the objects out). But that is not shown explicitly, instead the non-empty regions that are left in the heap are printed out with the POST-COMPACTION events.

    Read the article

  • gparted won't boot from liveCD

    - by ant2009
    Hello, Fedora 12 2.6.32.9-70.fc12.i686 I downloaded the gparted iso and burnt it to a CD. gparted-live-0.5.2-1 This GParted Live was created by: create-gparted-live -l en -b u -e e -d sid -m http://free.nchc.org.tw/debian -s http://free.nchc.org.tw/debian-security -g http://free.nchc.org.tw/drbl-core -n 2.6.32-3 -i 0.5.2-1 The files and folders: isolinux live syslinux COPYING GParted-Live-Version I put the disk in and reboot. Nothing happens. I just get a flashing cursor in the top left hand corner. I then have to switch of the power button to get it to reboot. I have done this a number of time and the results are the same. Can anyone tell me how to boot from the gparted liveCD? Many thanks for any suggestions,

    Read the article

  • How John Got 15x Improvement Without Really Trying

    - by rchrd
    The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here.  How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

    Read the article

  • OCS 2007 R2 User Properties Error Message

    - by BWCA
    When I attempted to configure one of our user’s Meeting settings using the Microsoft Office Communications Server 2007 R2 Administration Tool   I received an Validation failed – Validation failed with HRESULT = 0XC3EC7E02 dialog box error message. I received the same error message when I tried to configure the user’s Telephony and Other settings. Using ADSI Edit, I compared the settings of an user that I had no problems configuring and the user that I had problems configuring.  For the user I had problems configuring, I noticed a trailing space after the last phone number digit for the user’s msRTCSIP-Line attribute. After I removed the trailing space for the attribute and waited for Active Directory replication to complete, I was able to configure the user’s Meeting settings (and Telephony/Other settings) without any problems. If you get the error message, check your user’s msRTCSIP-xxxxx attributes in Active Directory using ADSI Edit for any trailing spaces, typos, or any other mistakes.

    Read the article

  • Wire Framing WP7 Apps With Cacoo

    - by Tim Murphy
    While looking for a free alternative to Sketchflow I landed on the Cacoo web site.  Any developer who decides to use the free Visual Studio tools may find themselves doing the same search.  The base functionality of Cacoo is free although there are certain features that have fees attached to them such as extended stencils and templates. Cacoo doesn’t seem to have a template for WP7.  It does have templates for iOS and Android development so I started with the Android template and started modidfying it for WP7.  Funny thing is since Android has the same hardware vendors as Windows Phone the basic frame looks just right (I would swear I was looking at my Samsung Focus). Below is the start of a new mockup for the user group that I help run. I found that while Cacoo doesn’t have all the icons I need I am able to insert them from the Windows Phone Toolkit folder.  If I put them off to the side as you can see above.  I can simply copy and paste them into the appropriate place as needed.  Beyond that I have customized the main frame frame so I can have my base to work from.  In the future I intend to create this as a stencil and if it looks good enough I would consider making it public. My use of this product is still in it’s early phase, but it seems like a great way to start.  Maybe if you use this to get going you can earn enough from your resulting apps to pay for something with more bells and whistles in the future. del.icio.us Tags: WP7,Windows Phone 7 development,design,Cacoo,wire frame

    Read the article

  • Community Events and Workshops in November 2012 #ssas #tabular #powerpivot

    - by Marco Russo (SQLBI)
    I and Alberto have a busy agenda until the end of the month, but if you are based in Northern Europe there are many chance to meet one of us in the next couple of weeks! Belgium, 20 November 2012 – SQL Server Days 2012 with Marco Russo I will present two sessions in this conference, “Data Modeling for Tabular” and “Querying and Optimizing DAX” Copenhagen, 21-22 November, 2012 – SSAS Tabular Workshop with Alberto Ferrari Alberto will be the speaker for 2 days – you can still register if you want a full immersion! Copenhagen, 21 November 2012 – Free Community Event with Alberto Ferrari (hosted in Microsoft Hellerup) In the evening Alberto will present “Excel 2013 PowerPivot in Action” Munich, 27-28 November 2012 - SSAS Tabular Workshop with Alberto Ferrari The SSAS workshop will run also in Germany, this time in Munich. Also here there is still some seat still available. Munich, 27 November 2012 - Free Community Event with Alberto Ferrari (hosted in Microsoft ) In the evening Alberto will present “Excel 2013 PowerPivot in Action” Moscow, 27-28 November 2012 – TechEd Russia 2012 with Marco Russo I will speak during the keynote on November 27 and I will present two session the day after, “Developing an Analysis Services Tabular Project BI Semantic Model” and “Excel 2013 PowerPivot in Action” Stockholm, 29-30 November 2012 - SSAS Tabular Workshop with Marco Russo I will run this workshop in Stockholm – if you want to register here, hurry up! Few seats still available! Stockholm, 29 November 2012 - Free Community Event with Marco Russo In the evening I will present “Excel 2013 PowerPivot in Action” If you want to attend a SSAS Tabular Workshop online, you can also register to the Online edition of December 5-6, 2012, which is still in early bird and is scheduled with a friendly time zone for America’s countries (which could be good for Europe too, in case you don’t mind attending a workshop until midnight!).

    Read the article

  • Selling Android apps from Latvia? or should I just put banners?

    - by Roger Travis
    I am in Latvia ( which is not supported to sell apps at android market ), so I am thinking about the best way of monetizing my app. So far I've come up with such options: somehow imitate that I am from a supported country, get a bank account there, etc. use PayPal for in-app purchases. The player get, say, first 10 levels for free, but then is asked to pay 0.99$ for the rest of the game. downsides: player might not feel comfortable entering his paypal details into an app. also android market might not really like that. making the app free and get money from advertising... let's do some calculation here, say, I get 1m free downloads, each user during his playtime would see 10 banners, therefor 10m / 1000 * 0.3 = gives roughly 33k$ ( if we use adMob with their 0.3$ per 1000 impressions ). On the other hand, if we use paypal in app purchase, we need a 3% or more conversion rate to beat this... hmm... What do you think about all this? Thanks! edit: from what I just read all over the net, it looks like advertisers will change their eCPM price a lot without you understanding why... while using in-app paypal purchase you can at least somehow monitor the cashflow.

    Read the article

  • LastPass Now Monitors Your Accounts for Security Breaches

    - by Jason Fitzpatrick
    Staying on top of security breaches and how they may or may not affect you is time consuming. Sentry, a new and free addition to the LastPass password management tool, automates the process and notifies you of breaches. In response to all the recent and unfortunate high-profile security breaches LastPass has rolled out Sentry–a tool that monitors breach lists to notify you if your email appears in a list of breached accounts. The lists are supplied by PwnedList, a massive database of security breach data, and securely indexed against your accounts within the LastPass system. If there is a security breach and your email is on the list, you’ll receive an automated email notice indicating which website was compromised and that your email address was one of the positive matches from the breach list. LastPass Sentry is a free feature and, as of yesterday, is automatically activated on all Free, Premium, and Enterprise level accounts. Hit up the link below to read the official announcement. Introducing LastPass Sentry [The LastPass Blog] How To Create a Customized Windows 7 Installation Disc With Integrated Updates How to Get Pro Features in Windows Home Versions with Third Party Tools HTG Explains: Is ReadyBoost Worth Using?

    Read the article

  • Temporarily share/deploy a python (flask) application

    - by Jeff
    Goal Temporarily (1 month?) deploy/share a python (flask) web app without expensive/complex hosting. More info I've developed a basic mobile web app for the non-profit I work for. It's written in python and uses flask as its framework. I'd like to share this with other employees and beta testers (<25 people). Ideally, I could get some sort of simple hosting space/service and push regular updates to it while we test and iterate on this app. Think something along the lines of dropbox, which of course would not work for this purpose. We do have a website, and hosting services for it, but I'm concerned about using this resource as our website is mission critical and this app is very much pre-alpha at this point. Options I've researched / considered Self host from local machine/network (slow, unreliable) Purchase hosting space (with limited non-profit resources, I'm concerned this is overkill) Using our current web server / hosting (not appropriate for testing) Thanks very much for your time!

    Read the article

  • Licensing a JavaScript library

    - by Kendall Frey
    I am developing a free, open-source (duh) JavaScript library, and wondering how to license it. I was considering the GNU GPL, but I heard that I must distribute the license with the software, and I'm not sure anymore. I would like the library to be available much like jQuery: In a free, downloadable script, preferably in either original or minified form. Am I mistaken about the GNU GPL license terms? jQuery is dual licensed under GNU GPL or MIT licenses. How does the GPL apply to single script files like that? Can I license my library with nothing more than a few sentences in the script file? Is there another license that better suits my needs? What would be nice is a license that allows you to put the URL in the source, for people to read if they want. I don't know that many do, unless I am mistaken. I am generally looking to release the library as free software like the GPL specifies, but don't want to have to force licensees to download the full license unless they wish to read it.

    Read the article

  • Oracle Virtual Developer Day: Oracle Fusion Development

    - by rituchhibber
    Get up to date and learn everything you wanted to know about Oracle ADF & Fusion Development plus live Q&A chats with Oracle technical staff. Oracle Application Development Framework (ADF) is the standards based, strategic framework for Oracle Fusion Applications and Oracle Fusion Middleware. Oracle ADF's integration with the Oracle SOA Suite, Oracle WebCenter and Oracle BI creates a complete productive development platform for your custom applications. Join us at this FREE virtual event and learn the latest in Fusion Development including: Is Oracle ADF development faster and simpler than Forms, Apex or .Net? Mobile Application Development with ADF Mobile Oracle ADF development with Eclipse Oracle WebCenter Portal and ADF Development Application Lifecycle Management with ADF Building Process Centric Applications with ADF and BPM Oracle Business Intelligence and ADF Integration Live Q&A chats with Oracle technical staff Developer lead, manager or architect - this event has something for everyone. Don't miss this opportunity. December 11th, 2012, at 9:00 – 13:00 GMT/ 10:00 - 14:00 CET Register online now for this FREE event! Agenda 9:00 am - 9:30 am Opening 9:30 am - 10:00 am Keynote Oracle Fusion Development Track 1 Introduction to Fusion Development Track 2 What's New in Fusion Development Track 3 Fusion Development in the Enterprise Track 4 Hands On Lab - WebCenter Portal and ADF Lab w/ JDeveloper 10:00 am - 11:00 am Is Oracle ADF development faster and simpler than Forms, Apex or .Net? Mobile Application Development with ADF Mobile Oracle WebCenter Portal and ADF Development Lab materials can be found on event wiki here. Q&A about the lab is available throughout the event. 11:00 am - 12:00 pm Rich Web UI made simple - an ADF Faces Overview Oracle Enterprise Pack for Eclipse - ADF Development Building Process Centric Applications with ADF and BPM 12:00 pm - 1:00 pm Next Generation Controller for JSF Application Lifecycle Management for ADF Oracle Business Intelligence and ADF Integration Click here to view session Abstracts. We look forward to welcoming you at this free event!

    Read the article

< Previous Page | 158 159 160 161 162 163 164 165 166 167 168 169  | Next Page >