Search Results

Search found 13829 results on 554 pages for 'temporary objects'.

Page 115/554 | < Previous Page | 111 112 113 114 115 116 117 118 119 120 121 122 | Next Page >

Improving Partitioned Table Join Performance

- by Paul White

The query optimizer does not always choose an optimal strategy when joining partitioned tables. This post looks at an example, showing how a manual rewrite of the query can almost double performance, while reducing the memory grant to almost nothing. Test Data The two tables in this example use a common partitioning partition scheme. The partition function uses 41 equal-size partitions: CREATE PARTITION FUNCTION PFT (integer) AS RANGE RIGHT FOR VALUES ( 125000, 250000, 375000, 500000, 625000, 750000, 875000, 1000000, 1125000, 1250000, 1375000, 1500000, 1625000, 1750000, 1875000, 2000000, 2125000, 2250000, 2375000, 2500000, 2625000, 2750000, 2875000, 3000000, 3125000, 3250000, 3375000, 3500000, 3625000, 3750000, 3875000, 4000000, 4125000, 4250000, 4375000, 4500000, 4625000, 4750000, 4875000, 5000000 ); GO CREATE PARTITION SCHEME PST AS PARTITION PFT ALL TO ([PRIMARY]); There two tables are: CREATE TABLE dbo.T1 ( TID integer NOT NULL IDENTITY(0,1), Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (TID) ON PST (TID) ); CREATE TABLE dbo.T2 ( TID integer NOT NULL, Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T2 PRIMARY KEY CLUSTERED (TID, Column1) ON PST (TID) ); The next script loads 5 million rows into T1 with a pseudo-random value between 1 and 5 for Column1. The table is partitioned on the IDENTITY column TID: INSERT dbo.T1 WITH (TABLOCKX) (Column1) SELECT (ABS(CHECKSUM(NEWID())) % 5) + 1 FROM dbo.Numbers AS N WHERE n BETWEEN 1 AND 5000000; In case you don’t already have an auxiliary table of numbers lying around, here’s a script to create one with 10 million rows: CREATE TABLE dbo.Numbers (n bigint PRIMARY KEY); WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5) INSERT dbo.Numbers WITH (TABLOCKX) SELECT TOP (10000000) n FROM Nums ORDER BY n OPTION (MAXDOP 1); Table T1 contains data like this: Next we load data into table T2. The relationship between the two tables is that table 2 contains ‘n’ rows for each row in table 1, where ‘n’ is determined by the value in Column1 of table T1. There is nothing particularly special about the data or distribution, by the way. INSERT dbo.T2 WITH (TABLOCKX) (TID, Column1) SELECT T.TID, N.n FROM dbo.T1 AS T JOIN dbo.Numbers AS N ON N.n >= 1 AND N.n <= T.Column1; Table T2 ends up containing about 15 million rows: The primary key for table T2 is a combination of TID and Column1. The data is partitioned according to the value in column TID alone. Partition Distribution The following query shows the number of rows in each partition of table T1: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T1 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are 40 partitions containing 125,000 rows (40 * 125k = 5m rows). The rightmost partition remains empty. The next query shows the distribution for table 2: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T2 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are roughly 375,000 rows in each partition (the rightmost partition is also empty): Ok, that’s the test data done. Test Query and Execution Plan The task is to count the rows resulting from joining tables 1 and 2 on the TID column: SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; The optimizer chooses a plan using parallel hash join, and partial aggregation: The Plan Explorer plan tree view shows accurate cardinality estimates and an even distribution of rows across threads (click to enlarge the image): With a warm data cache, the STATISTICS IO output shows that no physical I/O was needed, and all 41 partitions were touched: Running the query without actual execution plan or STATISTICS IO information for maximum performance, the query returns in around 2600ms. Execution Plan Analysis The first step toward improving on the execution plan produced by the query optimizer is to understand how it works, at least in outline. The two parallel Clustered Index Scans use multiple threads to read rows from tables T1 and T2. Parallel scan uses a demand-based scheme where threads are given page(s) to scan from the table as needed. This arrangement has certain important advantages, but does result in an unpredictable distribution of rows amongst threads. The point is that multiple threads cooperate to scan the whole table, but it is impossible to predict which rows end up on which threads. For correct results from the parallel hash join, the execution plan has to ensure that rows from T1 and T2 that might join are processed on the same thread. For example, if a row from T1 with join key value ‘1234’ is placed in thread 5’s hash table, the execution plan must guarantee that any rows from T2 that also have join key value ‘1234’ probe thread 5’s hash table for matches. The way this guarantee is enforced in this parallel hash join plan is by repartitioning rows to threads after each parallel scan. The two repartitioning exchanges route rows to threads using a hash function over the hash join keys. The two repartitioning exchanges use the same hash function so rows from T1 and T2 with the same join key must end up on the same hash join thread. Expensive Exchanges This business of repartitioning rows between threads can be very expensive, especially if a large number of rows is involved. The execution plan selected by the optimizer moves 5 million rows through one repartitioning exchange and around 15 million across the other. As a first step toward removing these exchanges, consider the execution plan selected by the optimizer if we join just one partition from each table, disallowing parallelism: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = 1 AND $PARTITION.PFT(T2.TID) = 1 OPTION (MAXDOP 1); The optimizer has chosen a (one-to-many) merge join instead of a hash join. The single-partition query completes in around 100ms. If everything scaled linearly, we would expect that extending this strategy to all 40 populated partitions would result in an execution time around 4000ms. Using parallelism could reduce that further, perhaps to be competitive with the parallel hash join chosen by the optimizer. This raises a question. If the most efficient way to join one partition from each of the tables is to use a merge join, why does the optimizer not choose a merge join for the full query? Forcing a Merge Join Let’s force the optimizer to use a merge join on the test query using a hint: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN); This is the execution plan selected by the optimizer: This plan results in the same number of logical reads reported previously, but instead of 2600ms the query takes 5000ms. The natural explanation for this drop in performance is that the merge join plan is only using a single thread, whereas the parallel hash join plan could use multiple threads. Parallel Merge Join We can get a parallel merge join plan using the same query hint as before, and adding trace flag 8649: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN, QUERYTRACEON 8649); The execution plan is: This looks promising. It uses a similar strategy to distribute work across threads as seen for the parallel hash join. In practice though, performance is disappointing. On a typical run, the parallel merge plan runs for around 8400ms; slower than the single-threaded merge join plan (5000ms) and much worse than the 2600ms for the parallel hash join. We seem to be going backwards! The logical reads for the parallel merge are still exactly the same as before, with no physical IOs. The cardinality estimates and thread distribution are also still very good (click to enlarge): A big clue to the reason for the poor performance is shown in the wait statistics (captured by Plan Explorer Pro): CXPACKET waits require careful interpretation, and are most often benign, but in this case excessive waiting occurs at the repartitioning exchanges. Unlike the parallel hash join, the repartitioning exchanges in this plan are order-preserving ‘merging’ exchanges (because merge join requires ordered inputs): Parallelism works best when threads can just grab any available unit of work and get on with processing it. Preserving order introduces inter-thread dependencies that can easily lead to significant waits occurring. In extreme cases, these dependencies can result in an intra-query deadlock, though the details of that will have to wait for another time to explore in detail. The potential for waits and deadlocks leads the query optimizer to cost parallel merge join relatively highly, especially as the degree of parallelism (DOP) increases. This high costing resulted in the optimizer choosing a serial merge join rather than parallel in this case. The test results certainly confirm its reasoning. Collocated Joins In SQL Server 2008 and later, the optimizer has another available strategy when joining tables that share a common partition scheme. This strategy is a collocated join, also known as as a per-partition join. It can be applied in both serial and parallel execution plans, though it is limited to 2-way joins in the current optimizer. Whether the optimizer chooses a collocated join or not depends on cost estimation. The primary benefits of a collocated join are that it eliminates an exchange and requires less memory, as we will see next. Costing and Plan Selection The query optimizer did consider a collocated join for our original query, but it was rejected on cost grounds. The parallel hash join with repartitioning exchanges appeared to be a cheaper option. There is no query hint to force a collocated join, so we have to mess with the costing framework to produce one for our test query. Pretending that IOs cost 50 times more than usual is enough to convince the optimizer to use collocated join with our test query: -- Pretend IOs are 50x cost temporarily DBCC SETIOWEIGHT(50); -- Co-located hash join SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (RECOMPILE); -- Reset IO costing DBCC SETIOWEIGHT(1); Collocated Join Plan The estimated execution plan for the collocated join is: The Constant Scan contains one row for each partition of the shared partitioning scheme, from 1 to 41. The hash repartitioning exchanges seen previously are replaced by a single Distribute Streams exchange using Demand partitioning. Demand partitioning means that the next partition id is given to the next parallel thread that asks for one. My test machine has eight logical processors, and all are available for SQL Server to use. As a result, there are eight threads in the single parallel branch in this plan, each processing one partition from each table at a time. Once a thread finishes processing a partition, it grabs a new partition number from the Distribute Streams exchange…and so on until all partitions have been processed. It is important to understand that the parallel scans in this plan are different from the parallel hash join plan. Although the scans have the same parallelism icon, tables T1 and T2 are not being co-operatively scanned by multiple threads in the same way. Each thread reads a single partition of T1 and performs a hash match join with the same partition from table T2. The properties of the two Clustered Index Scans show a Seek Predicate (unusual for a scan!) limiting the rows to a single partition: The crucial point is that the join between T1 and T2 is on TID, and TID is the partitioning column for both tables. A thread that processes partition ‘n’ is guaranteed to see all rows that can possibly join on TID for that partition. In addition, no other thread will see rows from that partition, so this removes the need for repartitioning exchanges. CPU and Memory Efficiency Improvements The collocated join has removed two expensive repartitioning exchanges and added a single exchange processing 41 rows (one for each partition id). Remember, the parallel hash join plan exchanges had to process 5 million and 15 million rows. The amount of processor time spent on exchanges will be much lower in the collocated join plan. In addition, the collocated join plan has a maximum of 8 threads processing single partitions at any one time. The 41 partitions will all be processed eventually, but a new partition is not started until a thread asks for it. Threads can reuse hash table memory for the new partition. The parallel hash join plan also had 8 hash tables, but with all 5,000,000 build rows loaded at the same time. The collocated plan needs memory for only 8 * 125,000 = 1,000,000 rows at any one time. Collocated Hash Join Performance The collated join plan has disappointing performance in this case. The query runs for around 25,300ms despite the same IO statistics as usual. This is much the worst result so far, so what went wrong? It turns out that cardinality estimation for the single partition scans of table T1 is slightly low. The properties of the Clustered Index Scan of T1 (graphic immediately above) show the estimation was for 121,951 rows. This is a small shortfall compared with the 125,000 rows actually encountered, but it was enough to cause the hash join to spill to physical tempdb: A level 1 spill doesn’t sound too bad, until you realize that the spill to tempdb probably occurs for each of the 41 partitions. As a side note, the cardinality estimation error is a little surprising because the system tables accurately show there are 125,000 rows in every partition of T1. Unfortunately, the optimizer uses regular column and index statistics to derive cardinality estimates here rather than system table information (e.g. sys.partitions). Collocated Merge Join We will never know how well the collocated parallel hash join plan might have worked without the cardinality estimation error (and the resulting 41 spills to tempdb) but we do know: Merge join does not require a memory grant; and Merge join was the optimizer’s preferred join option for a single partition join Putting this all together, what we would really like to see is the same collocated join strategy, but using merge join instead of hash join. Unfortunately, the current query optimizer cannot produce a collocated merge join; it only knows how to do collocated hash join. So where does this leave us? CROSS APPLY sys.partitions We can try to write our own collocated join query. We can use sys.partitions to find the partition numbers, and CROSS APPLY to get a count per partition, with a final step to sum the partial counts. The following query implements this idea: SELECT row_count = SUM(Subtotals.cnt) FROM ( -- Partition numbers SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1 ) AS P CROSS APPLY ( -- Count per collocated join SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; The estimated plan is: The cardinality estimates aren’t all that good here, especially the estimate for the scan of the system table underlying the sys.partitions view. Nevertheless, the plan shape is heading toward where we would like to be. Each partition number from the system table results in a per-partition scan of T1 and T2, a one-to-many Merge Join, and a Stream Aggregate to compute the partial counts. The final Stream Aggregate just sums the partial counts. Execution time for this query is around 3,500ms, with the same IO statistics as always. This compares favourably with 5,000ms for the serial plan produced by the optimizer with the OPTION (MERGE JOIN) hint. This is another case of the sum of the parts being less than the whole – summing 41 partial counts from 41 single-partition merge joins is faster than a single merge join and count over all partitions. Even so, this single-threaded collocated merge join is not as quick as the original parallel hash join plan, which executed in 2,600ms. On the positive side, our collocated merge join uses only one logical processor and requires no memory grant. The parallel hash join plan used 16 threads and reserved 569 MB of memory: Using a Temporary Table Our collocated merge join plan should benefit from parallelism. The reason parallelism is not being used is that the query references a system table. We can work around that by writing the partition numbers to a temporary table (or table variable): SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); CREATE TABLE #P ( partition_number integer PRIMARY KEY); INSERT #P (partition_number) SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1; SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; DROP TABLE #P; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; Using the temporary table adds a few logical reads, but the overall execution time is still around 3500ms, indistinguishable from the same query without the temporary table. The problem is that the query optimizer still doesn’t choose a parallel plan for this query, though the removal of the system table reference means that it could if it chose to: In fact the optimizer did enter the parallel plan phase of query optimization (running search 1 for a second time): Unfortunately, the parallel plan found seemed to be more expensive than the serial plan. This is a crazy result, caused by the optimizer’s cost model not reducing operator CPU costs on the inner side of a nested loops join. Don’t get me started on that, we’ll be here all night. In this plan, everything expensive happens on the inner side of a nested loops join. Without a CPU cost reduction to compensate for the added cost of exchange operators, candidate parallel plans always look more expensive to the optimizer than the equivalent serial plan. Parallel Collocated Merge Join We can produce the desired parallel plan using trace flag 8649 again: SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: One difference between this plan and the collocated hash join plan is that a Repartition Streams exchange operator is used instead of Distribute Streams. The effect is similar, though not quite identical. The Repartition uses round-robin partitioning, meaning the next partition id is pushed to the next thread in sequence. The Distribute Streams exchange seen earlier used Demand partitioning, meaning the next partition id is pulled across the exchange by the next thread that is ready for more work. There are subtle performance implications for each partitioning option, but going into that would again take us too far off the main point of this post. Performance The important thing is the performance of this parallel collocated merge join – just 1350ms on a typical run. The list below shows all the alternatives from this post (all timings include creation, population, and deletion of the temporary table where appropriate) from quickest to slowest: Collocated parallel merge join: 1350ms Parallel hash join: 2600ms Collocated serial merge join: 3500ms Serial merge join: 5000ms Parallel merge join: 8400ms Collated parallel hash join: 25,300ms (hash spill per partition) The parallel collocated merge join requires no memory grant (aside from a paltry 1.2MB used for exchange buffers). This plan uses 16 threads at DOP 8; but 8 of those are (rather pointlessly) allocated to the parallel scan of the temporary table. These are minor concerns, but it turns out there is a way to address them if it bothers you. Parallel Collocated Merge Join with Demand Partitioning This final tweak replaces the temporary table with a hard-coded list of partition ids (dynamic SQL could be used to generate this query from sys.partitions): SELECT row_count = SUM(Subtotals.cnt) FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10), (11),(12),(13),(14),(15),(16),(17),(18),(19),(20), (21),(22),(23),(24),(25),(26),(27),(28),(29),(30), (31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41) ) AS P (partition_number) CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: The parallel collocated hash join plan is reproduced below for comparison: The manual rewrite has another advantage that has not been mentioned so far: the partial counts (per partition) can be computed earlier than the partial counts (per thread) in the optimizer’s collocated join plan. The earlier aggregation is performed by the extra Stream Aggregate under the nested loops join. The performance of the parallel collocated merge join is unchanged at around 1350ms. Final Words It is a shame that the current query optimizer does not consider a collocated merge join (Connect item closed as Won’t Fix). The example used in this post showed an improvement in execution time from 2600ms to 1350ms using a modestly-sized data set and limited parallelism. In addition, the memory requirement for the query was almost completely eliminated – down from 569MB to 1.2MB. The problem with the parallel hash join selected by the optimizer is that it attempts to process the full data set all at once (albeit using eight threads). It requires a large memory grant to hold all 5 million rows from table T1 across the eight hash tables, and does not take advantage of the divide-and-conquer opportunity offered by the common partitioning. The great thing about the collocated join strategies is that each parallel thread works on a single partition from both tables, reading rows, performing the join, and computing a per-partition subtotal, before moving on to a new partition. From a thread’s point of view… If you have trouble visualizing what is happening from just looking at the parallel collocated merge join execution plan, let’s look at it again, but from the point of view of just one thread operating between the two Parallelism (exchange) operators. Our thread picks up a single partition id from the Distribute Streams exchange, and starts a merge join using ordered rows from partition 1 of table T1 and partition 1 of table T2. By definition, this is all happening on a single thread. As rows join, they are added to a (per-partition) count in the Stream Aggregate immediately above the Merge Join. Eventually, either T1 (partition 1) or T2 (partition 1) runs out of rows and the merge join stops. The per-partition count from the aggregate passes on through the Nested Loops join to another Stream Aggregate, which is maintaining a per-thread subtotal. Our same thread now picks up a new partition id from the exchange (say it gets id 9 this time). The count in the per-partition aggregate is reset to zero, and the processing of partition 9 of both tables proceeds just as it did for partition 1, and on the same thread. Each thread picks up a single partition id and processes all the data for that partition, completely independently from other threads working on other partitions. One thread might eventually process partitions (1, 9, 17, 25, 33, 41) while another is concurrently processing partitions (2, 10, 18, 26, 34) and so on for the other six threads at DOP 8. The point is that all 8 threads can execute independently and concurrently, continuing to process new partitions until the wider job (of which the thread has no knowledge!) is done. This divide-and-conquer technique can be much more efficient than simply splitting the entire workload across eight threads all at once. Related Reading Understanding and Using Parallelism in SQL Server Parallel Execution Plans Suck © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

Read the article
SQL Monitor’s data repository

- by Chris Lambrou

As one of the developers of SQL Monitor, I often get requests passed on by our support people from customers who are looking to dip into SQL Monitor’s own data repository, in order to pull out bits of information that they’re interested in. Since there’s clearly interest out there in playing around directly with the data repository, I thought I’d write some blog posts to start to describe how it all works. The hardest part for me is knowing where to begin, since the schema of the data repository is pretty big. Hmmm… I guess it’s tricky for anyone to write anything but the most trivial of queries against the data repository without understanding the hierarchy of monitored objects, so perhaps my first post should start there. I always imagine that whenever a customer fires up SSMS and starts to explore their SQL Monitor data repository database, they become immediately bewildered by the schema – that was certainly my experience when I did so for the first time. The following query shows the number of different object types in the data repository schema: SELECT type_desc, COUNT(*) AS [count] FROM sys.objects GROUP BY type_desc ORDER BY type_desc; type_desccount 1DEFAULT_CONSTRAINT63 2FOREIGN_KEY_CONSTRAINT181 3INTERNAL_TABLE3 4PRIMARY_KEY_CONSTRAINT190 5SERVICE_QUEUE3 6SQL_INLINE_TABLE_VALUED_FUNCTION381 7SQL_SCALAR_FUNCTION2 8SQL_STORED_PROCEDURE100 9SYSTEM_TABLE41 10UNIQUE_CONSTRAINT54 11USER_TABLE193 12VIEW124 With 193 tables, 124 views, 100 stored procedures and 381 table valued functions, that’s quite a hefty schema, and when you browse through it using SSMS, it can be a bit daunting at first. So, where to begin? Well, let’s narrow things down a bit and only look at the tables belonging to the data schema. That’s where all of the collected monitoring data is stored by SQL Monitor. The following query gives us the names of those tables: SELECT sch.name + '.' + obj.name AS [name] FROM sys.objects obj JOIN sys.schemas sch ON sch.schema_id = obj.schema_id WHERE obj.type_desc = 'USER_TABLE' AND sch.name = 'data' ORDER BY sch.name, obj.name; This query still returns 110 tables. I won’t show them all here, but let’s have a look at the first few of them: name 1data.Cluster_Keys 2data.Cluster_Machine_ClockSkew_UnstableSamples 3data.Cluster_Machine_Cluster_StableSamples 4data.Cluster_Machine_Keys 5data.Cluster_Machine_LogicalDisk_Capacity_StableSamples 6data.Cluster_Machine_LogicalDisk_Keys 7data.Cluster_Machine_LogicalDisk_Sightings 8data.Cluster_Machine_LogicalDisk_UnstableSamples 9data.Cluster_Machine_LogicalDisk_Volume_StableSamples 10data.Cluster_Machine_Memory_Capacity_StableSamples 11data.Cluster_Machine_Memory_UnstableSamples 12data.Cluster_Machine_Network_Capacity_StableSamples 13data.Cluster_Machine_Network_Keys 14data.Cluster_Machine_Network_Sightings 15data.Cluster_Machine_Network_UnstableSamples 16data.Cluster_Machine_OperatingSystem_StableSamples 17data.Cluster_Machine_Ping_UnstableSamples 18data.Cluster_Machine_Process_Instances 19data.Cluster_Machine_Process_Keys 20data.Cluster_Machine_Process_Owner_Instances 21data.Cluster_Machine_Process_Sightings 22data.Cluster_Machine_Process_UnstableSamples 23… There are two things I want to draw your attention to: The table names describe a hierarchy of the different types of object that are monitored by SQL Monitor (e.g. clusters, machines and disks). For each object type in the hierarchy, there are multiple tables, ending in the suffixes _Keys, _Sightings, _StableSamples and _UnstableSamples. Not every object type has a table for every suffix, but the _Keys suffix is especially important and a _Keys table does indeed exist for every object type. In fact, if we limit the query to return only those tables ending in _Keys, we reveal the full object hierarchy: SELECT sch.name + '.' + obj.name AS [name] FROM sys.objects obj JOIN sys.schemas sch ON sch.schema_id = obj.schema_id WHERE obj.type_desc = 'USER_TABLE' AND sch.name = 'data' AND obj.name LIKE '%_Keys' ORDER BY sch.name, obj.name; name 1data.Cluster_Keys 2data.Cluster_Machine_Keys 3data.Cluster_Machine_LogicalDisk_Keys 4data.Cluster_Machine_Network_Keys 5data.Cluster_Machine_Process_Keys 6data.Cluster_Machine_Services_Keys 7data.Cluster_ResourceGroup_Keys 8data.Cluster_ResourceGroup_Resource_Keys 9data.Cluster_SqlServer_Agent_Job_History_Keys 10data.Cluster_SqlServer_Agent_Job_Keys 11data.Cluster_SqlServer_Database_BackupType_Backup_Keys 12data.Cluster_SqlServer_Database_BackupType_Keys 13data.Cluster_SqlServer_Database_CustomMetric_Keys 14data.Cluster_SqlServer_Database_File_Keys 15data.Cluster_SqlServer_Database_Keys 16data.Cluster_SqlServer_Database_Table_Index_Keys 17data.Cluster_SqlServer_Database_Table_Keys 18data.Cluster_SqlServer_Error_Keys 19data.Cluster_SqlServer_Keys 20data.Cluster_SqlServer_Services_Keys 21data.Cluster_SqlServer_SqlProcess_Keys 22data.Cluster_SqlServer_TopQueries_Keys 23data.Cluster_SqlServer_Trace_Keys 24data.Group_Keys The full object type hierarchy looks like this: Cluster Machine LogicalDisk Network Process Services ResourceGroup Resource SqlServer Agent Job History Database BackupType Backup CustomMetric File Table Index Error Services SqlProcess TopQueries Trace Group Okay, but what about the individual objects themselves represented at each level in this hierarchy? Well that’s what the _Keys tables are for. This is probably best illustrated by way of a simple example – how can I query my own data repository to find the databases on my own PC for which monitoring data has been collected? Like this: SELECT clstr._Name AS cluster_name, srvr._Name AS instance_name, db._Name AS database_name FROM data.Cluster_SqlServer_Database_Keys db JOIN data.Cluster_SqlServer_Keys srvr ON db.ParentId = srvr.Id -- Note here how the parent of a Database is a Server JOIN data.Cluster_Keys clstr ON srvr.ParentId = clstr.Id -- Note here how the parent of a Server is a Cluster WHERE clstr._Name = 'dev-chrisl2' -- This is the hostname of my own PC ORDER BY clstr._Name, srvr._Name, db._Name; cluster_nameinstance_namedatabase_name 1dev-chrisl2SqlMonitorData 2dev-chrisl2master 3dev-chrisl2model 4dev-chrisl2msdb 5dev-chrisl2mssqlsystemresource 6dev-chrisl2tempdb 7dev-chrisl2sql2005SqlMonitorData 8dev-chrisl2sql2005TestDatabase 9dev-chrisl2sql2005master 10dev-chrisl2sql2005model 11dev-chrisl2sql2005msdb 12dev-chrisl2sql2005mssqlsystemresource 13dev-chrisl2sql2005tempdb 14dev-chrisl2sql2008SqlMonitorData 15dev-chrisl2sql2008master 16dev-chrisl2sql2008model 17dev-chrisl2sql2008msdb 18dev-chrisl2sql2008mssqlsystemresource 19dev-chrisl2sql2008tempdb These results show that I have three SQL Server instances on my machine (a default instance, one named sql2005 and one named sql2008), and each instance has the usual set of system databases, along with a database named SqlMonitorData. Basically, this is where I test SQL Monitor on different versions of SQL Server, when I’m developing. There are a few important things we can learn from this query: Each _Keys table has a column named Id. This is the primary key. Each _Keys table has a column named ParentId. A foreign key relationship is defined between each _Keys table and its parent _Keys table in the hierarchy. There are two exceptions to this, Cluster_Keys and Group_Keys, because clusters and groups live at the root level of the object hierarchy. Each _Keys table has a column named _Name. This is used to uniquely identify objects in the table within the scope of the same shared parent object. Actually, that last item isn’t always true. In some cases, the _Name column is actually called something else. For example, the data.Cluster_Machine_Services_Keys table has a column named _ServiceName instead of _Name (sorry for the inconsistency). In other cases, a name isn’t sufficient to uniquely identify an object. For example, right now my PC has multiple processes running, all sharing the same name, Chrome (one for each tab open in my web-browser). In such cases, multiple columns are used to uniquely identify an object within the scope of the same shared parent object. Well, that’s it for now. I’ve given you enough information for you to explore the _Keys tables to see how objects are stored in your own data repositories. In a future post, I’ll try to explain how monitoring data is stored for each object, using the _StableSamples and _UnstableSamples tables. If you have any questions about this post, or suggestions for future posts, just submit them in the comments section below.

Read the article
How accurate is "Business logic should be in a service, not in a model"?

- by Jeroen Vannevel

Situation Earlier this evening I gave an answer to a question on StackOverflow. The question: Editing of an existing object should be done in repository layer or in service? For example if I have a User that has debt. I want to change his debt. Should I do it in UserRepository or in service for example BuyingService by getting an object, editing it and saving it ? My answer: You should leave the responsibility of mutating an object to that same object and use the repository to retrieve this object. Example situation: class User { private int debt; // debt in cents private string name; // getters public void makePayment(int cents){ debt -= cents; } } class UserRepository { public User GetUserByName(string name){ // Get appropriate user from database } } A comment I received: Business logic should really be in a service. Not in a model. What does the internet say? So, this got me searching since I've never really (consciously) used a service layer. I started reading up on the Service Layer pattern and the Unit Of Work pattern but so far I can't say I'm convinced a service layer has to be used. Take for example this article by Martin Fowler on the anti-pattern of an Anemic Domain Model: There are objects, many named after the nouns in the domain space, and these objects are connected with the rich relationships and structure that true domain models have. The catch comes when you look at the behavior, and you realize that there is hardly any behavior on these objects, making them little more than bags of getters and setters. Indeed often these models come with design rules that say that you are not to put any domain logic in the the domain objects. Instead there are a set of service objects which capture all the domain logic. These services live on top of the domain model and use the domain model for data. (...) The logic that should be in a domain object is domain logic - validations, calculations, business rules - whatever you like to call it. To me, this seemed exactly what the situation was about: I advocated the manipulation of an object's data by introducing methods inside that class that do just that. However I realize that this should be a given either way, and it probably has more to do with how these methods are invoked (using a repository). I also had the feeling that in that article (see below), a Service Layer is more considered as a façade that delegates work to the underlying model, than an actual work-intensive layer. Application Layer [his name for Service Layer]: Defines the jobs the software is supposed to do and directs the expressive domain objects to work out problems. The tasks this layer is responsible for are meaningful to the business or necessary for interaction with the application layers of other systems. This layer is kept thin. It does not contain business rules or knowledge, but only coordinates tasks and delegates work to collaborations of domain objects in the next layer down. It does not have state reflecting the business situation, but it can have state that reflects the progress of a task for the user or the program. Which is reinforced here: Service interfaces. Services expose a service interface to which all inbound messages are sent. You can think of a service interface as a façade that exposes the business logic implemented in the application (typically, logic in the business layer) to potential consumers. And here: The service layer should be devoid of any application or business logic and should focus primarily on a few concerns. It should wrap Business Layer calls, translate your Domain in a common language that your clients can understand, and handle the communication medium between server and requesting client. This is a serious contrast to other resources that talk about the Service Layer: The service layer should consist of classes with methods that are units of work with actions that belong in the same transaction. Or the second answer to a question I've already linked: At some point, your application will want some business logic. Also, you might want to validate the input to make sure that there isn't something evil or nonperforming being requested. This logic belongs in your service layer. "Solution"? Following the guidelines in this answer, I came up with the following approach that uses a Service Layer: class UserController : Controller { private UserService _userService; public UserController(UserService userService){ _userService = userService; } public ActionResult MakeHimPay(string username, int amount) { _userService.MakeHimPay(username, amount); return RedirectToAction("ShowUserOverview"); } public ActionResult ShowUserOverview() { return View(); } } class UserService { private IUserRepository _userRepository; public UserService(IUserRepository userRepository) { _userRepository = userRepository; } public void MakeHimPay(username, amount) { _userRepository.GetUserByName(username).makePayment(amount); } } class UserRepository { public User GetUserByName(string name){ // Get appropriate user from database } } class User { private int debt; // debt in cents private string name; // getters public void makePayment(int cents){ debt -= cents; } } Conclusion All together not much has changed here: code from the controller has moved to the service layer (which is a good thing, so there is an upside to this approach). However this doesn't look like it had anything to do with my original answer. I realize design patterns are guidelines, not rules set in stone to be implemented whenever possible. Yet I have not found a definitive explanation of the service layer and how it should be regarded. Is it a means to simply extract logic from the controller and put it inside a service instead? Is it supposed to form a contract between the controller and the domain? Should there be a layer between the domain and the service layer? And, last but not least: following the original comment Business logic should really be in a service. Not in a model. Is this correct? How would I introduce my business logic in a service instead of the model?

Read the article
creating objects from trivial graph format text file. java. dijkstra algorithm.

- by user560084

i want to create objects, vertex and edge, from trivial graph format txt file. one of programmers here suggested that i use trivial graph format to store data for dijkstra algorithm. the problem is that at the moment all the information, e.g., weight, links, is in the sourcecode. i want to have a separate text file for that and read it into the program. i thought about using a code for scanning through the text file by using scanner. but i am not quite sure how to create different objects from the same file. could i have some help please? the file is v0 Harrisburg v1 Baltimore v2 Washington v3 Philadelphia v4 Binghamton v5 Allentown v6 New York # v0 v1 79.83 v0 v5 81.15 v1 v0 79.75 v1 v2 39.42 v1 v3 103.00 v2 v1 38.65 v3 v1 102.53 v3 v5 61.44 v3 v6 96.79 v4 v5 133.04 v5 v0 81.77 v5 v3 62.05 v5 v4 134.47 v5 v6 91.63 v6 v3 97.24 v6 v5 87.94 and the dijkstra algorithm code is Downloaded from: http://en.literateprograms.org/Special:Downloadcode/Dijkstra%27s_algorithm_%28Java%29 */ import java.util.PriorityQueue; import java.util.List; import java.util.ArrayList; import java.util.Collections; class Vertex implements Comparable<Vertex> { public final String name; public Edge[] adjacencies; public double minDistance = Double.POSITIVE_INFINITY; public Vertex previous; public Vertex(String argName) { name = argName; } public String toString() { return name; } public int compareTo(Vertex other) { return Double.compare(minDistance, other.minDistance); } } class Edge { public final Vertex target; public final double weight; public Edge(Vertex argTarget, double argWeight) { target = argTarget; weight = argWeight; } } public class Dijkstra { public static void computePaths(Vertex source) { source.minDistance = 0.; PriorityQueue<Vertex> vertexQueue = new PriorityQueue<Vertex>(); vertexQueue.add(source); while (!vertexQueue.isEmpty()) { Vertex u = vertexQueue.poll(); // Visit each edge exiting u for (Edge e : u.adjacencies) { Vertex v = e.target; double weight = e.weight; double distanceThroughU = u.minDistance + weight; if (distanceThroughU < v.minDistance) { vertexQueue.remove(v); v.minDistance = distanceThroughU ; v.previous = u; vertexQueue.add(v); } } } } public static List<Vertex> getShortestPathTo(Vertex target) { List<Vertex> path = new ArrayList<Vertex>(); for (Vertex vertex = target; vertex != null; vertex = vertex.previous) path.add(vertex); Collections.reverse(path); return path; } public static void main(String[] args) { Vertex v0 = new Vertex("Nottinghill_Gate"); Vertex v1 = new Vertex("High_Street_kensignton"); Vertex v2 = new Vertex("Glouchester_Road"); Vertex v3 = new Vertex("South_Kensignton"); Vertex v4 = new Vertex("Sloane_Square"); Vertex v5 = new Vertex("Victoria"); Vertex v6 = new Vertex("Westminster"); v0.adjacencies = new Edge[]{new Edge(v1, 79.83), new Edge(v6, 97.24)}; v1.adjacencies = new Edge[]{new Edge(v2, 39.42), new Edge(v0, 79.83)}; v2.adjacencies = new Edge[]{new Edge(v3, 38.65), new Edge(v1, 39.42)}; v3.adjacencies = new Edge[]{new Edge(v4, 102.53), new Edge(v2, 38.65)}; v4.adjacencies = new Edge[]{new Edge(v5, 133.04), new Edge(v3, 102.53)}; v5.adjacencies = new Edge[]{new Edge(v6, 81.77), new Edge(v4, 133.04)}; v6.adjacencies = new Edge[]{new Edge(v0, 97.24), new Edge(v5, 81.77)}; Vertex[] vertices = { v0, v1, v2, v3, v4, v5, v6 }; computePaths(v0); for (Vertex v : vertices) { System.out.println("Distance to " + v + ": " + v.minDistance); List<Vertex> path = getShortestPathTo(v); System.out.println("Path: " + path); } } } and the code for scanning file is import java.util.Scanner; import java.io.File; import java.io.FileNotFoundException; public class DataScanner1 { //private int total = 0; //private int distance = 0; private String vector; private String stations; private double [] Edge = new double []; /*public int getTotal(){ return total; } */ /* public void getMenuInput(){ KeyboardInput in = new KeyboardInput; System.out.println("Enter the destination? "); String val = in.readString(); return val; } */ public void readFile(String fileName) { try { Scanner scanner = new Scanner(new File(fileName)); scanner.useDelimiter (System.getProperty("line.separator")); while (scanner.hasNext()) { parseLine(scanner.next()); } scanner.close(); } catch (FileNotFoundException e) { e.printStackTrace(); } } public void parseLine(String line) { Scanner lineScanner = new Scanner(line); lineScanner.useDelimiter("\\s*,\\s*"); vector = lineScanner.next(); stations = lineScanner.next(); System.out.println("The current station is " + vector + " and the destination to the next station is " + stations + "."); //total += distance; //System.out.println("The total distance is " + total); } public static void main(String[] args) { /* if (args.length != 1) { System.err.println("usage: java TextScanner2" + "file location"); System.exit(0); } */ DataScanner1 scanner = new DataScanner1(); scanner.readFile(args[0]); //int total =+ distance; //System.out.println(""); //System.out.println("The total distance is " + scanner.getTotal()); } }

Read the article
How John Got 15x Improvement Without Really Trying

- by rchrd

The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here. How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

Read the article
I am having a problem of class cast exception. Can anyone please help me out?

- by Piyush

This is my code: package com.example.userpage; import android.app.Activity; import android.content.Intent; import android.os.Bundle; import android.view.View; import android.widget.Button; import android.widget.EditText; import android.widget.TextView; public class UserPage extends Activity { String tv,tv1; EditText name,pass; TextView x,y; /** Called when the activity is first created. */ @Override public void onCreate(Bundle savedInstanceState) { super.onCreate(savedInstanceState); setContentView(R.layout.main); Button button = (Button) findViewById(R.id.widget44); button.setOnClickListener(new View.OnClickListener() { public void onClick(View v) { name.setText(" "); pass.setText(" "); } }); x = (TextView) findViewById(R.id.widget46); y = (TextView) findViewById(R.id.widget47); name = (EditText)findViewById(R.id.widget41); pass = (EditText)findViewById(R.id.widget42); Button button1 = (Button) findViewById(R.id.widget45); button1.setOnClickListener(new View.OnClickListener() { public void onClick(View v) { tv= name.getText().toString(); tv1 = pass.getText().toString(); x.setText(tv); y.setText(tv1); } }); } } And this is my log cat: 02-16 12:24:30.488: DEBUG/AndroidRuntime(973): >>>>>>>>>>>>>> AndroidRuntime START <<<<<<<<<<<<<< 02-16 12:24:30.488: DEBUG/AndroidRuntime(973): CheckJNI is ON 02-16 12:24:31.208: DEBUG/AndroidRuntime(973): --- registering native functions --- 02-16 12:24:33.498: DEBUG/AndroidRuntime(973): Shutting down VM 02-16 12:24:33.537: DEBUG/dalvikvm(973): Debugger has detached; object registry had 1 entries 02-16 12:24:33.537: INFO/AndroidRuntime(973): NOTE: attach of thread 'Binder Thread #3' failed 02-16 12:24:34.917: DEBUG/AndroidRuntime(981): >>>>>>>>>>>>>> AndroidRuntime START <<<<<<<<<<<<<< 02-16 12:24:34.927: DEBUG/AndroidRuntime(981): CheckJNI is ON 02-16 12:24:35.617: DEBUG/AndroidRuntime(981): --- registering native functions --- 02-16 12:24:38.029: INFO/ActivityManager(67): Starting activity: Intent { act=android.intent.action.MAIN cat=[android.intent.category.LAUNCHER] flg=0x10000000 cmp=com.example.userpage/.UserPage } 02-16 12:24:38.129: DEBUG/AndroidRuntime(981): Shutting down VM 02-16 12:24:38.160: DEBUG/dalvikvm(981): Debugger has detached; object registry had 1 entries 02-16 12:24:38.168: INFO/AndroidRuntime(981): NOTE: attach of thread 'Binder Thread #3' failed 02-16 12:25:12.028: DEBUG/AndroidRuntime(990): >>>>>>>>>>>>>> AndroidRuntime START <<<<<<<<<<<<<< 02-16 12:25:12.038: DEBUG/AndroidRuntime(990): CheckJNI is ON 02-16 12:25:12.708: DEBUG/AndroidRuntime(990): --- registering native functions --- 02-16 12:25:15.178: DEBUG/dalvikvm(176): GC_EXPLICIT freed 114 objects / 5880 bytes in 115ms 02-16 12:25:15.318: DEBUG/PackageParser(67): Scanning package: /data/app/vmdl25170.tmp 02-16 12:25:15.588: INFO/PackageManager(67): Removing non-system package:com.example.userpage 02-16 12:25:15.597: INFO/ActivityManager(67): Force stopping package com.example.userpage uid=10036 02-16 12:25:15.648: INFO/Process(67): Sending signal. PID: 916 SIG: 9 02-16 12:25:15.877: INFO/UsageStats(67): Unexpected resume of com.android.launcher while already resumed in com.example.userpage 02-16 12:25:17.028: WARN/InputManagerService(67): Window already focused, ignoring focus gain of: com.android.internal.view.IInputMethodClient$Stub$Proxy@4400ecf8 02-16 12:25:17.928: DEBUG/PackageManager(67): Scanning package com.example.userpage 02-16 12:25:17.949: INFO/PackageManager(67): Package com.example.userpage codePath changed from /data/app/com.example.userpage-1.apk to /data/app/com.example.userpage-2.apk; Retaining data and using new 02-16 12:25:17.987: INFO/PackageManager(67): /data/app/com.example.userpage-2.apk changed; unpacking 02-16 12:25:18.037: DEBUG/installd(35): DexInv: --- BEGIN '/data/app/com.example.userpage-2.apk' --- 02-16 12:25:18.737: DEBUG/dalvikvm(997): DexOpt: load 81ms, verify 112ms, opt 6ms 02-16 12:25:18.768: DEBUG/installd(35): DexInv: --- END '/data/app/com.example.userpage-2.apk' (success) --- 02-16 12:25:18.799: INFO/ActivityManager(67): Force stopping package com.example.userpage uid=10036 02-16 12:25:18.808: WARN/PackageManager(67): Code path for pkg : com.example.userpage changing from /data/app/com.example.userpage-1.apk to /data/app/com.example.userpage-2.apk 02-16 12:25:18.839: WARN/PackageManager(67): Resource path for pkg : com.example.userpage changing from /data/app/com.example.userpage-1.apk to /data/app/com.example.userpage-2.apk 02-16 12:25:18.868: DEBUG/PackageManager(67): Activities: com.example.userpage.UserPage 02-16 12:25:19.297: INFO/installd(35): move /data/dalvik-cache/data@[email protected]@classes.dex -> /data/dalvik-cache/data@[email protected]@classes.dex 02-16 12:25:19.297: DEBUG/PackageManager(67): New package installed in /data/app/com.example.userpage-2.apk 02-16 12:25:19.598: DEBUG/dalvikvm(67): GC_FOR_MALLOC freed 7979 objects / 516856 bytes in 246ms 02-16 12:25:20.498: INFO/ActivityManager(67): Force stopping package com.example.userpage uid=10036 02-16 12:25:20.708: DEBUG/dalvikvm(129): GC_EXPLICIT freed 124 objects / 5672 bytes in 157ms 02-16 12:25:21.838: DEBUG/dalvikvm(67): GC_EXPLICIT freed 4208 objects / 236264 bytes in 419ms 02-16 12:25:21.918: WARN/RecognitionManagerService(67): no available voice recognition services found 02-16 12:25:22.127: INFO/installd(35): unlink /data/dalvik-cache/data@[email protected]@classes.dex 02-16 12:25:22.478: DEBUG/AndroidRuntime(990): Shutting down VM 02-16 12:25:22.488: DEBUG/dalvikvm(990): Debugger has detached; object registry had 1 entries 02-16 12:25:22.588: INFO/AndroidRuntime(990): NOTE: attach of thread 'Binder Thread #3' failed 02-16 12:25:24.137: DEBUG/AndroidRuntime(1003): >>>>>>>>>>>>>> AndroidRuntime START <<<<<<<<<<<<<< 02-16 12:25:24.147: DEBUG/AndroidRuntime(1003): CheckJNI is ON 02-16 12:25:24.817: DEBUG/AndroidRuntime(1003): --- registering native functions --- 02-16 12:25:27.450: INFO/ActivityManager(67): Starting activity: Intent { act=android.intent.action.MAIN cat=[android.intent.category.LAUNCHER] flg=0x10000000 cmp=com.example.userpage/.UserPage } 02-16 12:25:27.628: DEBUG/AndroidRuntime(1003): Shutting down VM 02-16 12:25:27.780: INFO/AndroidRuntime(1003): NOTE: attach of thread 'Binder Thread #3' failed 02-16 12:25:28.018: DEBUG/dalvikvm(1003): Debugger has detached; object registry had 1 entries 02-16 12:25:28.148: INFO/ActivityManager(67): Start proc com.example.userpage for activity com.example.userpage/.UserPage: pid=1010 uid=10036 gids={} 02-16 12:25:30.308: DEBUG/AndroidRuntime(1010): Shutting down VM 02-16 12:25:30.308: WARN/dalvikvm(1010): threadid=1: thread exiting with uncaught exception (group=0x4001d800) 02-16 12:25:30.388: ERROR/AndroidRuntime(1010): FATAL EXCEPTION: main 02-16 12:25:30.388: ERROR/AndroidRuntime(1010): java.lang.RuntimeException: Unable to start activity ComponentInfo{com.example.userpage/com.example.userpage.UserPage}: java.lang.ClassCastException: android.widget.TextView 02-16 12:25:30.388: ERROR/AndroidRuntime(1010): at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2663) 02-16 12:25:30.388: ERROR/AndroidRuntime(1010): at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2679) 02-16 12:25:30.388: ERROR/AndroidRuntime(1010): at android.app.ActivityThread.access$2300(ActivityThread.java:125) 02-16 12:25:30.388: ERROR/AndroidRuntime(1010): at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2033) 02-16 12:25:30.388: ERROR/AndroidRuntime(1010): at android.os.Handler.dispatchMessage(Handler.java:99) 02-16 12:25:30.388: ERROR/AndroidRuntime(1010): at android.os.Looper.loop(Looper.java:123) 02-16 12:25:30.388: ERROR/AndroidRuntime(1010): at android.app.ActivityThread.main(ActivityThread.java:4627) 02-16 12:25:30.388: ERROR/AndroidRuntime(1010): at java.lang.reflect.Method.invokeNative(Native Method) 02-16 12:25:30.388: ERROR/AndroidRuntime(1010): at java.lang.reflect.Method.invoke(Method.java:521) 02-16 12:25:30.388: ERROR/AndroidRuntime(1010): at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:868) 02-16 12:25:30.388: ERROR/AndroidRuntime(1010): at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:626) 02-16 12:25:30.388: ERROR/AndroidRuntime(1010): at dalvik.system.NativeStart.main(Native Method) 02-16 12:25:30.388: ERROR/AndroidRuntime(1010): Caused by: java.lang.ClassCastException: android.widget.TextView 02-16 12:25:30.388: ERROR/AndroidRuntime(1010): at com.example.userpage.UserPage.onCreate(UserPage.java:35) 02-16 12:25:30.388: ERROR/AndroidRuntime(1010): at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1047) 02-16 12:25:30.388: ERROR/AndroidRuntime(1010): at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2627) 02-16 12:25:30.388: ERROR/AndroidRuntime(1010): ... 11 more 02-16 12:25:30.438: WARN/ActivityManager(67): Force finishing activity com.example.userpage/.UserPage 02-16 12:25:31.088: WARN/ActivityManager(67): Activity pause timeout for HistoryRecord{43f164f8 com.example.userpage/.UserPage} 02-16 12:25:32.588: DEBUG/dalvikvm(292): GC_EXPLICIT freed 46 objects / 2240 bytes in 6282ms 02-16 12:25:35.267: INFO/Process(1010): Sending signal. PID: 1010 SIG: 9 02-16 12:25:35.468: WARN/InputManagerService(67): Window already focused, ignoring focus gain of: com.android.internal.view.IInputMethodClient$Stub$Proxy@43e60a90 02-16 12:25:35.900: INFO/ActivityManager(67): Process com.example.userpage (pid 1010) has died. 02-16 12:25:38.278: DEBUG/dalvikvm(176): GC_EXPLICIT freed 172 objects / 12280 bytes in 127ms 02-16 12:25:43.011: WARN/ActivityManager(67): Activity destroy timeout for HistoryRecord{43f164f8 com.example.userpage/.UserPage} 02-16 12:28:12.698: DEBUG/AndroidRuntime(1019): >>>>>>>>>>>>>> AndroidRuntime START <<<<<<<<<<<<<< 02-16 12:28:12.711: DEBUG/AndroidRuntime(1019): CheckJNI is ON 02-16 12:28:13.367: DEBUG/AndroidRuntime(1019): --- registering native functions --- 02-16 12:28:15.998: DEBUG/dalvikvm(176): GC_EXPLICIT freed 114 objects / 5888 bytes in 183ms 02-16 12:28:16.539: DEBUG/PackageParser(67): Scanning package: /data/app/vmdl25171.tmp 02-16 12:28:16.867: INFO/PackageManager(67): Removing non-system package:com.example.userpage 02-16 12:28:16.867: INFO/ActivityManager(67): Force stopping package com.example.userpage uid=10036 02-16 12:28:17.277: DEBUG/PackageManager(67): Scanning package com.example.userpage 02-16 12:28:17.308: INFO/PackageManager(67): Package com.example.userpage codePath changed from /data/app/com.example.userpage-2.apk to /data/app/com.example.userpage-1.apk; Retaining data and using new 02-16 12:28:17.328: INFO/PackageManager(67): /data/app/com.example.userpage-1.apk changed; unpacking 02-16 12:28:17.367: DEBUG/installd(35): DexInv: --- BEGIN '/data/app/com.example.userpage-1.apk' --- 02-16 12:28:18.357: DEBUG/dalvikvm(1026): DexOpt: load 85ms, verify 114ms, opt 6ms 02-16 12:28:18.398: DEBUG/installd(35): DexInv: --- END '/data/app/com.example.userpage-1.apk' (success) --- 02-16 12:28:18.428: INFO/ActivityManager(67): Force stopping package com.example.userpage uid=10036 02-16 12:28:18.438: WARN/PackageManager(67): Code path for pkg : com.example.userpage changing from /data/app/com.example.userpage-2.apk to /data/app/com.example.userpage-1.apk 02-16 12:28:18.477: WARN/PackageManager(67): Resource path for pkg : com.example.userpage changing from /data/app/com.example.userpage-2.apk to /data/app/com.example.userpage-1.apk 02-16 12:28:18.477: DEBUG/PackageManager(67): Activities: com.example.userpage.UserPage 02-16 12:28:18.977: INFO/installd(35): move /data/dalvik-cache/data@[email protected]@classes.dex -> /data/dalvik-cache/data@[email protected]@classes.dex 02-16 12:28:18.988: DEBUG/PackageManager(67): New package installed in /data/app/com.example.userpage-1.apk 02-16 12:28:19.528: DEBUG/dalvikvm(67): GC_FOR_MALLOC freed 6733 objects / 459728 bytes in 211ms 02-16 12:28:20.138: INFO/ActivityManager(67): Force stopping package com.example.userpage uid=10036 02-16 12:28:20.368: DEBUG/dalvikvm(129): GC_EXPLICIT freed 892 objects / 48744 bytes in 175ms 02-16 12:28:21.317: WARN/RecognitionManagerService(67): no available voice recognition services found 02-16 12:28:22.827: DEBUG/dalvikvm(67): GC_EXPLICIT freed 3877 objects / 241128 bytes in 452ms 02-16 12:28:22.979: INFO/installd(35): unlink /data/dalvik-cache/data@[email protected]@classes.dex 02-16 12:28:23.277: DEBUG/AndroidRuntime(1019): Shutting down VM 02-16 12:28:23.307: DEBUG/dalvikvm(1019): Debugger has detached; object registry had 1 entries 02-16 12:28:23.328: INFO/AndroidRuntime(1019): NOTE: attach of thread 'Binder Thread #3' failed 02-16 12:28:24.989: DEBUG/AndroidRuntime(1032): >>>>>>>>>>>>>> AndroidRuntime START <<<<<<<<<<<<<< 02-16 12:28:24.989: DEBUG/AndroidRuntime(1032): CheckJNI is ON 02-16 12:28:25.888: DEBUG/AndroidRuntime(1032): --- registering native functions --- 02-16 12:28:28.588: INFO/ActivityManager(67): Starting activity: Intent { act=android.intent.action.MAIN cat=[android.intent.category.LAUNCHER] flg=0x10000000 cmp=com.example.userpage/.UserPage } 02-16 12:28:28.888: DEBUG/AndroidRuntime(1032): Shutting down VM 02-16 12:28:28.997: DEBUG/dalvikvm(1032): Debugger has detached; object registry had 1 entries 02-16 12:28:29.038: INFO/AndroidRuntime(1032): NOTE: attach of thread 'Binder Thread #3' failed 02-16 12:28:30.417: INFO/ActivityManager(67): Start proc com.example.userpage for activity com.example.userpage/.UserPage: pid=1039 uid=10036 gids={} 02-16 12:28:32.588: DEBUG/AndroidRuntime(1039): Shutting down VM 02-16 12:28:32.598: WARN/dalvikvm(1039): threadid=1: thread exiting with uncaught exception (group=0x4001d800) 02-16 12:28:32.648: ERROR/AndroidRuntime(1039): FATAL EXCEPTION: main 02-16 12:28:32.648: ERROR/AndroidRuntime(1039): java.lang.RuntimeException: Unable to start activity ComponentInfo{com.example.userpage/com.example.userpage.UserPage}: java.lang.ClassCastException: android.widget.TextView 02-16 12:28:32.648: ERROR/AndroidRuntime(1039): at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2663) 02-16 12:28:32.648: ERROR/AndroidRuntime(1039): at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2679) 02-16 12:28:32.648: ERROR/AndroidRuntime(1039): at android.app.ActivityThread.access$2300(ActivityThread.java:125) 02-16 12:28:32.648: ERROR/AndroidRuntime(1039): at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2033) 02-16 12:28:32.648: ERROR/AndroidRuntime(1039): at android.os.Handler.dispatchMessage(Handler.java:99) 02-16 12:28:32.648: ERROR/AndroidRuntime(1039): at android.os.Looper.loop(Looper.java:123) 02-16 12:28:32.648: ERROR/AndroidRuntime(1039): at android.app.ActivityThread.main(ActivityThread.java:4627) 02-16 12:28:32.648: ERROR/AndroidRuntime(1039): at java.lang.reflect.Method.invokeNative(Native Method) 02-16 12:28:32.648: ERROR/AndroidRuntime(1039): at java.lang.reflect.Method.invoke(Method.java:521) 02-16 12:28:32.648: ERROR/AndroidRuntime(1039): at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:868) 02-16 12:28:32.648: ERROR/AndroidRuntime(1039): at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:626) 02-16 12:28:32.648: ERROR/AndroidRuntime(1039): at dalvik.system.NativeStart.main(Native Method) 02-16 12:28:32.648: ERROR/AndroidRuntime(1039): Caused by: java.lang.ClassCastException: android.widget.TextView 02-16 12:28:32.648: ERROR/AndroidRuntime(1039): at com.example.userpage.UserPage.onCreate(UserPage.java:34) 02-16 12:28:32.648: ERROR/AndroidRuntime(1039): at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1047) 02-16 12:28:32.648: ERROR/AndroidRuntime(1039): at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2627) 02-16 12:28:32.648: ERROR/AndroidRuntime(1039): ... 11 more 02-16 12:28:32.698: WARN/ActivityManager(67): Force finishing activity com.example.userpage/.UserPage 02-16 12:28:32.967: DEBUG/dalvikvm(292): GC_EXPLICIT freed 46 objects / 2240 bytes in 6840ms 02-16 12:28:33.247: WARN/ActivityManager(67): Activity pause timeout for HistoryRecord{43ee7b70 com.example.userpage/.UserPage} 02-16 12:28:36.947: INFO/Process(1039): Sending signal. PID: 1039 SIG: 9 02-16 12:28:37.017: INFO/ActivityManager(67): Process com.example.userpage (pid 1039) has died. 02-16 12:28:37.128: WARN/InputManagerService(67): Window already focused, ignoring focus gain of: com.android.internal.view.IInputMethodClient$Stub$Proxy@43e872f8 02-16 12:28:42.087: DEBUG/dalvikvm(176): GC_EXPLICIT freed 156 objects / 11488 bytes in 145ms 02-16 12:28:45.391: WARN/ActivityManager(67): Activity destroy timeout for HistoryRecord{43ee7b70 com.example.userpage/.UserPage} 02-16 12:28:47.177: DEBUG/SntpClient(67): request time failed: java.net.SocketException: Address family not supported by protocol

Read the article
Find Nearest Object

- by ultifinitus

I have a fairly sizable game engine created, and I'm adding some needed features, such as this, how do I find the nearest object from a list of points? In this case, I could simply use the Pythagorean theorem to find the distance, and check the results. I know I can't simply add x and y, because that's the distance to the object, if you only took right angle turns. However I'm wondering if there's something else I could do? I also have a collision system, where essentially I turn objects into smaller objects on a smaller grid, kind of like a minimap, and only if objects exist in the same gridspace do I check for collisions, I could do the same thing, only make the gridspace larger to check for closeness. (rather than checking every. single. object) however that would take additional setup in my base class and clutter up the already cluttered object. TL;DR Question: Is there something efficient and accurate that I can use to detect which object is closest, based on a list of points and sizes?

Read the article
Understanding G1 GC Logs

- by poonam

The purpose of this post is to explain the meaning of GC logs generated with some tracing and diagnostic options for G1 GC. We will take a look at the output generated with PrintGCDetails which is a product flag and provides the most detailed level of information. Along with that, we will also look at the output of two diagnostic flags that get enabled with -XX:+UnlockDiagnosticVMOptions option - G1PrintRegionLivenessInfo that prints the occupancy and the amount of space used by live objects in each region at the end of the marking cycle and G1PrintHeapRegions that provides detailed information on the heap regions being allocated and reclaimed. We will be looking at the logs generated with JDK 1.7.0_04 using these options. Option -XX:+PrintGCDetails Here's a sample log of G1 collection generated with PrintGCDetails. 0.522: [GC pause (young), 0.15877971 secs] [Parallel Time: 157.1 ms] [GC Worker Start (ms): 522.1 522.2 522.2 522.2 Avg: 522.2, Min: 522.1, Max: 522.2, Diff: 0.1] [Ext Root Scanning (ms): 1.6 1.5 1.6 1.9 Avg: 1.7, Min: 1.5, Max: 1.9, Diff: 0.4] [Update RS (ms): 38.7 38.8 50.6 37.3 Avg: 41.3, Min: 37.3, Max: 50.6, Diff: 13.3] [Processed Buffers : 2 2 3 2 Sum: 9, Avg: 2, Min: 2, Max: 3, Diff: 1] [Scan RS (ms): 9.9 9.7 0.0 9.7 Avg: 7.3, Min: 0.0, Max: 9.9, Diff: 9.9] [Object Copy (ms): 106.7 106.8 104.6 107.9 Avg: 106.5, Min: 104.6, Max: 107.9, Diff: 3.3] [Termination (ms): 0.0 0.0 0.0 0.0 Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] [Termination Attempts : 1 4 4 6 Sum: 15, Avg: 3, Min: 1, Max: 6, Diff: 5] [GC Worker End (ms): 679.1 679.1 679.1 679.1 Avg: 679.1, Min: 679.1, Max: 679.1, Diff: 0.1] [GC Worker (ms): 156.9 157.0 156.9 156.9 Avg: 156.9, Min: 156.9, Max: 157.0, Diff: 0.1] [GC Worker Other (ms): 0.3 0.3 0.3 0.3 Avg: 0.3, Min: 0.3, Max: 0.3, Diff: 0.0] [Clear CT: 0.1 ms] [Other: 1.5 ms] [Choose CSet: 0.0 ms] [Ref Proc: 0.3 ms] [Ref Enq: 0.0 ms] [Free CSet: 0.3 ms] [Eden: 12M(12M)->0B(10M) Survivors: 0B->2048K Heap: 13M(64M)->9739K(64M)] [Times: user=0.59 sys=0.02, real=0.16 secs] This is the typical log of an Evacuation Pause (G1 collection) in which live objects are copied from one set of regions (young OR young+old) to another set. It is a stop-the-world activity and all the application threads are stopped at a safepoint during this time. This pause is made up of several sub-tasks indicated by the indentation in the log entries. Here's is the top most line that gets printed for the Evacuation Pause. 0.522: [GC pause (young), 0.15877971 secs] This is the highest level information telling us that it is an Evacuation Pause that started at 0.522 secs from the start of the process, in which all the regions being evacuated are Young i.e. Eden and Survivor regions. This collection took 0.15877971 secs to finish. Evacuation Pauses can be mixed as well. In which case the set of regions selected include all of the young regions as well as some old regions. 1.730: [GC pause (mixed), 0.32714353 secs] Let's take a look at all the sub-tasks performed in this Evacuation Pause. [Parallel Time: 157.1 ms] Parallel Time is the total elapsed time spent by all the parallel GC worker threads. The following lines correspond to the parallel tasks performed by these worker threads in this total parallel time, which in this case is 157.1 ms. [GC Worker Start (ms): 522.1 522.2 522.2 522.2Avg: 522.2, Min: 522.1, Max: 522.2, Diff: 0.1] The first line tells us the start time of each of the worker thread in milliseconds. The start times are ordered with respect to the worker thread ids – thread 0 started at 522.1ms and thread 1 started at 522.2ms from the start of the process. The second line tells the Avg, Min, Max and Diff of the start times of all of the worker threads. [Ext Root Scanning (ms): 1.6 1.5 1.6 1.9 Avg: 1.7, Min: 1.5, Max: 1.9, Diff: 0.4] This gives us the time spent by each worker thread scanning the roots (globals, registers, thread stacks and VM data structures). Here, thread 0 took 1.6ms to perform the root scanning task and thread 1 took 1.5 ms. The second line clearly shows the Avg, Min, Max and Diff of the times spent by all the worker threads. [Update RS (ms): 38.7 38.8 50.6 37.3 Avg: 41.3, Min: 37.3, Max: 50.6, Diff: 13.3] Update RS gives us the time each thread spent in updating the Remembered Sets. Remembered Sets are the data structures that keep track of the references that point into a heap region. Mutator threads keep changing the object graph and thus the references that point into a particular region. We keep track of these changes in buffers called Update Buffers. The Update RS sub-task processes the update buffers that were not able to be processed concurrently, and updates the corresponding remembered sets of all regions. [Processed Buffers : 2 2 3 2Sum: 9, Avg: 2, Min: 2, Max: 3, Diff: 1] This tells us the number of Update Buffers (mentioned above) processed by each worker thread. [Scan RS (ms): 9.9 9.7 0.0 9.7 Avg: 7.3, Min: 0.0, Max: 9.9, Diff: 9.9] These are the times each worker thread had spent in scanning the Remembered Sets. Remembered Set of a region contains cards that correspond to the references pointing into that region. This phase scans those cards looking for the references pointing into all the regions of the collection set. [Object Copy (ms): 106.7 106.8 104.6 107.9 Avg: 106.5, Min: 104.6, Max: 107.9, Diff: 3.3] These are the times spent by each worker thread copying live objects from the regions in the Collection Set to the other regions. [Termination (ms): 0.0 0.0 0.0 0.0 Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] Termination time is the time spent by the worker thread offering to terminate. But before terminating, it checks the work queues of other threads and if there are still object references in other work queues, it tries to steal object references, and if it succeeds in stealing a reference, it processes that and offers to terminate again. [Termination Attempts : 1 4 4 6 Sum: 15, Avg: 3, Min: 1, Max: 6, Diff: 5] This gives the number of times each thread has offered to terminate. [GC Worker End (ms): 679.1 679.1 679.1 679.1 Avg: 679.1, Min: 679.1, Max: 679.1, Diff: 0.1] These are the times in milliseconds at which each worker thread stopped. [GC Worker (ms): 156.9 157.0 156.9 156.9 Avg: 156.9, Min: 156.9, Max: 157.0, Diff: 0.1] These are the total lifetimes of each worker thread. [GC Worker Other (ms): 0.3 0.3 0.3 0.3Avg: 0.3, Min: 0.3, Max: 0.3, Diff: 0.0] These are the times that each worker thread spent in performing some other tasks that we have not accounted above for the total Parallel Time. [Clear CT: 0.1 ms] This is the time spent in clearing the Card Table. This task is performed in serial mode. [Other: 1.5 ms] Time spent in the some other tasks listed below. The following sub-tasks (which individually may be parallelized) are performed serially. [Choose CSet: 0.0 ms] Time spent in selecting the regions for the Collection Set. [Ref Proc: 0.3 ms] Total time spent in processing Reference objects. [Ref Enq: 0.0 ms] Time spent in enqueuing references to the ReferenceQueues. [Free CSet: 0.3 ms] Time spent in freeing the collection set data structure. [Eden: 12M(12M)->0B(13M) Survivors: 0B->2048K Heap: 14M(64M)->9739K(64M)] This line gives the details on the heap size changes with the Evacuation Pause. This shows that Eden had the occupancy of 12M and its capacity was also 12M before the collection. After the collection, its occupancy got reduced to 0 since everything is evacuated/promoted from Eden during a collection, and its target size grew to 13M. The new Eden capacity of 13M is not reserved at this point. This value is the target size of the Eden. Regions are added to Eden as the demand is made and when the added regions reach to the target size, we start the next collection. Similarly, Survivors had the occupancy of 0 bytes and it grew to 2048K after the collection. The total heap occupancy and capacity was 14M and 64M receptively before the collection and it became 9739K and 64M after the collection. Apart from the evacuation pauses, G1 also performs concurrent-marking to build the live data information of regions. 1.416: [GC pause (young) (initial-mark), 0.62417980 secs] ….... 2.042: [GC concurrent-root-region-scan-start] 2.067: [GC concurrent-root-region-scan-end, 0.0251507] 2.068: [GC concurrent-mark-start] 3.198: [GC concurrent-mark-reset-for-overflow] 4.053: [GC concurrent-mark-end, 1.9849672 sec] 4.055: [GC remark 4.055: [GC ref-proc, 0.0000254 secs], 0.0030184 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 4.088: [GC cleanup 117M->106M(138M), 0.0015198 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 4.090: [GC concurrent-cleanup-start] 4.091: [GC concurrent-cleanup-end, 0.0002721] The first phase of a marking cycle is Initial Marking where all the objects directly reachable from the roots are marked and this phase is piggy-backed on a fully young Evacuation Pause. 2.042: [GC concurrent-root-region-scan-start] This marks the start of a concurrent phase that scans the set of root-regions which are directly reachable from the survivors of the initial marking phase. 2.067: [GC concurrent-root-region-scan-end, 0.0251507] End of the concurrent root region scan phase and it lasted for 0.0251507 seconds. 2.068: [GC concurrent-mark-start] Start of the concurrent marking at 2.068 secs from the start of the process. 3.198: [GC concurrent-mark-reset-for-overflow] This indicates that the global marking stack had became full and there was an overflow of the stack. Concurrent marking detected this overflow and had to reset the data structures to start the marking again. 4.053: [GC concurrent-mark-end, 1.9849672 sec] End of the concurrent marking phase and it lasted for 1.9849672 seconds. 4.055: [GC remark 4.055: [GC ref-proc, 0.0000254 secs], 0.0030184 secs] This corresponds to the remark phase which is a stop-the-world phase. It completes the left over marking work (SATB buffers processing) from the previous phase. In this case, this phase took 0.0030184 secs and out of which 0.0000254 secs were spent on Reference processing. 4.088: [GC cleanup 117M->106M(138M), 0.0015198 secs] Cleanup phase which is again a stop-the-world phase. It goes through the marking information of all the regions, computes the live data information of each region, resets the marking data structures and sorts the regions according to their gc-efficiency. In this example, the total heap size is 138M and after the live data counting it was found that the total live data size dropped down from 117M to 106M. 4.090: [GC concurrent-cleanup-start] This concurrent cleanup phase frees up the regions that were found to be empty (didn't contain any live data) during the previous stop-the-world phase. 4.091: [GC concurrent-cleanup-end, 0.0002721] Concurrent cleanup phase took 0.0002721 secs to free up the empty regions. Option -XX:G1PrintRegionLivenessInfo Now, let's look at the output generated with the flag G1PrintRegionLivenessInfo. This is a diagnostic option and gets enabled with -XX:+UnlockDiagnosticVMOptions. G1PrintRegionLivenessInfo prints the live data information of each region during the Cleanup phase of the concurrent-marking cycle. 26.896: [GC cleanup ### PHASE Post-Marking @ 26.896### HEAP committed: 0x02e00000-0x0fe00000 reserved: 0x02e00000-0x12e00000 region-size: 1048576 Cleanup phase of the concurrent-marking cycle started at 26.896 secs from the start of the process and this live data information is being printed after the marking phase. Committed G1 heap ranges from 0x02e00000 to 0x0fe00000 and the total G1 heap reserved by JVM is from 0x02e00000 to 0x12e00000. Each region in the G1 heap is of size 1048576 bytes. ### type address-range used prev-live next-live gc-eff### (bytes) (bytes) (bytes) (bytes/ms) This is the header of the output that tells us about the type of the region, address-range of the region, used space in the region, live bytes in the region with respect to the previous marking cycle, live bytes in the region with respect to the current marking cycle and the GC efficiency of that region. ### FREE 0x02e00000-0x02f00000 0 0 0 0.0 This is a Free region. ### OLD 0x02f00000-0x03000000 1048576 1038592 1038592 0.0 Old region with address-range from 0x02f00000 to 0x03000000. Total used space in the region is 1048576 bytes, live bytes as per the previous marking cycle are 1038592 and live bytes with respect to the current marking cycle are also 1038592. The GC efficiency has been computed as 0. ### EDEN 0x03400000-0x03500000 20992 20992 20992 0.0 This is an Eden region. ### HUMS 0x0ae00000-0x0af00000 1048576 1048576 1048576 0.0### HUMC 0x0af00000-0x0b000000 1048576 1048576 1048576 0.0### HUMC 0x0b000000-0x0b100000 1048576 1048576 1048576 0.0### HUMC 0x0b100000-0x0b200000 1048576 1048576 1048576 0.0### HUMC 0x0b200000-0x0b300000 1048576 1048576 1048576 0.0### HUMC 0x0b300000-0x0b400000 1048576 1048576 1048576 0.0### HUMC 0x0b400000-0x0b500000 1001480 1001480 1001480 0.0 These are the continuous set of regions called Humongous regions for storing a large object. HUMS (Humongous starts) marks the start of the set of humongous regions and HUMC (Humongous continues) tags the subsequent regions of the humongous regions set. ### SURV 0x09300000-0x09400000 16384 16384 16384 0.0 This is a Survivor region. ### SUMMARY capacity: 208.00 MB used: 150.16 MB / 72.19 % prev-live: 149.78 MB / 72.01 % next-live: 142.82 MB / 68.66 % At the end, a summary is printed listing the capacity, the used space and the change in the liveness after the completion of concurrent marking. In this case, G1 heap capacity is 208MB, total used space is 150.16MB which is 72.19% of the total heap size, live data in the previous marking was 149.78MB which was 72.01% of the total heap size and the live data as per the current marking is 142.82MB which is 68.66% of the total heap size. Option -XX:+G1PrintHeapRegions G1PrintHeapRegions option logs the regions related events when regions are committed, allocated into or are reclaimed. COMMIT/UNCOMMIT events G1HR COMMIT [0x6e900000,0x6ea00000]G1HR COMMIT [0x6ea00000,0x6eb00000] Here, the heap is being initialized or expanded and the region (with bottom: 0x6eb00000 and end: 0x6ec00000) is being freshly committed. COMMIT events are always generated in order i.e. the next COMMIT event will always be for the uncommitted region with the lowest address. G1HR UNCOMMIT [0x72700000,0x72800000]G1HR UNCOMMIT [0x72600000,0x72700000] Opposite to COMMIT. The heap got shrunk at the end of a Full GC and the regions are being uncommitted. Like COMMIT, UNCOMMIT events are also generated in order i.e. the next UNCOMMIT event will always be for the committed region with the highest address. GC Cycle events G1HR #StartGC 7G1HR CSET 0x6e900000G1HR REUSE 0x70500000G1HR ALLOC(Old) 0x6f800000G1HR RETIRE 0x6f800000 0x6f821b20G1HR #EndGC 7 This shows start and end of an Evacuation pause. This event is followed by a GC counter tracking both evacuation pauses and Full GCs. Here, this is the 7th GC since the start of the process. G1HR #StartFullGC 17G1HR UNCOMMIT [0x6ed00000,0x6ee00000]G1HR POST-COMPACTION(Old) 0x6e800000 0x6e854f58G1HR #EndFullGC 17 Shows start and end of a Full GC. This event is also followed by the same GC counter as above. This is the 17th GC since the start of the process. ALLOC events G1HR ALLOC(Eden) 0x6e800000 The region with bottom 0x6e800000 just started being used for allocation. In this case it is an Eden region and allocated into by a mutator thread. G1HR ALLOC(StartsH) 0x6ec00000 0x6ed00000G1HR ALLOC(ContinuesH) 0x6ed00000 0x6e000000 Regions being used for the allocation of Humongous object. The object spans over two regions. G1HR ALLOC(SingleH) 0x6f900000 0x6f9eb010 Single region being used for the allocation of Humongous object. G1HR COMMIT [0x6ee00000,0x6ef00000]G1HR COMMIT [0x6ef00000,0x6f000000]G1HR COMMIT [0x6f000000,0x6f100000]G1HR COMMIT [0x6f100000,0x6f200000]G1HR ALLOC(StartsH) 0x6ee00000 0x6ef00000G1HR ALLOC(ContinuesH) 0x6ef00000 0x6f000000G1HR ALLOC(ContinuesH) 0x6f000000 0x6f100000G1HR ALLOC(ContinuesH) 0x6f100000 0x6f102010 Here, Humongous object allocation request could not be satisfied by the free committed regions that existed in the heap, so the heap needed to be expanded. Thus new regions are committed and then allocated into for the Humongous object. G1HR ALLOC(Old) 0x6f800000 Old region started being used for allocation during GC. G1HR ALLOC(Survivor) 0x6fa00000 Region being used for copying old objects into during a GC. Note that Eden and Humongous ALLOC events are generated outside the GC boundaries and Old and Survivor ALLOC events are generated inside the GC boundaries. Other Events G1HR RETIRE 0x6e800000 0x6e87bd98 Retire and stop using the region having bottom 0x6e800000 and top 0x6e87bd98 for allocation. Note that most regions are full when they are retired and we omit those events to reduce the output volume. A region is retired when another region of the same type is allocated or we reach the start or end of a GC(depending on the region). So for Eden regions: For example: 1. ALLOC(Eden) Foo2. ALLOC(Eden) Bar3. StartGC At point 2, Foo has just been retired and it was full. At point 3, Bar was retired and it was full. If they were not full when they were retired, we will have a RETIRE event: 1. ALLOC(Eden) Foo2. RETIRE Foo top3. ALLOC(Eden) Bar4. StartGC G1HR CSET 0x6e900000 Region (bottom: 0x6e900000) is selected for the Collection Set. The region might have been selected for the collection set earlier (i.e. when it was allocated). However, we generate the CSET events for all regions in the CSet at the start of a GC to make sure there's no confusion about which regions are part of the CSet. G1HR POST-COMPACTION(Old) 0x6e800000 0x6e839858 POST-COMPACTION event is generated for each non-empty region in the heap after a full compaction. A full compaction moves objects around, so we don't know what the resulting shape of the heap is (which regions were written to, which were emptied, etc.). To deal with this, we generate a POST-COMPACTION event for each non-empty region with its type (old/humongous) and the heap boundaries. At this point we should only have Old and Humongous regions, as we have collapsed the young generation, so we should not have eden and survivors. POST-COMPACTION events are generated within the Full GC boundary. G1HR CLEANUP 0x6f400000G1HR CLEANUP 0x6f300000G1HR CLEANUP 0x6f200000 These regions were found empty after remark phase of Concurrent Marking and are reclaimed shortly afterwards. G1HR #StartGC 5G1HR CSET 0x6f400000G1HR CSET 0x6e900000G1HR REUSE 0x6f800000 At the end of a GC we retire the old region we are allocating into. Given that its not full, we will carry on allocating into it during the next GC. This is what REUSE means. In the above case 0x6f800000 should have been the last region with an ALLOC(Old) event during the previous GC and should have been retired before the end of the previous GC. G1HR ALLOC-FORCE(Eden) 0x6f800000 A specialization of ALLOC which indicates that we have reached the max desired number of the particular region type (in this case: Eden), but we decided to allocate one more. Currently it's only used for Eden regions when we extend the young generation because we cannot do a GC as the GC-Locker is active. G1HR EVAC-FAILURE 0x6f800000 During a GC, we have failed to evacuate an object from the given region as the heap is full and there is no space left to copy the object. This event is generated within GC boundaries and exactly once for each region from which we failed to evacuate objects. When Heap Regions are reclaimed ? It is also worth mentioning when the heap regions in the G1 heap are reclaimed. All regions that are in the CSet (the ones that appear in CSET events) are reclaimed at the end of a GC. The exception to that are regions with EVAC-FAILURE events. All regions with CLEANUP events are reclaimed. After a Full GC some regions get reclaimed (the ones from which we moved the objects out). But that is not shown explicitly, instead the non-empty regions that are left in the heap are printed out with the POST-COMPACTION events.

Read the article
Querying Visual Studio project files using T-SQL and Powershell

- by jamiet

Earlier today I had a need to get some information out of a Visual Studio project file and in this blog post I’m going to share a couple of ways of going about that because I’m pretty sure I won’t be the only person that ever wants to do this. The specific problem I was trying to solve was finding out how many objects in my database project (i.e. in my .dbproj file) had any warnings suppressed but the techniques discussed below will work pretty well for any Visual Studio project file because every such file is simply an XML document, hence it can be queried by anything that can query XML documents. Ever heard the phrase “when all you’ve got is hammer everything looks like a nail”? Well that’s me with querying stuff – if I can write SQL then I’m writing SQL. Here’s a little noddy database project I put together for demo purposes: Two views and a stored procedure, nothing fancy. I suppressed warnings for [View1] & [Procedure1] and hence the pertinent part my project file looks like this: <ItemGroup> <Build Include="Schema Objects\Schemas\dbo\Views\View1.view.sql"> <SubType>Code</SubType> <SuppressWarnings>4151,3276</SuppressWarnings> </Build> <Build Include="Schema Objects\Schemas\dbo\Views\View2.view.sql"> <SubType>Code</SubType> </Build> <Build Include="Schema Objects\Schemas\dbo\Programmability\Stored Procedures\Procedure1.proc.sql"> <SubType>Code</SubType> <SuppressWarnings>4151</SuppressWarnings> </Build> </ItemGroup> <ItemGroup> Note the <SuppressWarnings> elements – those are the bits of information that I am after. With a lot of help from folks on the SQL Server XML forum I came up with the following query that nailed what I was after. It reads the contents of the .dbproj file into a variable of type XML and then shreds it using T-SQL’s XML data type methods: DECLARE @xml XML; SELECT @xml = CAST(pkgblob.BulkColumn AS XML) FROM OPENROWSET(BULK 'C:\temp\QueryingProjectFileDemo\QueryingProjectFileDemo.dbproj' -- <-Change this path! ,single_blob) AS pkgblob ;WITH XMLNAMESPACES( 'http://schemas.microsoft.com/developer/msbuild/2003' AS ns) SELECT REVERSE(SUBSTRING(REVERSE(ObjectPath),0,CHARINDEX('\',REVERSE(ObjectPath)))) AS [ObjectName] ,[SuppressedWarnings] FROM ( SELECT build.query('.') AS [_node] , build.value('ns:SuppressWarnings[1]','nvarchar(100)') AS [SuppressedWarnings] , build.value('@Include','nvarchar(1000)') AS [ObjectPath] FROM @xml.nodes('//ns:Build[ns:SuppressWarnings]') AS R(build) )q And here’s the output: And that’s it – an easy way of discovering which warnings have been suppressed and for which objects in your database projects. I won’t bother going over the code as it is fairly self-explanatory – peruse it at your leisure. Once I had the SQL above I figured I’d share it around a little in case it was ever useful to anyone else; hence I’m writing this blog post and I also posted it on the Visual Studio Database Development Tools forum at FYI: Discover which objects have had warnings suppressed. Luckily Kevin Goode saw the thread and he posted a different solution to the same problem, one that uses Powershell. The advantage of Kevin’s Powershell approach is that it is easy to analyse many .dbproj files at the same time. Below is Kevin’s code which I have tweaked ever so slightly so that it produces the same results as my SQL script (I just want any object that had had a warning suppressed whereas Kevin was querying specifically for warning 4151): cd 'C:\Temp\QueryingProjectFileDemo\' cls $projects = ls -r -i *.dbproj Foreach($project in $projects) { $xml = new-object System.Xml.XmlDocument $xml.set_PreserveWhiteSpace( $true ) $xml.Load($project) #$xpath = @{Start="/e:Project/e:ItemGroup/e:Build[e:SuppressWarnings=4151]/@Include"} #$xpath = @{Start="/e:Project/e:ItemGroup/e:Build[contains(e:SuppressWarnings,'4151')]/@Include"} $xpath = @{Start="/e:Project/e:ItemGroup/e:Build[e:SuppressWarnings]/@Include"} $ns = @{ e = "http://schemas.microsoft.com/developer/msbuild/2003" } $xml | Select-Xml -XPath $xpath.Start -Namespace $ns |Select -Expand Node | Select -expand Value } and here’s the output: Nice reusable Powershell and SQL scripts – not bad for an evening’s work. Thank you to Kevin for allowing me to share his code. Don’t forget that these techniques can easily be adapted to query any Visual Studio project file, they’re only XML documents after all! Doubtless many people out there already have code for doing this but nonetheless here is another offering to the great script library in the sky. Have fun! @Jamiet

Read the article
"Optimal" game loop for 2D side-scroller

- by MrDatabase

Is it possible to describe an "optimal" (in terms of performance) layout for a 2D side-scroller's game loop? In this context the "game loop" takes user input, updates the states of game objects and draws the game objects. For example having a GameObject base class with a deep inheritance hierarchy could be good for maintenance... you can do something like the following: foreach(GameObject g in gameObjects) g.update(); However I think this approach can create performance issues. On the other hand all game objects' data and functions could be global. Which would be a maintenance headache but might be closer to an optimally performing game loop. Any thoughts? I'm interested in practical applications of near optimal game loop structure... even if I get a maintenance headache in exchange for great performance.

Read the article
Multiple Object Instantiation

- by Ricky Baby

I am trying to get my head around object oriented programming as it pertains to web development (more specifically PHP). I understand inheritance and abstraction etc, and know all the "buzz-words" like encapsulation and single purpose and why I should be doing all this. But my knowledge is falling short with actually creating objects that relate to the data I have in my database, creating a single object that a representative of a single entity makes sense, but what are the best practises when creating 100, 1,000 or 10,000 objects of the same type. for instance, when trying to display a list of the items, ideally I would like to be consistent with the objects I use, but where exactly should I run the query/get the data to populate the object(s) as running 10,000 queries seems wasteful. As an example, say I have a database of cats, and I want a list of all black cats, do I need to set up a FactoryObject which grabs the data needed for each cat from my database, then passes that data into each individual CatObject and returns the results in a array/object - or should I pass each CatObject it's identifier and let it populate itself in a separate query.

Read the article
C Problem with Compiler?

- by Solomon081

I just started learning C, and wrote my hello world program: #include <stdio.h> main() { printf("Hello World"); return 0; } When I run the code, I get a really long error: Apple Mach-O Linker (id) Error Ld /Users/Solomon/Library/Developer/Xcode/DerivedData/CProj-cwosspupvengheeaapmkrhxbxjvk/Build/Products/Debug/CProj normal x86_64 cd /Users/Solomon/Desktop/C/CProj setenv MACOSX_DEPLOYMENT_TARGET 10.7 /Developer/usr/bin/clang -arch x86_64 -isysroot /Developer/SDKs/MacOSX10.7.sdk -L/Users/Solomon/Library/Developer/Xcode/DerivedData/CProj-cwosspupvengheeaapmkrhxbxjvk/Build/Products/Debug -F/Users/Solomon/Library/Developer/Xcode/DerivedData/CProj-cwosspupvengheeaapmkrhxbxjvk/Build/Products/Debug -filelist /Users/Solomon/Library/Developer/Xcode/DerivedData/CProj-cwosspupvengheeaapmkrhxbxjvk/Build/Intermediates/CProj.build/Debug/CProj.build/Objects-normal/x86_64/CProj.LinkFileList -mmacosx-version-min=10.7 -o /Users/Solomon/Library/Developer/Xcode/DerivedData/CProj-cwosspupvengheeaapmkrhxbxjvk/Build/Products/Debug/CProj ld: duplicate symbol _main in /Users/Solomon/Library/Developer/Xcode/DerivedData/CProj-cwosspupvengheeaapmkrhxbxjvk/Build/Intermediates/CProj.build/Debug/CProj.build/Objects-normal/x86_64/helloworld.o and /Users/Solomon/Library/Developer/Xcode/DerivedData/CProj-cwosspupvengheeaapmkrhxbxjvk/Build/Intermediates/CProj.build/Debug/CProj.build/Objects-normal/x86_64/main.o for architecture x86_64 Command /Developer/usr/bin/clang failed with exit code 1 I am running xCode Should I reinstall DevTools?

Read the article
Read only array, deep copy or retrieve copies one by one? (Performance and Memory)

- by Arthur Wulf White

In a garbage collection based system, what is the most effective way to handle a read only array if such a structure does not exist natively in the language. Is it better to return a copy of an array or allow other classes to retrieve copies of the objects stored in the array one by one? @JustinSkiles: It is not very broad. It is performance related and can actually be answered specifically for two general cases. You only need very few items: in this situation it's more effective to retrieve copies of the objects one by one. You wish to iterate over 30% or more objects. In this cases it is superior to retrieve all the array at once. This saves on functions calls. Function calls are very expansive when compared to reading directly from an array. A good specific answer could include performance, reading from an array and reading indirectly through a function. It is a simple performance related question.

Read the article
Is there really Object-relational impedance mismatch?

- by user52763

It is always stated that it is hard to store applications objects in relational databases - the object-relational impedance mismatch - and that is why Document databases are better. However, is there really an impedance mismatch? And object has a key (albeit it may be hidden away by the runtime as a pointer to memory), a set of values, and foreign keys to other objects. Objects are as much made up of tables as it is a document. Neither really fit. I can see a use for databases to model the data into specific shapes for scenarios in the application - e.g. to speed up database lookup and avoid joins, etc., but won't it be better to keep the data as normalized as possible at the core, and transform as required?

Read the article
How can I create and animate 2D skeletons for HTML5 Javascript games? [on hold]

- by user414209

I'm trying to make a 2D fighting game in HTML5(somewhat like street fighter). So basically there are two players, one AI and one Human. The players need to have animations for the body movements. Also, there needs to be some collision detection system. I'm using createjs for coding but to design models/objects/animations, I need some other software. So I'm looking for a software that can: easily make custom animation of 2d objects. The objects structure(skeleton etc.) will be same once defined but need to be defined once. Can export the animations and models in a js readable format(preferably json) Collision detection can be done easily after the exported format is loaded in a game engine. For point 1, I'm looking for some generic skeleton based animation. Sprite-sheet based animations will be difficult for collision detection.

Read the article
Solving 2D Collision Detection Issues with Relative Velocities

- by Jengerer

Imagine you have a situation where two objects are moving parallel to one-another and are both within range to collide with a static wall, like this: A common method used in dynamic collision detection is to loop through all objects in arbitrary order, solve for pair-wise collision detection using relative velocities, and then move the object to the nearest collision, if any. However, in this case, if the red object is checked first against the blue one, it would see that the relative velocity to the blue object is -20 m/s (and would thereby not collide this time frame). Then it would see that the red object would collide with the static wall, and the solution would be: And the red object passes through the blue one. So it appears to be a matter of choosing the right order in which you check collisions; but how can you determine which order is correct? How can this passing through of objects be avoided? Is ignoring relative velocity and considering every object as static during pair-wise checks a better idea for this reason?

Read the article
XNA Transparency depending on drawing order?

- by DarthRoman

I am drawing two 3D objects, both of them can fade from opaque to transparent independently, and they can intersect between them (so you cannot say when one of them is before the other one). Look at the image for a better understanding (one of the object is a terrain and the other one an area): Now, if I apply transparency to both of them, and draw the terrain before the area, the terrain is not transparent respecting to the area, but the area is: And finally, if I draw the area before the terrain, then the area is not transparent respecting of the terrain: QUESTION: How can I make all the objects transparent to the rest of objects without depending on the drawing order?

Read the article
Motivation and use of move constructors in C++

- by Giorgio

I recently have been reading about move constructors in C++ (see e.g. here) and I am trying to understand how they work and when I should use them. As far as I understand, a move constructor is used to alleviate the performance problems caused by copying large objects. The wikipedia page says: "A chronic performance problem with C++03 is the costly and unnecessary deep copies that can happen implicitly when objects are passed by value." I normally address such situations by passing the objects by reference, or by using smart pointers (e.g. boost::shared_ptr) to pass around the object (the smart pointers get copied instead of the object). What are the situations in which the above two techniques are not sufficient and using a move constructor is more convenient?

Read the article
What is a correct step by step logic of exporting scene with baked occlusion for loading it at runtime?

- by myWallJSON

I wonder what is a correct step by step logic of exporting scene with baked occlusion (Culling data) for loading that scene at runtime (on fly from the internet for example))? So currently my plan looks like this: I create prefabs Place them onto my scene (into Hierarchy) (say create 20 buffolows and some hourses and some buildings) Create empty prefab and drag all my scene objects from hierarchy onto it Export prefab So generally I put all my scene objects into one large prefab and export it but it seems that all objects that were marked as static get this property turned off when loading them at runtime and so no Frustrum Culling, and no Occlusion culling happens. So I wonder what is a correct way of exporting Sceen + Objecrts + Occlusion (and onther culing) data for future load of such scene at runtime? I wonder about current 3.5.2 Pro and future 4 Pro versions of U3D.

Read the article
2D vector graphic html5 framework

- by Yury

I trying to find html5 game framework by following criteria: 1)Real good performance. 2)Good support of vector graphic( objects which contains canvas elements -line, rec,bezierCurve etc.) 3)Easy port to mobile. Optional- Physics Engine. I found 1)Pixi.js- it looks like real good, but i didn't find any info about "vector objects" support. 2) i found "vector objects" support in paper.js I need something like these: http://paperjs.org/examples/chain/ and http://paperjs.org/examples/path-intersections/. But it looks like paper.js- not so good performance as pixi.js. And it is not game engine. Is there any good framework meets these requirements? P.S. I found similar question here Which free HTML5-based game engine meets these requirements?. But it was a long time ago. A lot of new things were created since 2011.

Read the article
ADF Business Components

- by Arda Eralp

ADF Business Components and JDeveloper simplify the development, delivery, and customization of business applications for the Java EE platform. With ADF Business Components, developers aren't required to write the application infrastructure code required by the typical Java EE application to: Connect to the database Retrieve data Lock database records Manage transactions ADF Business Components addresses these tasks through its library of reusable software components and through the supporting design time facilities in JDeveloper. Most importantly, developers save time using ADF Business Components since the JDeveloper design time makes typical development tasks entirely declarative. In particular, JDeveloper supports declarative development with ADF Business Components to: Author and test business logic in components which automatically integrate with databases Reuse business logic through multiple SQL-based views of data, supporting different application tasks Access and update the views from browser, desktop, mobile, and web service clients Customize application functionality in layers without requiring modification of the delivered application The goal of ADF Business Components is to make the business services developer more productive. ADF Business Components provides a foundation of Java classes that allow your business-tier application components to leverage the functionality provided in the following areas: Simplifying Data Access Design a data model for client displays, including only necessary data Include master-detail hierarchies of any complexity as part of the data model Implement end-user Query-by-Example data filtering without code Automatically coordinate data model changes with business services layer Automatically validate and save any changes to the database Enforcing Business Domain Validation and Business Logic Declaratively enforce required fields, primary key uniqueness, data precision-scale, and foreign key references Easily capture and enforce both simple and complex business rules, programmatically or declaratively, with multilevel validation support Navigate relationships between business domain objects and enforce constraints related to compound components Supporting Sophisticated UIs with Multipage Units of Work Automatically reflect changes made by business service application logic in the user interface Retrieve reference information from related tables, and automatically maintain the information when the user changes foreign-key values Simplify multistep web-based business transactions with automatic web-tier state management Handle images, video, sound, and documents without having to use code Synchronize pending data changes across multiple views of data Consistently apply prompts, tooltips, format masks, and error messages in any application Define custom metadata for any business components to support metadata-driven user interface or application functionality Add dynamic attributes at runtime to simplify per-row state management Implementing High-Performance Service-Oriented Architecture Support highly functional web service interfaces for business integration without writing code Enforce best-practice interface-based programming style Simplify application security with automatic JAAS integration and audit maintenance "Write once, run anywhere": use the same business service as plain Java class, EJB session bean, or web service Streamlining Application Customization Extend component functionality after delivery without modifying source code Globally substitute delivered components with extended ones without modifying the application ADF Business Components implements the business service through the following set of cooperating components: Entity object An entity object represents a row in a database table and simplifies modifying its data by handling all data manipulation language (DML) operations for you. These are basically your 1 to 1 representation of a database table. Each table in the database will have 1 and only 1 EO. The EO contains the mapping between columns and attributes. EO's also contain the business logic and validation. These are you core data services. They are responsible for updating, inserting and deleting records. The Attributes tab displays the actual mapping between attributes and columns, the mapping has following fields: Name : contains the name of the attribute we expose in our data model. Type : defines the data type of the attribute in our application. Column : specifies the column to which we want to map the attribute with Column Type : contains the type of the column in the database View object A view object represents a SQL query. You use the full power of the familiar SQL language to join, filter, sort, and aggregate data into exactly the shape required by the end-user task. The attributes in the View Objects are actually coming from the Entity Object. In the end the VO will generate a query but you basically build a VO by selecting which EO need to participate in the VO and which attributes of those EO you want to use. That's why you have the Entity Usage column so you can see the relation between VO and EO. In the query tab you can clearly see the query that will be generated for the VO. At this stage we don't need it and just use it for information purpose. In later stages we might use it. Application module An application module is the controller of your data layer. It is responsible for keeping hold of the transaction. It exposes the data model to the view layer. You expose the VO's through the Application Module. This is the abstraction of your data layer which you want to show to the outside word.It defines an updatable data model and top-level procedures and functions (called service methods) related to a logical unit of work related to an end-user task. While the base components handle all the common cases through built-in behavior, customization is always possible and the default behavior provided by the base components can be easily overridden or augmented. When you create EO's, a foreign key will be translated into an association in our model. It defines the type of relation and who is the master and child as well as how the visibility of the association looks like. A similar concept exists to identify relations between view objects. These are called view links. These are almost identical as association except that a view link is based upon attributes defined in the view object. It can also be based upon an association. Here's a short summary: Entity Objects: representations of tables Association: Relations between EO's. Representations of foreign keys View Objects: Logical model View Links: Relationships between view objects Application Model: interface to your application

Read the article
QuadTree: store only points, or regions?

- by alekop

I am developing a quadtree to keep track of moving objects for collision detection. Each object has a bounding shape, let's say they are all circles. (It's a 2D top-down game) I am unsure whether to store only the position of each object, or the whole bounding shape. If working with points, insertion and subdivision is easy, because objects will never span multiple nodes. On the other hand, a proximity query for an object may miss collisions, because it won't take the objects' dimensions into account. How to calculate the query region when you only have points? If working with regions, how to handle an object that spans multiple nodes? Should it be inserted in the nearest parent node that completely contains it, even if this exceeds the node's capacity? Thanks.

Read the article
Passing data between engine layers

- by spaceOwl

I am building a software system (game engine with networking support ) that is made up of (roughly) these layers: Game Layer Messaging Layer Networking Layer Game related data is passed to the messaging layer (this could be anything that is game specific), where they are to be converted to network specific messages (which are then serialized to byte arrays). I'm looking for a way to be able to convert "game" data into "network" data, such that no strong coupling between these layers will exist. As it looks now, the Messaging layer sits between both layers (game and network) and "knows" both of them (it contains Converter objects that know how to translate between data objects of both layers back and forth). I am not sure this is the best solution. Is there a good design for passing objects between layers? I'd like to learn more about the different options.

Read the article
Find Nearest Object

- by ultifinitus

I have a fairly sizable game engine created, and I'm adding some needed features, such as this, how do I find the nearest object from a list of points? In this case, I could simply use the Pythagorean theorem to find the distance, and check the results. I know I can't simply add x and y, because that's the distance to the object, if you only took right angle turns. However I'm wondering if there's something else I could do? I also have a collision system, where essentially I turn objects into smaller objects on a smaller grid, kind of like a minimap, and only if objects exist in the same gridspace do I check for collisions, I could do the same thing, only make the gridspace larger to check for closeness. (rather than checking every. single. object) however that would take additional setup in my base class and clutter up the already cluttered object. TL;DR Question: Is there something efficient and accurate that I can use to detect which object is closest, based on a list of points and sizes?

Read the article
Optimizing hierarchical transform

- by Geotarget

I'm transforming objects in 3D space by transforming each vector with the object's 4x4 transform matrix. In order to achieve hierarchical transform, I transform the child by its own matrix, and then the child by the parent matrix. This becomes costly because objects deeper in the display tree have to be transformed by all the parent objects. This is what's happening, in summary: Root -- transform its verts by Root matrix Parent -- transform its verts by Parent, Root matrix Child -- transform its verts by Child, Parent, Root matrix Is there a faster way to transform vertices to achieve hierarchical transform? What If I first concatenated each transform matrix with the parent matrices, and then transform verts by that final resulting matrix, would that work and wouldn't that be faster? Root -- transform its verts by Root matrix Parent -- concat Parent, Root matrices, transform its verts by Concated matrix Child -- concat Child, Parent, Root matrices, transform its verts by Concated matrix

Read the article

Search Results

Search found 13829 results on 554 pages for 'temporary objects'.

Page 115/554 | < Previous Page | 111 112 113 114 115 116 117 118 119 120 121 122 | Next Page >

- by Paul White

- by Chris Lambrou

- by Jeroen Vannevel

- by user560084

- by rchrd

- by Piyush

- by ultifinitus

- by poonam

- by jamiet

- by MrDatabase

- by Ricky Baby

- by Solomon081

- by Arthur Wulf White

- by user52763

- by user414209

- by Jengerer

- by DarthRoman

- by Giorgio

- by myWallJSON

- by Yury

- by Arda Eralp

- by alekop

- by spaceOwl

- by ultifinitus

- by Geotarget

< Previous Page | 111 112 113 114 115 116 117 118 119 120 121 122 | Next Page >