Search Results

Search found 587 results on 24 pages for 'seek'.

Page 24/24 | < Previous Page | 20 21 22 23 24 

  • Slow draw on some apps and dynamic clocks not working properly with ATI/AMD proprietary drivers

    - by Rakeka
    I've recently purchased a new computer (around July 2010) and I've been having some problems with proprietary video drivers on Linux. The hardware is: Video: ATI/AMD Radeon HD 5870 (XFX HD-587X-ZNFC); Motherboard: Asus P7P55D-E Deluxe; Processor: Intel i5 750; Memory: Kingston Hyperx KHX1600C8D3K2/4GX (2x - 8GB Total); Power Supply: XFX P1-750B-CAG9; There are no overclocks, not even the memories (they are at 1333mhz due processor memory controller limitation). The operational system is a homebrew Linux distribution with the following software: Architecture: x86_64 (multilib) Kernel: 2.6.35.10 Xorg: 7.5 Window Manager: wmii-3.9.2 Video Driver: ATI/AMD Catalyst 10.12 There are no desktop effects programs like compiz fusion or beryl. The problems: With ATI/AMD proprietary driver, some applications are with slow draw/redraw, and, the same applications make the driver to increase the card clocks to maximum (0% gpu activity, only the clocks are increased). I dunno exactly how to describe the slow draw but I'll list some applications and symptoms. xterm Flickers a lot when drawing continuous output; When I'm in a workspace with fullscreen xterm, The gpu load stays at 12% in idle, and, with smaller xterm, smaller GPU load. "aticonfig --odgc" output: Default Adapter - ATI Radeon HD 5800 Series Core (MHz) Memory (MHz) Current Clocks : 157 300 Current Peak : 850 1200 Configurable Peak Range : [600-900] [900-1300] GPU load : 12% "aticonfig --pplib-cmd 'get activity'" output: Current Activity is Core Clock: 157MHZ Memory Clock: 300MHZ VDDC: 950 Activity: 12 percent Performance Level: 0 Bus Speed: 5000 Bus Lanes: 16 Maximum Bus Lanes: 16 More examples: mplayer time info flickers on terminal; "find /" flickers a lot (It takes some time to stop with control-c. But, If I change the workspace or put some window upon it, just after the control-c, it stops instantly); "cat somefile" if the file is big (Xorg.0.log for example) it takes some time to display; vim and less (ex: find / | less) don't have much problems, just a little flicker when scrolling; mplayer (no gui) Slow reproduction and seek with -vo x11; Tearing with -vo xv; Time info flickers on terminal (xterm consequence); gvim A little slow draw when scrolling with page up/page down; Firefox Slow draw/redraw on some pages like www.boadica.com.br and sometimes on www.youtube.com with flash enable (never noticed on many pages); Corruptions when informative yellow boxes are showing and scroll the page (an gray box appears at the same place of the informative box); "Wallpaper" After minimizing a fullscreen window or changing to an empty workspace it takes some time to redraw wallpaper. "Video Card" The core and memory clocks are increased with the events described above and on other situations like change workspace (even without wallpaper), minimize, maximize or move a window; Idle clocks: Core: 157mhz, Memory: 300mhz Full clocks: Core: 850mhz, Memory: 1200mhz xpdf Painful slow scrolling; display (from ImageMagick) Slow menus and sometimes slow image redraw; Programs that I use and are apparently without problems: gimp; pidgin; mplayer (-vo gl, gl2); blender; unigine heaven (better fps than on Windows); doom3; tibia; penumbra overture; amnesia the dark descent (wine); diablo 2 (wine); No problems on Windows (Windows 7 Ultimate 64bit). And special note to this: Full desktop effects from Debian and Ubuntu gnome appearance cpanel don't cause ANY problems, even the core and memory clocks don't increase when change workspace, minimize, maximize or move a window. What I've tested: Unsuccessful tests: Tested all drivers versions since 10.6 (released approximately when I've installed the first slackware in this PC); Tested other video card - ATI/AMD Radeon HD 5570 (XFX HD-557X-ZHF2); Tested some options on xorg.conf and that I've found googling (some of these options are commented on my xorg.conf. I'll send the links at the end of post); Tested some patches like 107_fedora_dont_fill_bg_none.patch and xserver-xorg-backclear.patch from Arch Linux Catalyst page (https://wiki.archlinux.org/index.php/ATI_Catalyst); Tested other distros and software versions: Tested XORG-7.6 on my own distribution; Tested Debian Squeeze (testing - from 2010-12-20); Tested Ubuntu Marverick (10.10); Tested Slackware 13.1; Distros info: Architecture: i386 Debian and Ubuntu with all default software (kernel, gnome, xorg, drivers); Slackware with Catalyst from AMD page and default window managers like: fvwm, xfce, and my own build of wmii; Successful tests: Tested other video card (only on my homebrew distro) - NVIDIA Geforce 7300GS with driver 260.19.29; That didn't shown the slow draw problems, but that card is a bit obsolete, so, dunno if that lacks features like the dynamic clocks. I don't dispose of other video cards like nvidia g/gt/gts/gtx 200~400~500 or Radeon HD 3000/4000/6000 to make more tests. Tested other hardware: Video: ATI/AMD Radeon HD 5570 (XFX HD-557X-ZHF2); Motherboard: Intel DG31PR; Processor: Core 2 Duo E6750; Software for that hardware: Fresh install of same distros (except for the mine) with same program versions; That video card (HD 5570) were full time at the maximum clocks (something like 500/750, don't remember) in all the operational systems (Windows XP and Windows 7 too), but it didn't shown the same problems that I have here. I've googled a lot about common problems with ATI/AMD proprietary drivers for Linux and didn't find similar problems, except by the Firefox corruptions, that the solutions were to disable ATI Direct2DAccel and use XAA. With XAA the problems persists and the other applications like pidgin and rest of Firefox showed the same problems of slow draw/redraw. Open source Drivers: With open source drivers (xf86-video-ati-6.13.2) I hadn't the same slow draw problems, but, had other problems, that, for now, make it no viable solution. I'll not discuss it here because this is another line of problems and will confuse everything. If it happens to be the only solution, I'll make another thread to discuss it. Logs and Configs: kernel .config dmesg xorg package list xorg.conf Xorg.0.log

    Read the article

  • Improving Partitioned Table Join Performance

    - by Paul White
    The query optimizer does not always choose an optimal strategy when joining partitioned tables. This post looks at an example, showing how a manual rewrite of the query can almost double performance, while reducing the memory grant to almost nothing. Test Data The two tables in this example use a common partitioning partition scheme. The partition function uses 41 equal-size partitions: CREATE PARTITION FUNCTION PFT (integer) AS RANGE RIGHT FOR VALUES ( 125000, 250000, 375000, 500000, 625000, 750000, 875000, 1000000, 1125000, 1250000, 1375000, 1500000, 1625000, 1750000, 1875000, 2000000, 2125000, 2250000, 2375000, 2500000, 2625000, 2750000, 2875000, 3000000, 3125000, 3250000, 3375000, 3500000, 3625000, 3750000, 3875000, 4000000, 4125000, 4250000, 4375000, 4500000, 4625000, 4750000, 4875000, 5000000 ); GO CREATE PARTITION SCHEME PST AS PARTITION PFT ALL TO ([PRIMARY]); There two tables are: CREATE TABLE dbo.T1 ( TID integer NOT NULL IDENTITY(0,1), Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x,   CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (TID) ON PST (TID) );   CREATE TABLE dbo.T2 ( TID integer NOT NULL, Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x,   CONSTRAINT PK_T2 PRIMARY KEY CLUSTERED (TID, Column1) ON PST (TID) ); The next script loads 5 million rows into T1 with a pseudo-random value between 1 and 5 for Column1. The table is partitioned on the IDENTITY column TID: INSERT dbo.T1 WITH (TABLOCKX) (Column1) SELECT (ABS(CHECKSUM(NEWID())) % 5) + 1 FROM dbo.Numbers AS N WHERE n BETWEEN 1 AND 5000000; In case you don’t already have an auxiliary table of numbers lying around, here’s a script to create one with 10 million rows: CREATE TABLE dbo.Numbers (n bigint PRIMARY KEY);   WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5) INSERT dbo.Numbers WITH (TABLOCKX) SELECT TOP (10000000) n FROM Nums ORDER BY n OPTION (MAXDOP 1); Table T1 contains data like this: Next we load data into table T2. The relationship between the two tables is that table 2 contains ‘n’ rows for each row in table 1, where ‘n’ is determined by the value in Column1 of table T1. There is nothing particularly special about the data or distribution, by the way. INSERT dbo.T2 WITH (TABLOCKX) (TID, Column1) SELECT T.TID, N.n FROM dbo.T1 AS T JOIN dbo.Numbers AS N ON N.n >= 1 AND N.n <= T.Column1; Table T2 ends up containing about 15 million rows: The primary key for table T2 is a combination of TID and Column1. The data is partitioned according to the value in column TID alone. Partition Distribution The following query shows the number of rows in each partition of table T1: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T1 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are 40 partitions containing 125,000 rows (40 * 125k = 5m rows). The rightmost partition remains empty. The next query shows the distribution for table 2: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T2 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are roughly 375,000 rows in each partition (the rightmost partition is also empty): Ok, that’s the test data done. Test Query and Execution Plan The task is to count the rows resulting from joining tables 1 and 2 on the TID column: SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME();   SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID;   SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; The optimizer chooses a plan using parallel hash join, and partial aggregation: The Plan Explorer plan tree view shows accurate cardinality estimates and an even distribution of rows across threads (click to enlarge the image): With a warm data cache, the STATISTICS IO output shows that no physical I/O was needed, and all 41 partitions were touched: Running the query without actual execution plan or STATISTICS IO information for maximum performance, the query returns in around 2600ms. Execution Plan Analysis The first step toward improving on the execution plan produced by the query optimizer is to understand how it works, at least in outline. The two parallel Clustered Index Scans use multiple threads to read rows from tables T1 and T2. Parallel scan uses a demand-based scheme where threads are given page(s) to scan from the table as needed. This arrangement has certain important advantages, but does result in an unpredictable distribution of rows amongst threads. The point is that multiple threads cooperate to scan the whole table, but it is impossible to predict which rows end up on which threads. For correct results from the parallel hash join, the execution plan has to ensure that rows from T1 and T2 that might join are processed on the same thread. For example, if a row from T1 with join key value ‘1234’ is placed in thread 5’s hash table, the execution plan must guarantee that any rows from T2 that also have join key value ‘1234’ probe thread 5’s hash table for matches. The way this guarantee is enforced in this parallel hash join plan is by repartitioning rows to threads after each parallel scan. The two repartitioning exchanges route rows to threads using a hash function over the hash join keys. The two repartitioning exchanges use the same hash function so rows from T1 and T2 with the same join key must end up on the same hash join thread. Expensive Exchanges This business of repartitioning rows between threads can be very expensive, especially if a large number of rows is involved. The execution plan selected by the optimizer moves 5 million rows through one repartitioning exchange and around 15 million across the other. As a first step toward removing these exchanges, consider the execution plan selected by the optimizer if we join just one partition from each table, disallowing parallelism: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = 1 AND $PARTITION.PFT(T2.TID) = 1 OPTION (MAXDOP 1); The optimizer has chosen a (one-to-many) merge join instead of a hash join. The single-partition query completes in around 100ms. If everything scaled linearly, we would expect that extending this strategy to all 40 populated partitions would result in an execution time around 4000ms. Using parallelism could reduce that further, perhaps to be competitive with the parallel hash join chosen by the optimizer. This raises a question. If the most efficient way to join one partition from each of the tables is to use a merge join, why does the optimizer not choose a merge join for the full query? Forcing a Merge Join Let’s force the optimizer to use a merge join on the test query using a hint: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN); This is the execution plan selected by the optimizer: This plan results in the same number of logical reads reported previously, but instead of 2600ms the query takes 5000ms. The natural explanation for this drop in performance is that the merge join plan is only using a single thread, whereas the parallel hash join plan could use multiple threads. Parallel Merge Join We can get a parallel merge join plan using the same query hint as before, and adding trace flag 8649: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN, QUERYTRACEON 8649); The execution plan is: This looks promising. It uses a similar strategy to distribute work across threads as seen for the parallel hash join. In practice though, performance is disappointing. On a typical run, the parallel merge plan runs for around 8400ms; slower than the single-threaded merge join plan (5000ms) and much worse than the 2600ms for the parallel hash join. We seem to be going backwards! The logical reads for the parallel merge are still exactly the same as before, with no physical IOs. The cardinality estimates and thread distribution are also still very good (click to enlarge): A big clue to the reason for the poor performance is shown in the wait statistics (captured by Plan Explorer Pro): CXPACKET waits require careful interpretation, and are most often benign, but in this case excessive waiting occurs at the repartitioning exchanges. Unlike the parallel hash join, the repartitioning exchanges in this plan are order-preserving ‘merging’ exchanges (because merge join requires ordered inputs): Parallelism works best when threads can just grab any available unit of work and get on with processing it. Preserving order introduces inter-thread dependencies that can easily lead to significant waits occurring. In extreme cases, these dependencies can result in an intra-query deadlock, though the details of that will have to wait for another time to explore in detail. The potential for waits and deadlocks leads the query optimizer to cost parallel merge join relatively highly, especially as the degree of parallelism (DOP) increases. This high costing resulted in the optimizer choosing a serial merge join rather than parallel in this case. The test results certainly confirm its reasoning. Collocated Joins In SQL Server 2008 and later, the optimizer has another available strategy when joining tables that share a common partition scheme. This strategy is a collocated join, also known as as a per-partition join. It can be applied in both serial and parallel execution plans, though it is limited to 2-way joins in the current optimizer. Whether the optimizer chooses a collocated join or not depends on cost estimation. The primary benefits of a collocated join are that it eliminates an exchange and requires less memory, as we will see next. Costing and Plan Selection The query optimizer did consider a collocated join for our original query, but it was rejected on cost grounds. The parallel hash join with repartitioning exchanges appeared to be a cheaper option. There is no query hint to force a collocated join, so we have to mess with the costing framework to produce one for our test query. Pretending that IOs cost 50 times more than usual is enough to convince the optimizer to use collocated join with our test query: -- Pretend IOs are 50x cost temporarily DBCC SETIOWEIGHT(50);   -- Co-located hash join SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (RECOMPILE);   -- Reset IO costing DBCC SETIOWEIGHT(1); Collocated Join Plan The estimated execution plan for the collocated join is: The Constant Scan contains one row for each partition of the shared partitioning scheme, from 1 to 41. The hash repartitioning exchanges seen previously are replaced by a single Distribute Streams exchange using Demand partitioning. Demand partitioning means that the next partition id is given to the next parallel thread that asks for one. My test machine has eight logical processors, and all are available for SQL Server to use. As a result, there are eight threads in the single parallel branch in this plan, each processing one partition from each table at a time. Once a thread finishes processing a partition, it grabs a new partition number from the Distribute Streams exchange…and so on until all partitions have been processed. It is important to understand that the parallel scans in this plan are different from the parallel hash join plan. Although the scans have the same parallelism icon, tables T1 and T2 are not being co-operatively scanned by multiple threads in the same way. Each thread reads a single partition of T1 and performs a hash match join with the same partition from table T2. The properties of the two Clustered Index Scans show a Seek Predicate (unusual for a scan!) limiting the rows to a single partition: The crucial point is that the join between T1 and T2 is on TID, and TID is the partitioning column for both tables. A thread that processes partition ‘n’ is guaranteed to see all rows that can possibly join on TID for that partition. In addition, no other thread will see rows from that partition, so this removes the need for repartitioning exchanges. CPU and Memory Efficiency Improvements The collocated join has removed two expensive repartitioning exchanges and added a single exchange processing 41 rows (one for each partition id). Remember, the parallel hash join plan exchanges had to process 5 million and 15 million rows. The amount of processor time spent on exchanges will be much lower in the collocated join plan. In addition, the collocated join plan has a maximum of 8 threads processing single partitions at any one time. The 41 partitions will all be processed eventually, but a new partition is not started until a thread asks for it. Threads can reuse hash table memory for the new partition. The parallel hash join plan also had 8 hash tables, but with all 5,000,000 build rows loaded at the same time. The collocated plan needs memory for only 8 * 125,000 = 1,000,000 rows at any one time. Collocated Hash Join Performance The collated join plan has disappointing performance in this case. The query runs for around 25,300ms despite the same IO statistics as usual. This is much the worst result so far, so what went wrong? It turns out that cardinality estimation for the single partition scans of table T1 is slightly low. The properties of the Clustered Index Scan of T1 (graphic immediately above) show the estimation was for 121,951 rows. This is a small shortfall compared with the 125,000 rows actually encountered, but it was enough to cause the hash join to spill to physical tempdb: A level 1 spill doesn’t sound too bad, until you realize that the spill to tempdb probably occurs for each of the 41 partitions. As a side note, the cardinality estimation error is a little surprising because the system tables accurately show there are 125,000 rows in every partition of T1. Unfortunately, the optimizer uses regular column and index statistics to derive cardinality estimates here rather than system table information (e.g. sys.partitions). Collocated Merge Join We will never know how well the collocated parallel hash join plan might have worked without the cardinality estimation error (and the resulting 41 spills to tempdb) but we do know: Merge join does not require a memory grant; and Merge join was the optimizer’s preferred join option for a single partition join Putting this all together, what we would really like to see is the same collocated join strategy, but using merge join instead of hash join. Unfortunately, the current query optimizer cannot produce a collocated merge join; it only knows how to do collocated hash join. So where does this leave us? CROSS APPLY sys.partitions We can try to write our own collocated join query. We can use sys.partitions to find the partition numbers, and CROSS APPLY to get a count per partition, with a final step to sum the partial counts. The following query implements this idea: SELECT row_count = SUM(Subtotals.cnt) FROM ( -- Partition numbers SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1 ) AS P CROSS APPLY ( -- Count per collocated join SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; The estimated plan is: The cardinality estimates aren’t all that good here, especially the estimate for the scan of the system table underlying the sys.partitions view. Nevertheless, the plan shape is heading toward where we would like to be. Each partition number from the system table results in a per-partition scan of T1 and T2, a one-to-many Merge Join, and a Stream Aggregate to compute the partial counts. The final Stream Aggregate just sums the partial counts. Execution time for this query is around 3,500ms, with the same IO statistics as always. This compares favourably with 5,000ms for the serial plan produced by the optimizer with the OPTION (MERGE JOIN) hint. This is another case of the sum of the parts being less than the whole – summing 41 partial counts from 41 single-partition merge joins is faster than a single merge join and count over all partitions. Even so, this single-threaded collocated merge join is not as quick as the original parallel hash join plan, which executed in 2,600ms. On the positive side, our collocated merge join uses only one logical processor and requires no memory grant. The parallel hash join plan used 16 threads and reserved 569 MB of memory:   Using a Temporary Table Our collocated merge join plan should benefit from parallelism. The reason parallelism is not being used is that the query references a system table. We can work around that by writing the partition numbers to a temporary table (or table variable): SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME();   CREATE TABLE #P ( partition_number integer PRIMARY KEY);   INSERT #P (partition_number) SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1;   SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals;   DROP TABLE #P;   SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; Using the temporary table adds a few logical reads, but the overall execution time is still around 3500ms, indistinguishable from the same query without the temporary table. The problem is that the query optimizer still doesn’t choose a parallel plan for this query, though the removal of the system table reference means that it could if it chose to: In fact the optimizer did enter the parallel plan phase of query optimization (running search 1 for a second time): Unfortunately, the parallel plan found seemed to be more expensive than the serial plan. This is a crazy result, caused by the optimizer’s cost model not reducing operator CPU costs on the inner side of a nested loops join. Don’t get me started on that, we’ll be here all night. In this plan, everything expensive happens on the inner side of a nested loops join. Without a CPU cost reduction to compensate for the added cost of exchange operators, candidate parallel plans always look more expensive to the optimizer than the equivalent serial plan. Parallel Collocated Merge Join We can produce the desired parallel plan using trace flag 8649 again: SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: One difference between this plan and the collocated hash join plan is that a Repartition Streams exchange operator is used instead of Distribute Streams. The effect is similar, though not quite identical. The Repartition uses round-robin partitioning, meaning the next partition id is pushed to the next thread in sequence. The Distribute Streams exchange seen earlier used Demand partitioning, meaning the next partition id is pulled across the exchange by the next thread that is ready for more work. There are subtle performance implications for each partitioning option, but going into that would again take us too far off the main point of this post. Performance The important thing is the performance of this parallel collocated merge join – just 1350ms on a typical run. The list below shows all the alternatives from this post (all timings include creation, population, and deletion of the temporary table where appropriate) from quickest to slowest: Collocated parallel merge join: 1350ms Parallel hash join: 2600ms Collocated serial merge join: 3500ms Serial merge join: 5000ms Parallel merge join: 8400ms Collated parallel hash join: 25,300ms (hash spill per partition) The parallel collocated merge join requires no memory grant (aside from a paltry 1.2MB used for exchange buffers). This plan uses 16 threads at DOP 8; but 8 of those are (rather pointlessly) allocated to the parallel scan of the temporary table. These are minor concerns, but it turns out there is a way to address them if it bothers you. Parallel Collocated Merge Join with Demand Partitioning This final tweak replaces the temporary table with a hard-coded list of partition ids (dynamic SQL could be used to generate this query from sys.partitions): SELECT row_count = SUM(Subtotals.cnt) FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10), (11),(12),(13),(14),(15),(16),(17),(18),(19),(20), (21),(22),(23),(24),(25),(26),(27),(28),(29),(30), (31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41) ) AS P (partition_number) CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: The parallel collocated hash join plan is reproduced below for comparison: The manual rewrite has another advantage that has not been mentioned so far: the partial counts (per partition) can be computed earlier than the partial counts (per thread) in the optimizer’s collocated join plan. The earlier aggregation is performed by the extra Stream Aggregate under the nested loops join. The performance of the parallel collocated merge join is unchanged at around 1350ms. Final Words It is a shame that the current query optimizer does not consider a collocated merge join (Connect item closed as Won’t Fix). The example used in this post showed an improvement in execution time from 2600ms to 1350ms using a modestly-sized data set and limited parallelism. In addition, the memory requirement for the query was almost completely eliminated  – down from 569MB to 1.2MB. The problem with the parallel hash join selected by the optimizer is that it attempts to process the full data set all at once (albeit using eight threads). It requires a large memory grant to hold all 5 million rows from table T1 across the eight hash tables, and does not take advantage of the divide-and-conquer opportunity offered by the common partitioning. The great thing about the collocated join strategies is that each parallel thread works on a single partition from both tables, reading rows, performing the join, and computing a per-partition subtotal, before moving on to a new partition. From a thread’s point of view… If you have trouble visualizing what is happening from just looking at the parallel collocated merge join execution plan, let’s look at it again, but from the point of view of just one thread operating between the two Parallelism (exchange) operators. Our thread picks up a single partition id from the Distribute Streams exchange, and starts a merge join using ordered rows from partition 1 of table T1 and partition 1 of table T2. By definition, this is all happening on a single thread. As rows join, they are added to a (per-partition) count in the Stream Aggregate immediately above the Merge Join. Eventually, either T1 (partition 1) or T2 (partition 1) runs out of rows and the merge join stops. The per-partition count from the aggregate passes on through the Nested Loops join to another Stream Aggregate, which is maintaining a per-thread subtotal. Our same thread now picks up a new partition id from the exchange (say it gets id 9 this time). The count in the per-partition aggregate is reset to zero, and the processing of partition 9 of both tables proceeds just as it did for partition 1, and on the same thread. Each thread picks up a single partition id and processes all the data for that partition, completely independently from other threads working on other partitions. One thread might eventually process partitions (1, 9, 17, 25, 33, 41) while another is concurrently processing partitions (2, 10, 18, 26, 34) and so on for the other six threads at DOP 8. The point is that all 8 threads can execute independently and concurrently, continuing to process new partitions until the wider job (of which the thread has no knowledge!) is done. This divide-and-conquer technique can be much more efficient than simply splitting the entire workload across eight threads all at once. Related Reading Understanding and Using Parallelism in SQL Server Parallel Execution Plans Suck © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

    Read the article

  • Quick guide to Oracle IRM 11g: Classification design

    - by Simon Thorpe
    Quick guide to Oracle IRM 11g indexThis is the final article in the quick guide to Oracle IRM. If you've followed everything prior you will now have a fully functional and tested Information Rights Management service. It doesn't matter if you've been following the 10g or 11g guide as this next article is common to both. ContentsWhy this is the most important part... Understanding the classification and standard rights model Identifying business use cases Creating an effective IRM classification modelOne single classification across the entire businessA context for each and every possible granular use caseWhat makes a good context? Deciding on the use of roles in the context Reviewing the features and security for context roles Summary Why this is the most important part...Now the real work begins, installing and getting an IRM system running is as simple as following instructions. However to actually have an IRM technology easily protecting your most sensitive information without interfering with your users existing daily work flows and be able to scale IRM across the entire business, requires thought into how confidential documents are created, used and distributed. This article is going to give you the information you need to ask the business the right questions so that you can deploy your IRM service successfully. The IRM team here at Oracle have over 10 years of experience in helping customers and it is important you understand the following to be successful in securing access to your most confidential information. Whatever you are trying to secure, be it mergers and acquisitions information, engineering intellectual property, health care documentation or financial reports. No matter what type of user is going to access the information, be they employees, contractors or customers, there are common goals you are always trying to achieve.Securing the content at the earliest point possible and do it automatically. Removing the dependency on the user to decide to secure the content reduces the risk of mistakes significantly and therefore results a more secure deployment. K.I.S.S. (Keep It Simple Stupid) Reduce complexity in the rights/classification model. Oracle IRM lets you make changes to access to documents even after they are secured which allows you to start with a simple model and then introduce complexity once you've understood how the technology is going to be used in the business. After an initial learning period you can review your implementation and start to make informed decisions based on user feedback and administration experience. Clearly communicate to the user, when appropriate, any changes to their existing work practice. You must make every effort to make the transition to sealed content as simple as possible. For external users you must help them understand why you are securing the documents and inform them the value of the technology to both your business and them. Before getting into the detail, I must pay homage to Martin White, Vice President of client services in SealedMedia, the company Oracle acquired and who created Oracle IRM. In the SealedMedia years Martin was involved with every single customer and was key to the design of certain aspects of the IRM technology, specifically the context model we will be discussing here. Listening carefully to customers and understanding the flexibility of the IRM technology, Martin taught me all the skills of helping customers build scalable, effective and simple to use IRM deployments. No matter how well the engineering department designed the software, badly designed and poorly executed projects can result in difficult to use and manage, and ultimately insecure solutions. The advice and information that follows was born with Martin and he's still delivering IRM consulting with customers and can be found at www.thinkers.co.uk. It is from Martin and others that Oracle not only has the most advanced, scalable and usable document security solution on the market, but Oracle and their partners have the most experience in delivering successful document security solutions. Understanding the classification and standard rights model The goal of any successful IRM deployment is to balance the increase in security the technology brings without over complicating the way people use secured content and avoid a significant increase in administration and maintenance. With Oracle it is possible to automate the protection of content, deploy the desktop software transparently and use authentication methods such that users can open newly secured content initially unaware the document is any different to an insecure one. That is until of course they attempt to do something for which they don't have any rights, such as copy and paste to an insecure application or try and print. Central to achieving this objective is creating a classification model that is simple to understand and use but also provides the right level of complexity to meet the business needs. In Oracle IRM the term used for each classification is a "context". A context defines the relationship between.A group of related documents The people that use the documents The roles that these people perform The rights that these people need to perform their role The context is the key to the success of Oracle IRM. It provides the separation of the role and rights of a user from the content itself. Documents are sealed to contexts but none of the rights, user or group information is stored within the content itself. Sealing only places information about the location of the IRM server that sealed it, the context applied to the document and a few other pieces of metadata that pertain only to the document. This important separation of rights from content means that millions of documents can be secured against a single classification and a user needs only one right assigned to be able to access all documents. If you have followed all the previous articles in this guide, you will be ready to start defining contexts to which your sensitive information will be protected. But before you even start with IRM, you need to understand how your own business uses and creates sensitive documents and emails. Identifying business use cases Oracle is able to support multiple classification systems, but usually there is one single initial need for the technology which drives a deployment. This need might be to protect sensitive mergers and acquisitions information, engineering intellectual property, financial documents. For this and every subsequent use case you must understand how users create and work with documents, to who they are distributed and how the recipients should interact with them. A successful IRM deployment should start with one well identified use case (we go through some examples towards the end of this article) and then after letting this use case play out in the business, you learn how your users work with content, how well your communication to the business worked and if the classification system you deployed delivered the right balance. It is at this point you can start rolling the technology out further. Creating an effective IRM classification model Once you have selected the initial use case you will address with IRM, you need to design a classification model that defines the access to secured documents within the use case. In Oracle IRM there is an inbuilt classification system called the "context" model. In Oracle IRM 11g it is possible to extend the server to support any rights classification model, but the majority of users who are not using an application integration (such as Oracle IRM within Oracle Beehive) are likely to be starting out with the built in context model. Before looking at creating a classification system with IRM, it is worth reviewing some recognized standards and methods for creating and implementing security policy. A very useful set of documents are the ISO 17799 guidelines and the SANS security policy templates. First task is to create a context against which documents are to be secured. A context consists of a group of related documents (all top secret engineering research), a list of roles (contributors and readers) which define how users can access documents and a list of users (research engineers) who have been given a role allowing them to interact with sealed content. Before even creating the first context it is wise to decide on a philosophy which will dictate the level of granularity, the question is, where do you start? At a department level? By project? By technology? First consider the two ends of the spectrum... One single classification across the entire business Imagine that instead of having separate contexts, one for engineering intellectual property, one for your financial data, one for human resources personally identifiable information, you create one context for all documents across the entire business. Whilst you may have immediate objections, there are some significant benefits in thinking about considering this. Document security classification decisions are simple. You only have one context to chose from! User provisioning is simple, just make sure everyone has a role in the only context in the business. Administration is very low, if you assign rights to groups from the business user repository you probably never have to touch IRM administration again. There are however some obvious downsides to this model.All users in have access to all IRM secured content. So potentially a sales person could access sensitive mergers and acquisition documents, if they can get their hands on a copy that is. You cannot delegate control of different documents to different parts of the business, this may not satisfy your regulatory requirements for the separation and delegation of duties. Changing a users role affects every single document ever secured. Even though it is very unlikely a business would ever use one single context to secure all their sensitive information, thinking about this scenario raises one very important point. Just having one single context and securing all confidential documents to it, whilst incurring some of the problems detailed above, has one huge value. Once secured, IRM protected content can ONLY be accessed by authorized users. Just think of all the sensitive documents in your business today, imagine if you could ensure that only everyone you trust could open them. Even if an employee lost a laptop or someone accidentally sent an email to the wrong recipient, only the right people could open that file. A context for each and every possible granular use case Now let's think about the total opposite of a single context design. What if you created a context for each and every single defined business need and created multiple contexts within this for each level of granularity? Let's take a use case where we need to protect engineering intellectual property. Imagine we have 6 different engineering groups, and in each we have a research department, a design department and manufacturing. The company information security policy defines 3 levels of information sensitivity... restricted, confidential and top secret. Then let's say that each group and department needs to define access to information from both internal and external users. Finally add into the mix that they want to review the rights model for each context every financial quarter. This would result in a huge amount of contexts. For example, lets just look at the resulting contexts for one engineering group. Q1FY2010 Restricted Internal - Engineering Group 1 - Research Q1FY2010 Restricted Internal - Engineering Group 1 - Design Q1FY2010 Restricted Internal - Engineering Group 1 - Manufacturing Q1FY2010 Restricted External- Engineering Group 1 - Research Q1FY2010 Restricted External - Engineering Group 1 - Design Q1FY2010 Restricted External - Engineering Group 1 - Manufacturing Q1FY2010 Confidential Internal - Engineering Group 1 - Research Q1FY2010 Confidential Internal - Engineering Group 1 - Design Q1FY2010 Confidential Internal - Engineering Group 1 - Manufacturing Q1FY2010 Confidential External - Engineering Group 1 - Research Q1FY2010 Confidential External - Engineering Group 1 - Design Q1FY2010 Confidential External - Engineering Group 1 - Manufacturing Q1FY2010 Top Secret Internal - Engineering Group 1 - Research Q1FY2010 Top Secret Internal - Engineering Group 1 - Design Q1FY2010 Top Secret Internal - Engineering Group 1 - Manufacturing Q1FY2010 Top Secret External - Engineering Group 1 - Research Q1FY2010 Top Secret External - Engineering Group 1 - Design Q1FY2010 Top Secret External - Engineering Group 1 - Manufacturing Now multiply the above by 6 for each engineering group, 18 contexts. You are then creating/reviewing another 18 every 3 months. After a year you've got 72 contexts. What would be the advantages of such a complex classification model? You can satisfy very granular rights requirements, for example only an authorized engineering group 1 researcher can create a top secret report for access internally, and his role will be reviewed on a very frequent basis. Your business may have very complex rights requirements and mapping this directly to IRM may be an obvious exercise. The disadvantages of such a classification model are significant...Huge administrative overhead. Someone in the business must manage, review and administrate each of these contexts. If the engineering group had a single administrator, they would have 72 classifications to reside over each year. From an end users perspective life will be very confusing. Imagine if a user has rights in just 6 of these contexts. They may be able to print content from one but not another, be able to edit content in 2 contexts but not the other 4. Such confusion at the end user level causes frustration and resistance to the use of the technology. Increased synchronization complexity. Imagine a user who after 3 years in the company ends up with over 300 rights in many different contexts across the business. This would result in long synchronization times as the client software updates all your offline rights. Hard to understand who can do what with what. Imagine being the VP of engineering and as part of an internal security audit you are asked the question, "What rights to researchers have to our top secret information?". In this complex model the answer is not simple, it would depend on many roles in many contexts. Of course this example is extreme, but it highlights that trying to build many barriers in your business can result in a nightmare of administration and confusion amongst users. In the real world what we need is a balance of the two. We need to seek an optimum number of contexts. Too many contexts are unmanageable and too few contexts does not give fine enough granularity. What makes a good context? Good context design derives mainly from how well you understand your business requirements to secure access to confidential information. Some customers I have worked with can tell me exactly the documents they wish to secure and know exactly who should be opening them. However there are some customers who know only of the government regulation that requires them to control access to certain types of information, they don't actually know where the documents are, how they are created or understand exactly who should have access. Therefore you need to know how to ask the business the right questions that lead to information which help you define a context. First ask these questions about a set of documentsWhat is the topic? Who are legitimate contributors on this topic? Who are the authorized readership? If the answer to any one of these is significantly different, then it probably merits a separate context. Remember that sealed documents are inherently secure and as such they cannot leak to your competitors, therefore it is better sealed to a broad context than not sealed at all. Simplicity is key here. Always revert to the first extreme example of a single classification, then work towards essential complexity. If there is any doubt, always prefer fewer contexts. Remember, Oracle IRM allows you to change your mind later on. You can implement a design now and continue to change and refine as you learn how the technology is used. It is easy to go from a simple model to a more complex one, it is much harder to take a complex model that is already embedded in the work practice of users and try to simplify it. It is also wise to take a single use case and address this first with the business. Don't try and tackle many different problems from the outset. Do one, learn from the process, refine it and then take what you have learned into the next use case, refine and continue. Once you have a good grasp of the technology and understand how your business will use it, you can then start rolling out the technology wider across the business. Deciding on the use of roles in the context Once you have decided on that first initial use case and a context to create let's look at the details you need to decide upon. For each context, identify; Administrative rolesBusiness owner, the person who makes decisions about who may or may not see content in this context. This is often the person who wanted to use IRM and drove the business purchase. They are the usually the person with the most at risk when sensitive information is lost. Point of contact, the person who will handle requests for access to content. Sometimes the same as the business owner, sometimes a trusted secretary or administrator. Context administrator, the person who will enact the decisions of the Business Owner. Sometimes the point of contact, sometimes a trusted IT person. Document related rolesContributors, the people who create and edit documents in this context. Reviewers, the people who are involved in reviewing documents but are not trusted to secure information to this classification. This role is not always necessary. (See later discussion on Published-work and Work-in-Progress) Readers, the people who read documents from this context. Some people may have several of the roles above, which is fine. What you are trying to do is understand and define how the business interacts with your sensitive information. These roles obviously map directly to roles available in Oracle IRM. Reviewing the features and security for context roles At this point we have decided on a classification of information, understand what roles people in the business will play when administrating this classification and how they will interact with content. The final piece of the puzzle in getting the information for our first context is to look at the permissions people will have to sealed documents. First think why are you protecting the documents in the first place? It is to prevent the loss of leaking of information to the wrong people. To control the information, making sure that people only access the latest versions of documents. You are not using Oracle IRM to prevent unauthorized people from doing legitimate work. This is an important point, with IRM you can erect many barriers to prevent access to content yet too many restrictions and authorized users will often find ways to circumvent using the technology and end up distributing unprotected originals. Because IRM is a security technology, it is easy to get carried away restricting different groups. However I would highly recommend starting with a simple solution with few restrictions. Ensure that everyone who reasonably needs to read documents can do so from the outset. Remember that with Oracle IRM you can change rights to content whenever you wish and tighten security. Always return to the fact that the greatest value IRM brings is that ONLY authorized users can access secured content, remember that simple "one context for the entire business" model. At the start of the deployment you really need to aim for user acceptance and therefore a simple model is more likely to succeed. As time passes and users understand how IRM works you can start to introduce more restrictions and complexity. Another key aspect to focus on is handling exceptions. If you decide on a context model where engineering can only access engineering information, and sales can only access sales data. Act quickly when a sales manager needs legitimate access to a set of engineering documents. Having a quick and effective process for permitting other people with legitimate needs to obtain appropriate access will be rewarded with acceptance from the user community. These use cases can often be satisfied by integrating IRM with a good Identity & Access Management technology which simplifies the process of assigning users the correct business roles. The big print issue... Printing is often an issue of contention, users love to print but the business wants to ensure sensitive information remains in the controlled digital world. There are many cases of physical document loss causing a business pain, it is often overlooked that IRM can help with this issue by limiting the ability to generate physical copies of digital content. However it can be hard to maintain a balance between security and usability when it comes to printing. Consider the following points when deciding about whether to give print rights. Oracle IRM sealed documents can contain watermarks that expose information about the user, time and location of access and the classification of the document. This information would reside in the printed copy making it easier to trace who printed it. Printed documents are slower to distribute in comparison to their digital counterparts, so time sensitive information in printed format may present a lower risk. Print activity is audited, therefore you can monitor and react to users abusing print rights. Summary In summary it is important to think carefully about the way you create your context model. As you ask the business these questions you may get a variety of different requirements. There may be special projects that require a context just for sensitive information created during the lifetime of the project. There may be a department that requires all information in the group is secured and you might have a few senior executives who wish to use IRM to exchange a small number of highly sensitive documents with a very small number of people. Oracle IRM, with its very flexible context classification system, can support all of these use cases. The trick is to introducing the complexity to deliver them at the right level. In another article i'm working on I will go through some examples of how Oracle IRM might map to existing business use cases. But for now, this article covers all the important questions you need to get your IRM service deployed and successfully protecting your most sensitive information.

    Read the article

  • Dataset.ReadXml() - Invalid character in the given encoding

    - by NLV
    Hello all I have saved a dataset in the sql database in an xml column using the following code. XmlDataDocument dd = new XmlDataDocument(dataset); and passing this xml document as sql parameter using param.value = new XmlNodeReader(dd); The XML is like <NewDataSet><SubContractChangeOrders><AGCol>1</AGCol><SCO_x0020_Number>001</SCO_x0020_Number><Contract_x0020_Number>006</Contract_x0020_Number><ContractID>30</ContractID><ChangeOrderID>211</ChangeOrderID><Amount>0.0000</Amount><Udf_CostReimbursableFlag>false</Udf_CostReimbursableFlag><Udf_CustomerCode /><Udf_SubChangeOrderStatus /></SubContractChangeOrders><SubContractChangeOrders><AGCol>2</AGCol><SCO_x0020_Number>002</SCO_x0020_Number><Contract_x0020_Number>006</Contract_x0020_Number><ContractID>30</ContractID><ChangeOrderID>212</ChangeOrderID><Amount>0.0000</Amount><Udf_CostReimbursableFlag>false</Udf_CostReimbursableFlag></SubContractChangeOrders><SubContractChangeOrders><AGCol>3</AGCol><SCO_x0020_Number>001</SCO_x0020_Number><Contract_x0020_Number>111</Contract_x0020_Number><ContractID>87</ContractID><ChangeOrderID>12</ChangeOrderID><Amount>300.0000</Amount></SubContractChangeOrders><SubContractChangeOrders><AGCol>4</AGCol><SCO_x0020_Number>001</SCO_x0020_Number><Contract_x0020_Number>222</Contract_x0020_Number><ContractID>80</ContractID><ChangeOrderID>6</ChangeOrderID><Amount>100.0000</Amount></SubContractChangeOrders><SubContractChangeOrders><AGCol>5</AGCol><SCO_x0020_Number>001</SCO_x0020_Number><Contract_x0020_Number>777</Contract_x0020_Number><ContractID>79</ContractID><ChangeOrderID>5</ChangeOrderID><Amount>200.0000</Amount></SubContractChangeOrders><SubContractChangeOrders><AGCol>6</AGCol><SCO_x0020_Number>001</SCO_x0020_Number><Contract_x0020_Number>786</Contract_x0020_Number><ContractID>77</ContractID><ChangeOrderID>3</ChangeOrderID><Amount>100.0000</Amount></SubContractChangeOrders><SubContractChangeOrders><AGCol>7</AGCol><SCO_x0020_Number>001</SCO_x0020_Number><Contract_x0020_Number>787</Contract_x0020_Number><ContractID>78</ContractID><ChangeOrderID>4</ChangeOrderID><Amount>500.0000</Amount></SubContractChangeOrders><SubContractChangeOrders><AGCol>8</AGCol><SCO_x0020_Number>001</SCO_x0020_Number><Contract_x0020_Number>Con 009</Contract_x0020_Number><ContractID>219</ContractID><ChangeOrderID>78</ChangeOrderID><Amount>9000.0000</Amount></SubContractChangeOrders><SubContractChangeOrders><AGCol>9</AGCol><SCO_x0020_Number>001</SCO_x0020_Number><Contract_x0020_Number>Con 010</Contract_x0020_Number><ContractID>220</ContractID><ChangeOrderID>79</ChangeOrderID><Amount>13000.0000</Amount></SubContractChangeOrders><SubContractChangeOrders><AGCol>10</AGCol><SCO_x0020_Number>001</SCO_x0020_Number><Contract_x0020_Number>Con 012</Contract_x0020_Number><ContractID>222</ContractID><ChangeOrderID>83</ChangeOrderID><Amount>2300.0000</Amount></SubContractChangeOrders><SubContractChangeOrders><AGCol>11</AGCol><SCO_x0020_Number>001</SCO_x0020_Number><Contract_x0020_Number>Con 020</Contract_x0020_Number><ContractID>226</ContractID><ChangeOrderID>86</ChangeOrderID><Amount>5400.0000</Amount></SubContractChangeOrders><SubContractChangeOrders><AGCol>12</AGCol><SCO_x0020_Number>001</SCO_x0020_Number><Contract_x0020_Number>Con 021</Contract_x0020_Number><ContractID>227</ContractID><ChangeOrderID>87</ChangeOrderID><Amount>2300.0000</Amount></SubContractChangeOrders><SubContractChangeOrders><AGCol>13</AGCol><SCO_x0020_Number>001</SCO_x0020_Number><Contract_x0020_Number>Con001</Contract_x0020_Number><ContractID>208</ContractID><ChangeOrderID>72</ChangeOrderID><Amount>3000.0000</Amount></SubContractChangeOrders><SubContractChangeOrders><AGCol>14</AGCol><SCO_x0020_Number>001</SCO_x0020_Number><Contract_x0020_Number>Con002</Contract_x0020_Number><ContractID>209</ContractID><ChangeOrderID>73</ChangeOrderID><Amount>400.0000</Amount></SubContractChangeOrders><SubContractChangeOrders><AGCol>15</AGCol><SCO_x0020_Number>001</SCO_x0020_Number><Contract_x0020_Number>Con003</Contract_x0020_Number><ContractID>210</ContractID><ChangeOrderID>74</ChangeOrderID><Amount>6000.0000</Amount></SubContractChangeOrders><SubContractChangeOrders><AGCol>16</AGCol><SCO_x0020_Number>001</SCO_x0020_Number><Contract_x0020_Number>Con004</Contract_x0020_Number><ContractID>211</ContractID><ChangeOrderID>75</ChangeOrderID><Amount>9000.0000</Amount></SubContractChangeOrders><SubContractChangeOrders><AGCol>17</AGCol><SCO_x0020_Number>001</SCO_x0020_Number><Contract_x0020_Number>Con005</Contract_x0020_Number><ContractID>213</ContractID><ChangeOrderID>76</ChangeOrderID><Amount>17000.0000</Amount></SubContractChangeOrders><SubContractChangeOrders><AGCol>18</AGCol><SCO_x0020_Number>001</SCO_x0020_Number><Contract_x0020_Number>Cont001</Contract_x0020_Number><ContractID>228</ContractID><ChangeOrderID>89</ChangeOrderID><Amount>2000.0000</Amount></SubContractChangeOrders><SubContractChangeOrders><AGCol>19</AGCol><SCO_x0020_Number>001</SCO_x0020_Number><Contract_x0020_Number>PUR001</Contract_x0020_Number><ContractID>229</ContractID><ChangeOrderID>88</ChangeOrderID><Amount>1000.0000</Amount></SubContractChangeOrders><SubContractChangeOrders><AGCol>20</AGCol><SCO_x0020_Number>001</SCO_x0020_Number><Contract_x0020_Number>PUR002</Contract_x0020_Number><ContractID>230</ContractID><ChangeOrderID>90</ChangeOrderID><Amount>3000.0000</Amount></SubContractChangeOrders><SubContractChangeOrders><AGCol>21</AGCol><SCO_x0020_Number>001</SCO_x0020_Number><Contract_x0020_Number>SC-002</Contract_x0020_Number><ContractID>2</ContractID><ChangeOrderID>7</ChangeOrderID><Amount>200.0000</Amount></SubContractChangeOrders><SubContractChangeOrders><AGCol>22</AGCol><SCO_x0020_Number>001</SCO_x0020_Number><Contract_x0020_Number>SC-004</Contract_x0020_Number><ContractID>7</ContractID><ChangeOrderID>65</ChangeOrderID><Amount>1000.0000</Amount></SubContractChangeOrders></NewDataSet> I'm trying to read it back as follows using (SqlConnection con = new SqlConnection("Server=#####;Initial Catalog=#####;User ID=####;Pwd=########")) { using (SqlCommand com = new SqlCommand("select * from dbo.tbl_#####", con)) { using (SqlDataAdapter ada = new SqlDataAdapter(com)) { ada.Fill(dt); MemoryStream ms = new MemoryStream(); object contractXML1 = dt.Rows[0]["SCOXML1"]; System.Runtime.Serialization.Formatters.Binary.BinaryFormatter bf = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter(); bf.Serialize(ms, contractXML1); ms.Seek(0, SeekOrigin.Begin); ds.ReadXml(ms); } } } I'm getting the following error Data at the root level is invalid. Line 1, position 6. Any ideas? NLV

    Read the article

  • Please help me understand why my XSL Transform is not transforming

    - by Damovisa
    I'm trying to transform one XML format to another using XSL. Try as I might, I can't seem to get a result. I've hacked away at this for a while now and I've had no success. I'm not even getting any exceptions. I'm going to post the entire code and hopefully someone can help me work out what I've done wrong. I'm aware there are likely to be problems in the xsl I have in terms of selects and matches, but I'm not fussed about that at the moment. The output I'm getting is the input XML without any XML tags. The transformation is simply not occurring. Here's my XML Document: <?xml version="1.0"?> <Transactions> <Account> <PersonalAccount> <AccountNumber>066645621</AccountNumber> <AccountName>A Smith</AccountName> <CurrentBalance>-200125.96</CurrentBalance> <AvailableBalance>0</AvailableBalance> <AccountType>LOAN</AccountType> </PersonalAccount> </Account> <StartDate>2010-03-01T00:00:00</StartDate> <EndDate>2010-03-23T00:00:00</EndDate> <Items> <Transaction> <ErrorNumber>-1</ErrorNumber> <Amount>12000</Amount> <Reference>Transaction 1</Reference> <CreatedDate>0001-01-01T00:00:00</CreatedDate> <EffectiveDate>2010-03-15T00:00:00</EffectiveDate> <IsCredit>true</IsCredit> <Balance>-324000</Balance> </Transaction> <Transaction> <ErrorNumber>-1</ErrorNumber> <Amount>11000</Amount> <Reference>Transaction 2</Reference> <CreatedDate>0001-01-01T00:00:00</CreatedDate> <EffectiveDate>2010-03-14T00:00:00</EffectiveDate> <IsCredit>true</IsCredit> <Balance>-324000</Balance> </Transaction> </Items> </Transactions> Here's my XSLT: <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> <xsl:output method="xml" /> <xsl:param name="currentdate"></xsl:param> <xsl:template match="Transactions"> <xsl:element name="OFX"> <xsl:element name="SIGNONMSGSRSV1"> <xsl:element name="SONRS"> <xsl:element name="STATUS"> <xsl:element name="CODE">0</xsl:element> <xsl:element name="SEVERITY">INFO</xsl:element> </xsl:element> <xsl:element name="DTSERVER"><xsl:value-of select="$currentdate" /></xsl:element> <xsl:element name="LANGUAGE">ENG</xsl:element> </xsl:element> </xsl:element> <xsl:element name="BANKMSGSRSV1"> <xsl:element name="STMTTRNRS"> <xsl:element name="TRNUID">1</xsl:element> <xsl:element name="STATUS"> <xsl:element name="CODE">0</xsl:element> <xsl:element name="SEVERITY">INFO</xsl:element> </xsl:element> <xsl:element name="STMTRS"> <xsl:element name="CURDEF">AUD</xsl:element> <xsl:element name="BANKACCTFROM"> <xsl:element name="BANKID">RAMS</xsl:element> <xsl:element name="ACCTID"><xsl:value-of select="Account/PersonalAccount/AccountNumber" /></xsl:element> <xsl:element name="ACCTTYPE"><xsl:value-of select="Account/PersonalAccount/AccountType" /></xsl:element> </xsl:element> <xsl:element name="BANKTRANLIST"> <xsl:element name="DTSTART"><xsl:value-of select="StartDate" /></xsl:element> <xsl:element name="DTEND"><xsl:value-of select="EndDate" /></xsl:element> <xsl:for-each select="Items/Transaction"> <xsl:element name="STMTTRN"> <xsl:element name="TRNTYPE"><xsl:choose><xsl:when test="IsCredit">CREDIT</xsl:when><xsl:otherwise>DEBIT</xsl:otherwise></xsl:choose></xsl:element> <xsl:element name="DTPOSTED"><xsl:value-of select="EffectiveDate" /></xsl:element> <xsl:element name="DTUSER"><xsl:value-of select="CreatedDate" /></xsl:element> <xsl:element name="TRNAMT"><xsl:value-of select="Amount" /></xsl:element> <xsl:element name="FITID" /> <xsl:element name="NAME"><xsl:value-of select="Reference" /></xsl:element> <xsl:element name="MEMO"><xsl:value-of select="Reference" /></xsl:element> </xsl:element> </xsl:for-each> </xsl:element> <xsl:element name="LEDGERBAL"> <xsl:element name="BALAMT"><xsl:value-of select="Account/PersonalAccount/CurrentBalance" /></xsl:element> <xsl:element name="DTASOF"><xsl:value-of select="EndDate" /></xsl:element> </xsl:element> </xsl:element> </xsl:element> </xsl:element> </xsl:element> </xsl:template> </xsl:stylesheet> Here's my method to transform my XML: public string TransformToXml(XmlElement xmlElement, Dictionary<string, object> parameters) { string strReturn = ""; // Load the XSLT Document XslCompiledTransform xslt = new XslCompiledTransform(); xslt.Load(xsltFileName); // arguments XsltArgumentList args = new XsltArgumentList(); if (parameters != null && parameters.Count > 0) { foreach (string key in parameters.Keys) { args.AddParam(key, "", parameters[key]); } } //Create a memory stream to write to Stream objStream = new MemoryStream(); // Apply the transform xslt.Transform(xmlElement, args, objStream); objStream.Seek(0, SeekOrigin.Begin); // Read the contents of the stream StreamReader objSR = new StreamReader(objStream); strReturn = objSR.ReadToEnd(); return strReturn; } The contents of strReturn is an XML tag (<?xml version="1.0" encoding="utf-8"?>) followed by a raw dump of the contents of the original XML document, stripped of XML tags. What am I doing wrong here?

    Read the article

  • Optimizing a lot of Scanner.findWithinHorizon(pattern, 0) calls

    - by darvids0n
    I'm building a process which extracts data from 6 csv-style files and two poorly laid out .txt reports and builds output CSVs, and I'm fully aware that there's going to be some overhead searching through all that whitespace thousands of times, but I never anticipated converting about about 50,000 records would take 12 hours. Excerpt of my manual matching code (I know it's horrible that I use lists of tokens like that, but it was the best thing I could think of): public static String lookup(List<String> tokensBefore, List<String> tokensAfter) { String result = null; while(_match(tokensBefore)) { // block until all input is read if(id.hasNext()) { result = id.next(); // capture the next token that matches if(_matchImmediate(tokensAfter)) // try to match tokensAfter to this result return result; } else return null; // end of file; no match } return null; // no matches } private static boolean _match(List<String> tokens) { return _match(tokens, true); } private static boolean _match(List<String> tokens, boolean block) { if(tokens != null && !tokens.isEmpty()) { if(id.findWithinHorizon(tokens.get(0), 0) == null) return false; for(int i = 1; i <= tokens.size(); i++) { if (i == tokens.size()) { // matches all tokens return true; } else if(id.hasNext() && !id.next().matches(tokens.get(i))) { break; // break to blocking behaviour } } } else { return true; // empty list always matches } if(block) return _match(tokens); // loop until we find something or nothing else return false; // return after just one attempted match } private static boolean _matchImmediate(List<String> tokens) { if(tokens != null) { for(int i = 0; i <= tokens.size(); i++) { if (i == tokens.size()) { // matches all tokens return true; } else if(!id.hasNext() || !id.next().matches(tokens.get(i))) { return false; // doesn't match, or end of file } } return false; // we have some serious problems if this ever gets called } else { return true; // empty list always matches } } Basically wondering how I would work in an efficient string search (Boyer-Moore or similar). My Scanner id is scanning a java.util.String, figured buffering it to memory would reduce I/O since the search here is being performed thousands of times on a relatively small file. The performance increase compared to scanning a BufferedReader(FileReader(File)) was probably less than 1%, the process still looks to be taking a LONG time. I've also traced execution and the slowness of my overall conversion process is definitely between the first and last like of the lookup method. In fact, so much so that I ran a shortcut process to count the number of occurrences of various identifiers in the .csv-style files (I use 2 lookup methods, this is just one of them) and the process completed indexing approx 4 different identifiers for 50,000 records in less than a minute. Compared to 12 hours, that's instant. Some notes (updated): I don't necessarily need the pattern-matching behaviour, I only get the first field of a line of text so I need to match line breaks or use Scanner.nextLine(). All ID numbers I need start at position 0 of a line and run through til the first block of whitespace, after which is the name of the corresponding object. I would ideally want to return a String, not an int locating the line number or start position of the result, but if it's faster then it will still work just fine. If an int is being returned, however, then I would now have to seek to that line again just to get the ID; storing the ID of every line that is searched sounds like a way around that. Anything to help me out, even if it saves 1ms per search, will help, so all input is appreciated. Thankyou! Usage scenario 1: I have a list of objects in file A, who in the old-style system have an id number which is not in file A. It is, however, POSSIBLY in another csv-style file (file B) or possibly still in a .txt report (file C) which each also contain a bunch of other information which is not useful here, and so file B needs to be searched through for the object's full name (1 token since it would reside within the second column of any given line), and then the first column should be the ID number. If that doesn't work, we then have to split the search token by whitespace into separate tokens before doing a search of file C for those tokens as well. Generalised code: String field; for (/* each record in file A */) { /* construct the rest of this object from file A info */ // now to find the ID, if we can List<String> objectName = new ArrayList<String>(1); objectName.add(Pattern.quote(thisObject.fullName)); field = lookup(objectSearchToken, objectName); // search file B if(field == null) // not found in file B { lookupReset(false); // initialise scanner to check file C objectName.clear(); // not using the full name String[] tokens = thisObject.fullName.split(id.delimiter().pattern()); for(String s : tokens) objectName.add(Pattern.quote(s)); field = lookup(objectSearchToken, objectName); // search file C lookupReset(true); // back to file B } else { /* found it, file B specific processing here */ } if(field != null) // found it in B or C thisObject.ID = field; } The objectName tokens are all uppercase words with possible hyphens or apostrophes in them, separated by spaces. Much like a person's name. As per a comment, I will pre-compile the regex for my objectSearchToken, which is just [\r\n]+. What's ending up happening in file C is, every single line is being checked, even the 95% of lines which don't contain an ID number and object name at the start. Would it be quicker to use ^[\r\n]+.*(objectname) instead of two separate regexes? It may reduce the number of _match executions. The more general case of that would be, concatenate all tokensBefore with all tokensAfter, and put a .* in the middle. It would need to be matching backwards through the file though, otherwise it would match the correct line but with a huge .* block in the middle with lots of lines. The above situation could be resolved if I could get java.util.Scanner to return the token previous to the current one after a call to findWithinHorizon. I have another usage scenario. Will put it up asap.

    Read the article

  • Traditional IO vs memory-mapped

    - by Senne
    I'm trying to illustrate the difference in performance between traditional IO and memory mapped files in java to students. I found an example somewhere on internet but not everything is clear to me, I don't even think all steps are nececery. I read a lot about it here and there but I'm not convinced about a correct implementation of neither of them. The code I try to understand is: public class FileCopy{ public static void main(String args[]){ if (args.length < 1){ System.out.println(" Wrong usage!"); System.out.println(" Correct usage is : java FileCopy <large file with full path>"); System.exit(0); } String inFileName = args[0]; File inFile = new File(inFileName); if (inFile.exists() != true){ System.out.println(inFileName + " does not exist!"); System.exit(0); } try{ new FileCopy().memoryMappedCopy(inFileName, inFileName+".new" ); new FileCopy().customBufferedCopy(inFileName, inFileName+".new1"); }catch(FileNotFoundException fne){ fne.printStackTrace(); }catch(IOException ioe){ ioe.printStackTrace(); }catch (Exception e){ e.printStackTrace(); } } public void memoryMappedCopy(String fromFile, String toFile ) throws Exception{ long timeIn = new Date().getTime(); // read input file RandomAccessFile rafIn = new RandomAccessFile(fromFile, "rw"); FileChannel fcIn = rafIn.getChannel(); ByteBuffer byteBuffIn = fcIn.map(FileChannel.MapMode.READ_WRITE, 0,(int) fcIn.size()); fcIn.read(byteBuffIn); byteBuffIn.flip(); RandomAccessFile rafOut = new RandomAccessFile(toFile, "rw"); FileChannel fcOut = rafOut.getChannel(); ByteBuffer writeMap = fcOut.map(FileChannel.MapMode.READ_WRITE,0,(int) fcIn.size()); writeMap.put(byteBuffIn); long timeOut = new Date().getTime(); System.out.println("Memory mapped copy Time for a file of size :" + (int) fcIn.size() +" is "+(timeOut-timeIn)); fcOut.close(); fcIn.close(); } static final int CHUNK_SIZE = 100000; static final char[] inChars = new char[CHUNK_SIZE]; public static void customBufferedCopy(String fromFile, String toFile) throws IOException{ long timeIn = new Date().getTime(); Reader in = new FileReader(fromFile); Writer out = new FileWriter(toFile); while (true) { synchronized (inChars) { int amountRead = in.read(inChars); if (amountRead == -1) { break; } out.write(inChars, 0, amountRead); } } long timeOut = new Date().getTime(); System.out.println("Custom buffered copy Time for a file of size :" + (int) new File(fromFile).length() +" is "+(timeOut-timeIn)); in.close(); out.close(); } } When exactly is it nececary to use RandomAccessFile? Here it is used to read and write in the memoryMappedCopy, is it actually nececary just to copy a file at all? Or is it a part of memorry mapping? In customBufferedCopy, why is synchronized used here? I also found a different example that -should- test the performance between the 2: public class MappedIO { private static int numOfInts = 4000000; private static int numOfUbuffInts = 200000; private abstract static class Tester { private String name; public Tester(String name) { this.name = name; } public long runTest() { System.out.print(name + ": "); try { long startTime = System.currentTimeMillis(); test(); long endTime = System.currentTimeMillis(); return (endTime - startTime); } catch (IOException e) { throw new RuntimeException(e); } } public abstract void test() throws IOException; } private static Tester[] tests = { new Tester("Stream Write") { public void test() throws IOException { DataOutputStream dos = new DataOutputStream( new BufferedOutputStream( new FileOutputStream(new File("temp.tmp")))); for(int i = 0; i < numOfInts; i++) dos.writeInt(i); dos.close(); } }, new Tester("Mapped Write") { public void test() throws IOException { FileChannel fc = new RandomAccessFile("temp.tmp", "rw") .getChannel(); IntBuffer ib = fc.map( FileChannel.MapMode.READ_WRITE, 0, fc.size()) .asIntBuffer(); for(int i = 0; i < numOfInts; i++) ib.put(i); fc.close(); } }, new Tester("Stream Read") { public void test() throws IOException { DataInputStream dis = new DataInputStream( new BufferedInputStream( new FileInputStream("temp.tmp"))); for(int i = 0; i < numOfInts; i++) dis.readInt(); dis.close(); } }, new Tester("Mapped Read") { public void test() throws IOException { FileChannel fc = new FileInputStream( new File("temp.tmp")).getChannel(); IntBuffer ib = fc.map( FileChannel.MapMode.READ_ONLY, 0, fc.size()) .asIntBuffer(); while(ib.hasRemaining()) ib.get(); fc.close(); } }, new Tester("Stream Read/Write") { public void test() throws IOException { RandomAccessFile raf = new RandomAccessFile( new File("temp.tmp"), "rw"); raf.writeInt(1); for(int i = 0; i < numOfUbuffInts; i++) { raf.seek(raf.length() - 4); raf.writeInt(raf.readInt()); } raf.close(); } }, new Tester("Mapped Read/Write") { public void test() throws IOException { FileChannel fc = new RandomAccessFile( new File("temp.tmp"), "rw").getChannel(); IntBuffer ib = fc.map( FileChannel.MapMode.READ_WRITE, 0, fc.size()) .asIntBuffer(); ib.put(0); for(int i = 1; i < numOfUbuffInts; i++) ib.put(ib.get(i - 1)); fc.close(); } } }; public static void main(String[] args) { for(int i = 0; i < tests.length; i++) System.out.println(tests[i].runTest()); } } I more or less see whats going on, my output looks like this: Stream Write: 653 Mapped Write: 51 Stream Read: 651 Mapped Read: 40 Stream Read/Write: 14481 Mapped Read/Write: 6 What is makeing the Stream Read/Write so unbelievably long? And as a read/write test, to me it looks a bit pointless to read the same integer over and over (if I understand well what's going on in the Stream Read/Write) Wouldn't it be better to read int's from the previously written file and just read and write ints on the same place? Is there a better way to illustrate it? I've been breaking my head about a lot of these things for a while and I just can't get the whole picture..

    Read the article

  • Suggestions for duplicate file finder algorithm (using C)

    - by Andrei Ciobanu
    Hello, I wanted to write a program that test if two files are duplicates (have exactly the same content). First I test if the files have the same sizes, and if they have i start to compare their contents. My first idea, was to "split" the files into fixed size blocks, then start a thread for every block, fseek to startup character of every block and continue the comparisons in parallel. When a comparison from a thread fails, the other working threads are canceled, and the program exits out of the thread spawning loop. The code looks like this: dupf.h #ifndef __NM__DUPF__H__ #define __NM__DUPF__H__ #define NUM_THREADS 15 #define BLOCK_SIZE 8192 /* Thread argument structure */ struct thread_arg_s { const char *name_f1; /* First file name */ const char *name_f2; /* Second file name */ int cursor; /* Where to seek in the file */ }; typedef struct thread_arg_s thread_arg; /** * 'arg' is of type thread_arg. * Checks if the specified file blocks are * duplicates. */ void *check_block_dup(void *arg); /** * Checks if two files are duplicates */ int check_dup(const char *name_f1, const char *name_f2); /** * Returns a valid pointer to a file. * If the file (given by the path/name 'fname') cannot be opened * in 'mode', the program is interrupted an error message is shown. **/ FILE *safe_fopen(const char *name, const char *mode); #endif dupf.c #include <errno.h> #include <pthread.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/types.h> #include <sys/stat.h> #include <unistd.h> #include "dupf.h" FILE *safe_fopen(const char *fname, const char *mode) { FILE *f = NULL; f = fopen(fname, mode); if (f == NULL) { char emsg[255]; sprintf(emsg, "FOPEN() %s\t", fname); perror(emsg); exit(-1); } return (f); } void *check_block_dup(void *arg) { const char *name_f1 = NULL, *name_f2 = NULL; /* File names */ FILE *f1 = NULL, *f2 = NULL; /* Streams */ int cursor = 0; /* Reading cursor */ char buff_f1[BLOCK_SIZE], buff_f2[BLOCK_SIZE]; /* Character buffers */ int rchars_1, rchars_2; /* Readed characters */ /* Initializing variables from 'arg' */ name_f1 = ((thread_arg*)arg)->name_f1; name_f2 = ((thread_arg*)arg)->name_f2; cursor = ((thread_arg*)arg)->cursor; /* Opening files */ f1 = safe_fopen(name_f1, "r"); f2 = safe_fopen(name_f2, "r"); /* Setup cursor in files */ fseek(f1, cursor, SEEK_SET); fseek(f2, cursor, SEEK_SET); /* Initialize buffers */ rchars_1 = fread(buff_f1, 1, BLOCK_SIZE, f1); rchars_2 = fread(buff_f2, 1, BLOCK_SIZE, f2); if (rchars_1 != rchars_2) { /* fread failed to read the same portion. * program cannot continue */ perror("ERROR WHEN READING BLOCK"); exit(-1); } while (rchars_1-->0) { if (buff_f1[rchars_1] != buff_f2[rchars_1]) { /* Different characters */ fclose(f1); fclose(f2); pthread_exit("notdup"); } } /* Close streams */ fclose(f1); fclose(f2); pthread_exit("dup"); } int check_dup(const char *name_f1, const char *name_f2) { int num_blocks = 0; /* Number of 'blocks' to check */ int num_tsp = 0; /* Number of threads spawns */ int tsp_iter = 0; /* Iterator for threads spawns */ pthread_t *tsp_threads = NULL; thread_arg *tsp_threads_args = NULL; int tsp_threads_iter = 0; int thread_c_res = 0; /* Thread creation result */ int thread_j_res = 0; /* Thread join res */ int loop_res = 0; /* Function result */ int cursor; struct stat buf_f1; struct stat buf_f2; if (name_f1 == NULL || name_f2 == NULL) { /* Invalid input parameters */ perror("INVALID FNAMES\t"); return (-1); } if (stat(name_f1, &buf_f1) != 0 || stat(name_f2, &buf_f2) != 0) { /* Stat fails */ char emsg[255]; sprintf(emsg, "STAT() ERROR: %s %s\t", name_f1, name_f2); perror(emsg); return (-1); } if (buf_f1.st_size != buf_f2.st_size) { /* File have different sizes */ return (1); } /* Files have the same size, function exec. is continued */ num_blocks = (buf_f1.st_size / BLOCK_SIZE) + 1; num_tsp = (num_blocks / NUM_THREADS) + 1; cursor = 0; for (tsp_iter = 0; tsp_iter < num_tsp; tsp_iter++) { loop_res = 0; /* Create threads array for this spawn */ tsp_threads = malloc(NUM_THREADS * sizeof(*tsp_threads)); if (tsp_threads == NULL) { perror("TSP_THREADS ALLOC FAILURE\t"); return (-1); } /* Create arguments for every thread in the current spawn */ tsp_threads_args = malloc(NUM_THREADS * sizeof(*tsp_threads_args)); if (tsp_threads_args == NULL) { perror("TSP THREADS ARGS ALLOCA FAILURE\t"); return (-1); } /* Initialize arguments and create threads */ for (tsp_threads_iter = 0; tsp_threads_iter < NUM_THREADS; tsp_threads_iter++) { if (cursor >= buf_f1.st_size) { break; } tsp_threads_args[tsp_threads_iter].name_f1 = name_f1; tsp_threads_args[tsp_threads_iter].name_f2 = name_f2; tsp_threads_args[tsp_threads_iter].cursor = cursor; thread_c_res = pthread_create( &tsp_threads[tsp_threads_iter], NULL, check_block_dup, (void*)&tsp_threads_args[tsp_threads_iter]); if (thread_c_res != 0) { perror("THREAD CREATION FAILURE"); return (-1); } cursor+=BLOCK_SIZE; } /* Join last threads and get their status */ while (tsp_threads_iter-->0) { void *thread_res = NULL; thread_j_res = pthread_join(tsp_threads[tsp_threads_iter], &thread_res); if (thread_j_res != 0) { perror("THREAD JOIN FAILURE"); return (-1); } if (strcmp((char*)thread_res, "notdup")==0) { loop_res++; /* Closing other threads and exiting by condition * from loop. */ while (tsp_threads_iter-->0) { pthread_cancel(tsp_threads[tsp_threads_iter]); } } } free(tsp_threads); free(tsp_threads_args); if (loop_res > 0) { break; } } return (loop_res > 0) ? 1 : 0; } The function works fine (at least for what I've tested). Still, some guys from #C (freenode) suggested that the solution is overly complicated, and it may perform poorly because of parallel reading on hddisk. What I want to know: Is the threaded approach flawed by default ? Is fseek() so slow ? Is there a way to somehow map the files to memory and then compare them ?

    Read the article

  • Ideas for multiplatform encrypted java mobile storage system

    - by Fernando Miguélez
    Objective I am currently designing the API for a multiplatform storage system that would offer same interface and capabilities accross following supported mobile Java Platforms: J2ME. Minimum configuration/profile CLDC 1.1/MIDP 2.0 with support for some necessary JSRs (JSR-75 for file storage). Android. No minimum platform version decided yet, but rather likely could be API level 7. Blackberry. It would use the same base source of J2ME but taking advantage of some advaced capabilities of the platform. No minimum configuration decided yet (maybe 4.6 because of 64 KB limitation for RMS on 4.5). Basically the API would sport three kind of stores: Files. These would allow standard directory/file manipulation (read/write through streams, create, mkdir, etc.). Preferences. It is a special store that handles properties accessed through keys (Similar to plain old java properties file but supporting some improvements such as different value data types such as SharedPreferences on Android platform) Local Message Queues. This store would offer basic message queue functionality. Considerations Inspired on JSR-75, all types of stores would be accessed in an uniform way by means of an URL following RFC 1738 conventions, but with custom defined prefixes (i.e. "file://" for files, "prefs://" for preferences or "queue://" for message queues). The address would refer to a virtual location that would be mapped to a physical storage object by each mobile platform implementation. Only files would allow hierarchical storage (folders) and access to external extorage memory cards (by means of a unit name, the same way as in JSR-75, but that would not change regardless of underlying platform). The other types would only support flat storage. The system should also support a secure version of all basic types. The user would indicate it by prefixing "s" to the URL (i.e. "sfile://" instead of "file://"). The API would only require one PIN (introduced only once) to access any kind of secure object types. Implementation issues For the implementation of both plaintext and encrypted stores, I would use the functionality available on the underlying platforms: Files. These are available on all platforms (J2ME only with JSR-75, but it is mandatory for our needs). The abstract File to actual File mapping is straight except for addressing issues. RMS. This type of store available on J2ME (and Blackberry) platforms is convenient for Preferences and maybe Message Queues (though depending on performance or size requirements these could be implemented by means of normal files). SharedPreferences. This type of storage, only available on Android, would match Preferences needs. SQLite databases. This could be used for message queues on Android (and maybe Blackberry). When it comes to encryption some requirements should be met: To ease the implementation it will be carried out on read/write operations basis on streams (for files), RMS Records, SharedPreferences key-value pairs, SQLite database columns. Every underlying storage object should use the same encryption key. Handling of encrypted stores should be the same as the unencrypted counterpart. The only difference (from the user point of view) accessing an encrypted store would be the addressing. The user PIN provides access to any secure storage object, but the change of it would not require to decrypt/re-encrypt all the encrypted data. Cryptographic capabilities of underlying platform should be used whenever it is possible, so we would use: J2ME: SATSA-CRYPTO if it is available (not mandatory) or lightweight BoncyCastle cryptographic framework for J2ME. Blackberry: RIM Cryptographic API or BouncyCastle Android: JCE with integraced cryptographic provider (BouncyCastle?) Doubts Having reached this point I was struck by some doubts about what solution would be more convenient, taking into account the limitation of the plataforms. These are some of my doubts: Encryption Algorithm for data. Would AES-128 be strong and fast enough? What alternatives for such scenario would you suggest? Encryption Mode. I have read about the weakness of ECB encryption versus CBC, but in this case the first would have the advantage of random access to blocks, which is interesting for seek functionality on files. What type of encryption mode would you choose instead? Is stream encryption suitable for this case? Key generation. There could be one key generated for each storage object (file, RMS RecordStore, etc.) or just use one for all the objects of the same type. The first seems "safer", though it would require some extra space on device. In your opinion what would the trade-offs of each? Key storage. For this case using a standard JKS (or PKCS#12) KeyStore file could be suited to store encryption keys, but I could also define a smaller structure (encryption-transformation / key data / checksum) that could be attached to each storage store (i.e. using addition files with the same name and special extension for plain files or embedded inside other types of objects such as RMS Record Stores). What approach would you prefer? And when it comes to using a standard KeyStore with multiple-key generation (given this is your preference), would it be better to use a record-store per storage object or just a global KeyStore keeping all keys (i.e. using the URL identifier of abstract storage object as alias)? Master key. The use of a master key seems obvious. This key should be protected by user PIN (introduced only once) and would allow access to the rest of encryption keys (they would be encrypted by means of this master key). Changing the PIN would only require to reencrypt this key and not all the encrypted data. Where would you keep it taking into account that if this got lost all data would be no further accesible? What further considerations should I take into account? Platform cryptography support. Do SATSA-CRYPTO-enabled J2ME phones really take advantage of some dedicated hardware acceleration (or other advantage I have not foreseen) and would this approach be prefered (whenever possible) over just BouncyCastle implementation? For the same reason is RIM Cryptographic API worth the license cost over BouncyCastle? Any comments, critics, further considerations or different approaches are welcome.

    Read the article

  • a value that shows in select mode disappears in edit mode from a gridview column

    - by Jbob Johan
    i have a gridview(GridView1) with a few Bound Fields first one is Date (ActivityDate) from a table named "tblTime" i have managed to add one extra colum (manually), that is not bound that shows dayInWeek value according to the "ActivityDate" field programtically in CodeBehind but when i enter into Edit Mode , all Bound fields are showing their values correctly but the one column i have added manually will not show the value as it did in "select mode"(first mode b4 trying to edit) while im not a great dibbagger i have manged to view the cell's value (GridView1.Rows[e.NewEditIndex].Cells[1].Text) which does hold on to the day in week value but it does not appear in gridview edit mode only this is some of the code protected void GridView1_RowDataBound(object sender, GridViewRowEventArgs e) { if (e.Row.RowType == DataControlRowType.Header) { e.Row.Cells[0].Text = "?????"; //Activity Date (in hebrew) e.Row.Cells[1].Text = "??? ?????"; //DayinWeek e.Row.Cells[2].Text = "??????"; //ActivityType (work seek vacation) named Reason e.Row.Cells[3].Text = "??? ?????"; //time finish (to Work) e.Row.Cells[4].Text = "??? ?????"; //Time out (of work) } if (e.Row.RowType == DataControlRowType.DataRow) { if (Convert.ToBoolean(ViewState["theSubIsclckd"]) == true) //if submit button clicked { try { string twekday1 = Convert.ToString(DataBinder.Eval(e.Row.DataItem, "ActiveDate")); twekday1 = twekday1.Remove(9, 11); //geting date only without the time- portion string[] arymd = twekday1.Split('/'); // spliting [d m y] in order to make int day = Convert.ToInt32(arymd[1]); // it into [m d y] ...also requierd int month = Convert.ToInt32(arymd[0]); // when i update the table int year = Convert.ToInt32(arymd[2]); DateTime ILDateInit = new DateTime(year, month, day); //finally extracting Day CultureInfo ILci = CultureInfo.CreateSpecificCulture("he-IL"); // in week //from the converted activity date string MyIL_DayInWeek = ILDateInit.ToString("dddd", ILci); ViewState["MyIL_DayInWeek"] = MyIL_DayInWeek; e.Row.Cells[1].Text = MyIL_DayInWeek; string displayReason = DataBinder.Eval(e.Row.DataItem, "Reason").ToString(); e.Row.Cells[2].Text = displayReason; } catch (System.Exception excep) { Js.functions.but bb = new Js.functions.but(); bb.buttonName = "rex"; bb.documentwrite = true; bb.testCsVar = excep.ToString(); bb.f1(bb); // this was supposed to throw exep in javaScript injected from code behid - alert } // just in case.. } } so that works for the non edit period of time then when i hit the edit ... no day in week shows THE aspX - after selcting date... name etc' , click on button to display gridview: <asp:Button ID="TheSubB" runat="server" Text="???" onclick="TheSubB_Click" /> <asp:GridView ID="GridView1" runat="server" OnRowDataBound="GridView1_RowDataBound" onrowediting="GridView1_RowEditing" onrowcancelingedit="GridView1_RowCancelingEdit" OnRowUpdating="GridView1_RowUpdating" BackColor="LightGoldenrodYellow" BorderColor="Tan" BorderWidth="1px" CellPadding="2" ForeColor="Black" GridLines="None" AutoGenerateColumns="False" DataKeyNames="tId" DataSourceID="SqlDataSource1" style="z-index: 1; left: 0%; top: 0%; position: relative; width: 812px; height: 59px; font-family:Arial; text-align: center;" AllowSorting="True" > <AlternatingRowStyle BackColor="PaleGoldenrod" /> <Columns> <asp:BoundField DataField="ActiveDate" HeaderText="ActiveDate" SortExpression="ActiveDate" ControlStyle-Width="70" DataFormatString="{0:dd/MM/yyyy}" > <ControlStyle Width="70px" /> </asp:BoundField> <asp:TemplateField HeaderText="???.???.??"> <EditItemTemplate> <asp:TextBox ID="dayinW_EditTB" runat="server"></asp:TextBox> </EditItemTemplate> <ItemTemplate> <asp:Label ID="dayInW_editLabel" runat="server"></asp:Label> </ItemTemplate> </asp:TemplateField> <asp:BoundField DataField="Reason" HeaderText="???? ?????" SortExpression="Reason" ControlStyle-Width="50"> <ControlStyle Width="50px" /> </asp:BoundField> <asp:BoundField DataField="TimeOut" HeaderText="TimeOut" SortExpression="TimeOut" ControlStyle-Width="50" DataFormatString="{0:HH:mm}" > <ControlStyle Width="50px"></ControlStyle> </asp:BoundField> <asp:BoundField DataField="TimeIn" HeaderText="TimeIn" SortExpression="TimeIn" ControlStyle-Width="50" DataFormatString="{0:HH:mm}" > <ControlStyle Width="50px"></ControlStyle> </asp:BoundField> <asp:TemplateField HeaderText="????" > <EditItemTemplate> <asp:ImageButton width="15" Height="15" ImageUrl="~/images/edit.png" runat="server" CausesValidation="True" CommandName="Update" Text="Update"> </asp:ImageButton> <asp:ImageButton Width="15" Height="15" ImageUrl="images/cancel.png" runat="server" CausesValidation="False" CommandName="Cancel" Text="Cancel"> </asp:ImageButton> </EditItemTemplate> <ItemTemplate> <asp:ImageButton width="25" Height="15" ImageUrl="images/edit.png" ID="EditIB" runat="server" CausesValidation="False" CommandName="Edit" AlternateText="????"></asp:ImageButton> </ItemTemplate> </asp:TemplateField> <asp:TemplateField HeaderText="???"> <ItemTemplate> <asp:ImageButton width="15" Height="15" ImageUrl="images/Delete.png" ID="DeleteIB" runat="server" CommandName="Delete" AlternateText="???" /> </ItemTemplate> </asp:TemplateField> </Columns> <FooterStyle BackColor="Tan" /> <HeaderStyle BackColor="Tan" Font-Bold="True" /> <PagerStyle BackColor="PaleGoldenrod" ForeColor="DarkSlateBlue" HorizontalAlign="Center" /> <SelectedRowStyle BackColor="DarkSlateBlue" ForeColor="GhostWhite" /> <SortedAscendingCellStyle BackColor="#FAFAE7" /> <SortedAscendingHeaderStyle BackColor="#DAC09E" /> <SortedDescendingCellStyle BackColor="#E1DB9C" /> <SortedDescendingHeaderStyle BackColor="#C2A47B" /> </asp:GridView>

    Read the article

  • Using libgrib2c in c++ application, linker error "Undefined reference to..."

    - by Rich
    EDIT: If you're going to be doing things with GRIB files I would recommend the GDAL library which is backed by the Open Source Geospatial Foundation. You will save yourself a lot of headache :) I'm using Qt creator in Ubuntu creating a c++ app. I am attempting to use an external lib, libgrib2c.a, that has a header grib2.h. Everything compiles, but when it tries to link I get the error: undefined reference to 'seekgb(_IO_FILE*, long, long, long*, long*) I have tried wrapping the header file with: extern "C"{ #include "grib2.h" } But it didn't fix anything so I figured that was not my problem. In the .pro file I have the line: include($${ROOT}/Shared/common/commonLibs.pri) and in commonLibs.pri I have: INCLUDEPATH+=$${ROOT}/external_libs/g2clib/include LIBS+=-L$${ROOT}/external_libs/g2clib/lib LIBS+=-lgrib2c I am not encountering an error finding the library. If I do a nm command on the libgrib2c.a I get: nm libgrib2c.a | grep seekgb seekgb.o: 00000000 T seekgb And when I run qmake with the additional argument of LIBS+=-Wl,--verbose I can find the lib file in the output: attempt to open /usr/lib/libgrib2c.so failed attempt to open /usr/lib/libgrib2c.a failed attempt to open /mnt/sdb1/ESMF/App/ESMF_App/../external_libs/linux/qwt_6.0.2/lib/libgrib2c.so failed attempt to open /mnt/sdb1/ESMF/App/ESMF_App/../external_libs/linux/qwt_6.0.2/lib/libgrib2c.a failed attempt to open ..//Shared/Config/lib/libgrib2c.so failed attempt to open ..//Shared/Config/lib/libgrib2c.a failed attempt to open ..//external_libs/libssh2/lib/libgrib2c.so failed attempt to open ..//external_libs/libssh2/lib/libgrib2c.a failed attempt to open ..//external_libs/openssl/lib/libgrib2c.so failed attempt to open ..//external_libs/openssl/lib/libgrib2c.a failed attempt to open ..//external_libs/g2clib/lib/libgrib2c.so failed attempt to open ..//external_libs/g2clib/lib/libgrib2c.a succeeded Although it doesn't show any of the .o files in the library is this because it is a c library in my c++ app? in the .cpp file that I am trying to use the library I have: #include "gribreader.h" #include <stdio.h> #include <stdlib.h> #include <external_libs/g2clib/include/grib2.h> #include <Shared/logging/Logger.hpp> //------------------------------------------------------------------------------ /// Opens a GRIB file from disk. /// /// This function opens the grib file and searches through it for how many GRIB /// messages are contained as well as their starting locations. /// /// \param a_filePath. The path to the file to be opened. /// \return True if successful, false if not. //------------------------------------------------------------------------------ bool GRIBReader::OpenGRIB(std::string a_filePath) { LOG(notification)<<"Attempting to open grib file: "<< a_filePath; if(isOpen()) { CloseGRIB(); } m_filePath = a_filePath; m_filePtr = fopen(a_filePath.c_str(), "r"); if(m_filePtr == NULL) { LOG(error)<<"Unable to open file: " << a_filePath; return false; } LOG(notification)<<"Successfully opened GRIB file"; g2int currentMessageSize(1); g2int seekPosition(0); g2int lengthToBeginningOfGrib(0); g2int seekLength(32000); int i(0); int iterationLimit(300); m_GRIBMessageLocations.clear(); m_GRIBMessageSizes.clear(); while(i < iterationLimit) { seekgb(m_filePtr, seekPosition, seekLength, &lengthToBeginningOfGrib, &currentMessageSize); if(currentMessageSize != 0) { LOG(verbose) << "Adding GRIB message location " << lengthToBeginningOfGrib << " with length " << currentMessageSize; m_GRIBMessageLocations.push_back(lengthToBeginningOfGrib); m_GRIBMessageSizes.push_back(currentMessageSize); seekPosition = lengthToBeginningOfGrib + currentMessageSize; LOG(verbose) << "GRIB seek position moved to " << seekPosition; } else { LOG(notification)<<"End of GRIB file found, after "<< i << " GRIB messages."; break; } } if(i >= iterationLimit) { LOG(warning) << "The iteration limit of " << iterationLimit << "was reached while searching for GRIB messages"; } return true; } And the header grib2.h is as follows: #ifndef _grib2_H #define _grib2_H #include<stdio.h> #define G2_VERSION "g2clib-1.4.0" #ifdef __64BIT__ typedef int g2int; typedef unsigned int g2intu; #else typedef long g2int; typedef unsigned long g2intu; #endif typedef float g2float; struct gtemplate { g2int type; /* 3=Grid Defintion Template. */ /* 4=Product Defintion Template. */ /* 5=Data Representation Template. */ g2int num; /* template number. */ g2int maplen; /* number of entries in the static part */ /* of the template. */ g2int *map; /* num of octets of each entry in the */ /* static part of the template. */ g2int needext; /* indicates whether or not the template needs */ /* to be extended. */ g2int extlen; /* number of entries in the template extension. */ g2int *ext; /* num of octets of each entry in the extension */ /* part of the template. */ }; typedef struct gtemplate gtemplate; struct gribfield { g2int version,discipline; g2int *idsect; g2int idsectlen; unsigned char *local; g2int locallen; g2int ifldnum; g2int griddef,ngrdpts; g2int numoct_opt,interp_opt,num_opt; g2int *list_opt; g2int igdtnum,igdtlen; g2int *igdtmpl; g2int ipdtnum,ipdtlen; g2int *ipdtmpl; g2int num_coord; g2float *coord_list; g2int ndpts,idrtnum,idrtlen; g2int *idrtmpl; g2int unpacked; g2int expanded; g2int ibmap; g2int *bmap; g2float *fld; }; typedef struct gribfield gribfield; /* Prototypes for unpacking API */ void seekgb(FILE *,g2int ,g2int ,g2int *,g2int *); g2int g2_info(unsigned char *,g2int *,g2int *,g2int *,g2int *); g2int g2_getfld(unsigned char *,g2int ,g2int ,g2int ,gribfield **); void g2_free(gribfield *); /* Prototypes for packing API */ g2int g2_create(unsigned char *,g2int *,g2int *); g2int g2_addlocal(unsigned char *,unsigned char *,g2int ); g2int g2_addgrid(unsigned char *,g2int *,g2int *,g2int *,g2int ); g2int g2_addfield(unsigned char *,g2int ,g2int *, g2float *,g2int ,g2int ,g2int *, g2float *,g2int ,g2int ,g2int *); g2int g2_gribend(unsigned char *); /* Prototypes for supporting routines */ extern double int_power(double, g2int ); extern void mkieee(g2float *,g2int *,g2int); void rdieee(g2int *,g2float *,g2int ); extern gtemplate *getpdstemplate(g2int); extern gtemplate *extpdstemplate(g2int,g2int *); extern gtemplate *getdrstemplate(g2int); extern gtemplate *extdrstemplate(g2int,g2int *); extern gtemplate *getgridtemplate(g2int); extern gtemplate *extgridtemplate(g2int,g2int *); extern void simpack(g2float *,g2int,g2int *,unsigned char *,g2int *); extern void compack(g2float *,g2int,g2int,g2int *,unsigned char *,g2int *); void misspack(g2float *,g2int ,g2int ,g2int *, unsigned char *, g2int *); void gbit(unsigned char *,g2int *,g2int ,g2int ); void sbit(unsigned char *,g2int *,g2int ,g2int ); void gbits(unsigned char *,g2int *,g2int ,g2int ,g2int ,g2int ); void sbits(unsigned char *,g2int *,g2int ,g2int ,g2int ,g2int ); int pack_gp(g2int *, g2int *, g2int *, g2int *, g2int *, g2int *, g2int *, g2int *, g2int *, g2int *, g2int *, g2int *, g2int *, g2int *, g2int *, g2int *, g2int *, g2int *, g2int *, g2int *); #endif /* _grib2_H */ I have been scratching my head for two days on this. If anyone has an idea on what to do or can point me in some sort of direction, I'm stumped. Also, if you have any comments on how I can improve this post I'd love to hear them, kinda new at this posting thing. Usually I'm able to find an answer in the vast stores of knowledge already contained on the web.

    Read the article

  • add a from to backup routine

    - by Gerard Flynn
    hi how do you put a process bar and button onto this code i have class and want to add a gui on to the code using System; using System.Collections.Generic; using System.ComponentModel; using System.Data; using System.Drawing; using System.Text; using System.Windows.Forms; using System.Data.SqlClient; using System.IO; using System.Threading; using Tamir.SharpSsh; using System.Security.Cryptography; using ICSharpCode.SharpZipLib.Checksums; using ICSharpCode.SharpZipLib.Zip; using ICSharpCode.SharpZipLib.GZip; namespace backup { public partial class Form1 : Form { public Form1() { InitializeComponent(); } /// <summary> /// Summary description for Class1. /// </summary> public class Backup { private string dbName; private string dbUsername; private string dbPassword; private static string baseDir; private string backupName; private static bool isBackup; private string keyString; private string ivString; private string[] backupDirs = new string[0]; private string[] excludeDirs = new string[0]; private ZipOutputStream zipOutputStream; private string backupFile; private string zipFile; private string encryptedFile; static void Main() { Backup.Log("BackupUtility loaded"); try { new Backup(); if (!isBackup) MessageBox.Show("Restore complete"); } catch (Exception e) { Backup.Log(e.ToString()); if (!isBackup) MessageBox.Show("Error restoring!\r\n" + e.Message); } } private void LoadAppSettings() { this.backupName = System.Configuration.ConfigurationSettings.AppSettings["BackupName"].ToString(); this.dbName = System.Configuration.ConfigurationSettings.AppSettings["DBName"].ToString(); this.dbUsername = System.Configuration.ConfigurationSettings.AppSettings["DBUsername"].ToString(); this.dbPassword = System.Configuration.ConfigurationSettings.AppSettings["DBPassword"].ToString(); //default to using where we are executing this assembly from Backup.baseDir = System.Reflection.Assembly.GetExecutingAssembly().Location.Substring(0, System.Reflection.Assembly.GetExecutingAssembly().Location.LastIndexOf("\\")) + "\\"; Backup.isBackup = bool.Parse(System.Configuration.ConfigurationSettings.AppSettings["IsBackup"].ToString()); this.keyString = System.Configuration.ConfigurationSettings.AppSettings["KeyString"].ToString(); this.ivString = System.Configuration.ConfigurationSettings.AppSettings["IVString"].ToString(); this.backupDirs = GetSetting("BackupDirs", ','); this.excludeDirs = GetSetting("ExcludeDirs", ','); } private string[] GetSetting(string settingName, char delimiter) { if (System.Configuration.ConfigurationSettings.AppSettings[settingName] != null) { string settingVal = System.Configuration.ConfigurationSettings.AppSettings[settingName].ToString(); if (settingVal.Length > 0) return settingVal.Split(delimiter); } return new string[0]; } public Backup() { this.LoadAppSettings(); if (isBackup) this.DoBackup(); else this.DoRestore(); Log("Finished"); } private void DoRestore() { System.Windows.Forms.OpenFileDialog fileDialog = new System.Windows.Forms.OpenFileDialog(); fileDialog.Title = "Choose .encrypted file"; fileDialog.Filter = "Encrypted files (*.encrypted)|*.encrypted|All files (*.*)|*.*"; fileDialog.InitialDirectory = Backup.baseDir; if (fileDialog.ShowDialog() == System.Windows.Forms.DialogResult.OK) { //string encryptedFile = GetFileName("encrypted"); string encryptedFile = fileDialog.FileName; string decryptedFile = this.GetDecryptedFilename(encryptedFile); //string originalFile = GetFileName("original"); this.Decrypt(encryptedFile, decryptedFile); //this.UnzipFile(decryptedFile, originalFile); } } //use the same filename as the backup except replace ".encrypted" with ".decrypted.zip" private string GetDecryptedFilename(string encryptedFile) { string name = encryptedFile.Substring(0, encryptedFile.LastIndexOf(".")); name += ".decrypted.zip"; return name; } private void DoBackup() { this.backupFile = GetFileName("bak"); this.zipFile = GetFileName("zip"); this.encryptedFile = GetFileName("encrypted"); this.DeleteFiles(); this.zipOutputStream = new ZipOutputStream(File.Create(zipFile)); try { //backup database first if (this.dbName.Length > 0) { this.BackupDB(backupFile); this.ZipFile(backupFile, this.GetName(backupFile)); } //zip any directories specified in config file this.ZipUserSpecifiedFilesAndDirectories(this.backupDirs); } finally { this.zipOutputStream.Finish(); this.zipOutputStream.Close(); } this.Encrypt(zipFile, encryptedFile); this.SCPFile(encryptedFile); this.DeleteFiles(); } /// <summary> /// Deletes any files created by the backup process, namely the DB backup file, /// the zip of all files backuped up, and the encrypred zip file /// </summary> private void DeleteFiles() { File.Delete(this.backupFile); File.Delete(this.zipFile); ///File.Delete(this.encryptedFile); } private void ZipUserSpecifiedFilesAndDirectories(string[] fileNames) { foreach (string fileName in fileNames) { string name = fileName.Trim(); if (name.Length > 0) { Log("Zipping " + name); this.ZipFile(name, this.GetNameFromDir(name)); } } } private void SCPFile(string inputPath) { string sshServer = System.Configuration.ConfigurationSettings.AppSettings["SSHServer"].ToString(); string sshUsername = System.Configuration.ConfigurationSettings.AppSettings["SSHUsername"].ToString(); string sshPassword = System.Configuration.ConfigurationSettings.AppSettings["SSHPassword"].ToString(); if (sshServer.Length > 0 && sshUsername.Length > 0 && sshPassword.Length > 0) { Scp scp = new Scp(sshServer, sshUsername, sshPassword); //Copy a file from local machine to remote SSH server scp.Connect(); Log("Connected to " + sshServer); //scp.Put(inputPath, "/home/wal/temp.txt"); scp.Put(inputPath, GetName(inputPath)); scp.Close(); } else { Log("Not SCP as missing login details"); } } private string GetName(string inputPath) { FileInfo info = new FileInfo(inputPath); return info.Name; } private string GetNameFromDir(string inputPath) { DirectoryInfo info = new DirectoryInfo(inputPath); return info.Name; } private static void Log(string msg) { try { string toLog = DateTime.Now.ToString() + ": " + msg; System.Diagnostics.Debug.WriteLine(toLog); System.IO.FileStream fs = new System.IO.FileStream(baseDir + "app.log", System.IO.FileMode.OpenOrCreate, System.IO.FileAccess.ReadWrite); System.IO.StreamWriter m_streamWriter = new System.IO.StreamWriter(fs); m_streamWriter.BaseStream.Seek(0, System.IO.SeekOrigin.End); m_streamWriter.WriteLine(toLog); m_streamWriter.Flush(); m_streamWriter.Close(); fs.Close(); } catch (Exception e) { Console.WriteLine(e.ToString()); } } private byte[] GetFileBytes(string path) { FileStream stream = new FileStream(path, FileMode.Open); byte[] bytes = new byte[stream.Length]; stream.Read(bytes, 0, bytes.Length); stream.Close(); return bytes; } private void WriteFileBytes(byte[] bytes, string path) { FileStream stream = new FileStream(path, FileMode.Create); stream.Write(bytes, 0, bytes.Length); stream.Close(); } private void UnzipFile(string inputPath, string outputPath) { ZipInputStream zis = new ZipInputStream(File.OpenRead(inputPath)); ZipEntry theEntry = zis.GetNextEntry(); FileStream streamWriter = File.Create(outputPath); int size = 2048; byte[] data = new byte[2048]; while (true) { size = zis.Read(data, 0, data.Length); if (size > 0) { streamWriter.Write(data, 0, size); } else { break; } } streamWriter.Close(); zis.Close(); } private bool ExcludeDir(string dirName) { foreach (string excludeDir in this.excludeDirs) { if (dirName == excludeDir) return true; } return false; } private void ZipFile(string inputPath, string zipName) { FileAttributes fa = File.GetAttributes(inputPath); if ((fa & FileAttributes.Directory) != 0) { string dirName = zipName + "/"; ZipEntry entry1 = new ZipEntry(dirName); this.zipOutputStream.PutNextEntry(entry1); string[] subDirs = Directory.GetDirectories(inputPath); //create directories first foreach (string subDir in subDirs) { DirectoryInfo info = new DirectoryInfo(subDir); string name = info.Name; if (this.ExcludeDir(name)) Log("Excluding " + dirName + name); else this.ZipFile(subDir, dirName + name); } //then store files string[] fileNames = Directory.GetFiles(inputPath); foreach (string fileName in fileNames) { FileInfo info = new FileInfo(fileName); string name = info.Name; this.ZipFile(fileName, dirName + name); } } else { Crc32 crc = new Crc32(); this.zipOutputStream.SetLevel(6); // 0 - store only to 9 - means best compression FileStream fs = null; try { fs = File.OpenRead(inputPath); } catch (IOException ioEx) { Log("WARNING! " + ioEx.Message);//might be in use, skip file in this case } if (fs != null) { byte[] buffer = new byte[fs.Length]; fs.Read(buffer, 0, buffer.Length); ZipEntry entry = new ZipEntry(zipName); entry.DateTime = DateTime.Now; // set Size and the crc, because the information // about the size and crc should be stored in the header // if it is not set it is automatically written in the footer. // (in this case size == crc == -1 in the header) // Some ZIP programs have problems with zip files that don't store // the size and crc in the header. entry.Size = fs.Length; fs.Close(); crc.Reset(); crc.Update(buffer); entry.Crc = crc.Value; this.zipOutputStream.PutNextEntry(entry); this.zipOutputStream.Write(buffer, 0, buffer.Length); } } } private void Encrypt(string inputPath, string outputPath) { RijndaelManaged rijndaelManaged = new RijndaelManaged(); byte[] encrypted; byte[] toEncrypt; //Create a new key and initialization vector. //myRijndael.GenerateKey(); //myRijndael.GenerateIV(); /*des.GenerateKey(); des.GenerateIV(); string temp1 = Convert.ToBase64String(des.Key); string temp2 = Convert.ToBase64String(des.IV);*/ //Get the key and IV. byte[] key = Convert.FromBase64String(keyString); byte[] IV = Convert.FromBase64String(ivString); //Get an encryptor. ICryptoTransform encryptor = rijndaelManaged.CreateEncryptor(key, IV); //Encrypt the data. MemoryStream msEncrypt = new MemoryStream(); CryptoStream csEncrypt = new CryptoStream(msEncrypt, encryptor, CryptoStreamMode.Write); //Convert the data to a byte array. toEncrypt = this.GetFileBytes(inputPath); //Write all data to the crypto stream and flush it. csEncrypt.Write(toEncrypt, 0, toEncrypt.Length); csEncrypt.FlushFinalBlock(); //Get encrypted array of bytes. encrypted = msEncrypt.ToArray(); WriteFileBytes(encrypted, outputPath); } private void Decrypt(string inputPath, string outputPath) { RijndaelManaged myRijndael = new RijndaelManaged(); //DES des = new DESCryptoServiceProvider(); byte[] key = Convert.FromBase64String(keyString); byte[] IV = Convert.FromBase64String(ivString); byte[] encrypted = this.GetFileBytes(inputPath); byte[] fromEncrypt; //Get a decryptor that uses the same key and IV as the encryptor. ICryptoTransform decryptor = myRijndael.CreateDecryptor(key, IV); //Now decrypt the previously encrypted message using the decryptor // obtained in the above step. MemoryStream msDecrypt = new MemoryStream(encrypted); CryptoStream csDecrypt = new CryptoStream(msDecrypt, decryptor, CryptoStreamMode.Read); fromEncrypt = new byte[encrypted.Length]; //Read the data out of the crypto stream. int bytesRead = csDecrypt.Read(fromEncrypt, 0, fromEncrypt.Length); byte[] readBytes = new byte[bytesRead]; Array.Copy(fromEncrypt, 0, readBytes, 0, bytesRead); this.WriteFileBytes(readBytes, outputPath); } private string GetFileName(string extension) { return baseDir + backupName + "_" + DateTime.Now.ToString("yyyyMMdd") + "." + extension; } private void BackupDB(string backupPath) { string sql = @"DECLARE @Date VARCHAR(300), @Dir VARCHAR(4000) --Get today date SET @Date = CONVERT(VARCHAR, GETDATE(), 112) --Set the directory where the back up file is stored SET @Dir = '"; sql += backupPath; sql += @"' --create a 'device' to write to first EXEC sp_addumpdevice 'disk', 'temp_device', @Dir --now do the backup BACKUP DATABASE " + this.dbName; sql += @" TO temp_device WITH FORMAT --Drop the device EXEC sp_dropdevice 'temp_device' "; //Console.WriteLine("sql="+sql); Backup.Log("Starting backup of " + this.dbName); ExecuteSQL(sql); } /// <summary> /// Executes the specified SQL /// Returns true if no errors were encountered during execution /// </summary> /// <param name="procedureName"></param> private void ExecuteSQL(string sql) { SqlConnection conn = new SqlConnection(this.GetDBConnectString()); try { SqlCommand comm = new SqlCommand(sql, conn); conn.Open(); comm.ExecuteNonQuery(); } finally { conn.Close(); } } private string GetDBConnectString() { StringBuilder builder = new StringBuilder(); builder.Append("Data Source=127.0.0.1; User ID="); builder.Append(this.dbUsername); builder.Append("; Password="); builder.Append(this.dbPassword); builder.Append("; Initial Catalog="); builder.Append(this.dbName); builder.Append(";Connect Timeout=30"); return builder.ToString(); } } } }

    Read the article

< Previous Page | 20 21 22 23 24