data operations - Page 482

Need help debugging a huge chunk of JSON data...

- by meder

I have a huge chunk, so large that I can't manually edit the file and need to read it in and do regex operations to see what's wrong. Basically - my server is PHP 5.1.6 and I can't update it. This features an older json_decode which is less featured than the 5.2/5.3 versions. json_decode returns NULL and json_last_error is being invoked but the function doesn't exist except in PHP 5.3 so I'm manually trying to see what's wrong. $regex = '#[^0-9"$a-zA-Z{:}().]#'; $json = preg_replace( $regex, '', $json ); $tree = json_decode ( $json, true ); var_dump($tree); // NULL A snippet of the JSON.. somewhere in the middle {"109":0,"103":1,"102":59,"101":70,"100":4299,"94":0,"50":51,"46":0,"45":0,"44":0,"43":0,"42":0,"23":0,"22":0,"18":0,"17":1,"16":1,"13":160,"8":4298}},"2":{"d":{"109":0,"103":92,"102":54,"101":53,"100":4301,"94":0,"50":4278,"49":328,"46":1,"45":0,"44":1,"43":0,"42":0,"26":0,"23":0,"22":0,"18":0,"17":1,"16":1,"8":4300},"m":{"94":1,"100":1,"26":1,"50":1,"8":1,"49":1,"18":1,"43":1,"42":1,"109":1},"c":{"\/":{"d":{"109":0,"100":4301,"94":0,"50":4278,"49":328,"43":0,"42":0,"26":0,"18":0,"8":4300}},"G":{"d":{"109":1,"100":4303,"94":1,"68":17,"50":64,"49":53,"43":1,"42":1,"34":0,"18":1,"13":2216,"11":0,"8":4302}}}},"3": The }}}} is suspicious but this probably just closes 4 nested object literals. Would appreciate any insight.

Read the article

SOA Suite 11g Native Format Builder Complex Format Example

- by bob.webster

This rather long posting details the steps required to process a grouping of fixed length records using Format Builder. If it’s 10 pm and you’re feeling beat you might want to leave this until tomorrow. But if it’s 10 pm and you need to get a Format Builder Complex template done, read on… The goal is to process individual orders from a file using the 11g File Adapter and Format Builder Sample Data =========== 001Square Widget 0245.98 102Triagular Widget 1120.00 403Circular Widget 0099.45 ORD8898302/01/2011 301Hexagon Widget 1150.98 ORD6735502/01/2011 The records are fixed length records representing a number of logical Order records. Each order record consists of a number of item records starting with a 3 digit number, followed by a single Summary Record which starts with the constant ORD. How can this file be processed so that the first poll returns the first order? 001Square Widget 0245.98 102Triagular Widget 1120.00 403Circular Widget 0099.45 ORD8898302/01/2011 And the second poll returns the second order? 301Hexagon Widget 1150.98 ORD6735502/01/2011 Note: if you need more than one order per poll, that’s also possible, see the “Multiple Messages” field in the “File Adapter Step 6 of 9” snapshot further down. To follow along with this example you will need - Studio Edition Version 11.1.1.4.0 with the - SOA Extension for JDeveloper 11.1.1.4.0 installed Both can be downloaded from here: http://www.oracle.com/technetwork/middleware/soasuite/downloads/index.html You will not need a running WebLogic Server domain to complete the steps and Format Builder tests in this article. Start with a SOA Composite containing a File Adapter The Format Builder is part of the File Adapter so start by creating a new SOA Project and Composite. Here is a quick summary for those not familiar with these steps - Start JDeveloper - From the Main Menu choose File->New - In the New Gallery window that opens Expand the “General” category and Select the Applications node. Then choose SOA Application from the Items section on the right. Finally press the OK button. - In Step 1 of the “Create SOA Application wizard” that appears enter an Application Name and an Directory of your choice, then press the Next button. - In Step 2 of the “Create SOA Application wizard”, press the Next button leaving all entries as defaulted. - In Step 3 of the “Create SOA Application wizard”, Enter a composite name of your choice and Press the Finish Button These steps result in a new Application and SOA Project. The SOA Project contains a composite.xml file which is opened and shown below. For our example we have not defined a Mediator or a BPEL process to minimize the steps, but one or the other would eventually be needed to use the File Adapter we are about to create. Drag and drop the File Adapter icon from the Component Pallette onto either the LEFT side of the diagram under “Exposed Services” or the right side under “External References”. (See the Green Circle in the image below). Placing the adapter on the left side would indicate the file being processed is inbound to the composite, if the adapter is placed on the right side then the data is outbound to a file. Note that the same Format Builder definition can be used in both directions. For example we could use the format with a File Adapter on the left side of the composite to parse fixed data into XML, modify the data in our Composite or BPEL process and then use the same Format Builder definition with a File adapter on the right side of the composite to write the data back out in the same fixed data format When the File Adapter is dropped on the Composite the File Adapter Wizard Appears. Skip Past the first page, Step 1 of 9 by pressing the Next button. In Step 2 enter a service name of your choice as shown below, then press Next When the Native Format Builder appears, skip the welcome page by pressing next. Also press the Next button to accept the settings on Step 3 of 9 On Step 4, select Read File and press the Next button as shown below. On Step 5 enter a directory that will contain a file with the input data, then Press the Next button as shown below. In step 6, enter *.txt or another file format to select input files from the input directory mentioned in step 5. ALSO check the “Files contain Multiple Messages” checkbox and set the “Publish Messages in Batches of” field to 1. The value can be set higher to increase the number of logical order group records returned on each poll of the file adapter. In other words, it determines the number of Orders that will be sent to each instance of a Mediator or Composite processing using the File Adapter. Skip Step 7 by pressing the Next button In Step 8 press the Gear Icon on the right side to load the Native Format Builder. Native Format Builder appears Before diving into the format, here is an overview of the process. Approach - Bottom up Assuming an Order is a grouping of item records and a summary record…. - Define a separate Complex Type for each Record Type found in the group. (One for itemRecord and one for summaryRecord) - Define a Complex Type to contain the Group of Record types defined above (LogicalOrderRecord) - Define a top level element to represent an order. (order) The order element will be of type LogicalOrderRecord Defining the Format In Step 1 select “Create new” and “Complex Type” and “Next” In Step two browse to and select a file containing the test data shown at the start of this article. A link is provided at the end of this article to download a file containing the test data. Press the Next button In Step 3 Complex types must be define for each type of input record. Select the Root-Element and Click on the Add Complex Type icon This creates a new empty complex type definition shown below. The fastest way to create the definition is to highlight the first line of the Sample File data and drag the line onto the <new_complex_type> Format Builder introspects the data and provides a grid to define additional fields. Change the “Complex Type Name” to “itemRecord” Then click on the ruler to indicate the position of fixed columns. Drag the red triangle icons to the exact columns if necessary. Double click on an existing red triangle to remove an unwanted entry. In the case below fields are define in columns 0-3, 4-28, 29-eol When the field definitions are correct, press the “Generate Fields” button. Field entries named C1, C2 and C3 will be created as shown below. Click on the field names and rename them from C1->itemNum, C2->itemDesc and C3->itemCost When all the fields are correctly defined press OK to save the complex type. Next, the process is repeated to define a Complex Type for the SummaryRecord. Select the Root-Element in the schema tree and press the new complex type icon Then highlight and drag the Summary Record from the sample data onto the <new_complex_type> Change the complex type name to “summaryRecord” Mark the fixed fields for Order Number and Order Date. Press the Generate Fields button and rename C1 and C2 to itemNum and orderDate respectively. The last complex type to be defined is a type to hold the group of items and the summary record. Select the Root-Element in the schema tree and click the new complex type icon Select the “<new_complex_type>” entry and click the pencil icon On the Complex Type Details page change the name and type of each input field. Change line 1 to be named item and set the Type to “itemRecord” Change line 2 to be named summary and set the Type to “summaryRecord” We also need to indicate that itemRecords repeat in the input file. Click the pencil icon at the right side of the item line. On the Edit Details page change the “Max Occurs” entry from 1 to UNBOUNDED. We also need to indicate how to identify an itemRecord. Since each item record has “.” in column 32 we can use this fact to differentiate an item record from a summary record. Change the “Look Ahead” field to value 32 and enter a period in the “Look For” field Press the OK button to save entry. Finally, its time to create a top level element to represent an order. Select the “Root-Element” in the schema tree and press the New element icon Click on the <new_element> and press the pencil icon. Set the Element Name to “order” and change the Data Type to “logicalOrderRecord” Press the OK button to save the element definition. The final definition should match the screenshot below. Press the Next Button to view the definition source. Press the Test Button to test the definition Press the Green Triangle Icon to run the test. And we are presented with an unwelcome error. The error states that the processor ran out of data while working through the definition. The processor was unable to differentiate between itemRecords and summaryRecords and therefore treated the entire file as a list of itemRecords. At end of file, the “summary” portion of the logicalOrderRecord remained unprocessed but mandatory. This root cause of this error is the loss of our “lookAhead” definition used to identify itemRecords. This appears to be a bug in the Native Format Builder 11.1.1.4.0 Luckily, a simple workaround exists. Press the Cancel button and return to the “Step 4 of 4” Window. Manually add nxsd:lookAhead="32" nxsd:lookFor="." attributes after the maxOccurs attribute of the item element. as shown in the highlighted text below. When the lookAhead and lookFor attributes have been added Press the Test button and on the Test page press the Green Triangle. The test is now successful, the first order in the file is returned by the File Adapter. Below is a complete listing of the Result XML from the right column of the screen above Try running it The downloaded input test file and completed schema file can be used for testing without following all the Native Format Builder steps in this example. Use the following link to download a file containing the sample data. Download Sample Input Data This is the best approach rather than cutting and pasting the input data at the top of the article. Since the data is fixed length it’s very important to watch out for trailing spaces in the data and to ensure an eol character at the end of every line. The download file is correctly formatted. The final schema definition can be downloaded at the following link Download Completed Schema Definition - Save the inputData.txt file to a known location like the xsd folder in your project. - Save the inputData_6.xsd file to the xsd folder in your project. - At step 1 in the Native Format Builder wizard (as shown above) check the “Edit existing” radio button, then browse and select the inputData_6.xsd file - At step 2 of the Format Builder configuration Wizard (as shown above) supply the path and filename for the inputData.txt file. - You can then proceed to the test page and run a test. - Remember the wizard bug will drop the lookAhead and lookFor attributes, you will need to manually add nxsd:lookAhead="32" nxsd:lookFor="." after the maxOccurs attribute of the item element in the LogicalOrderRecord Complex Type. (as shown above) Good Luck with your Format Project

Read the article

SQLAuthority News – Speaking Sessions at TechEd India – 3 Sessions – 1 Panel Discussion

- by pinaldave

Microsoft Tech-Ed India 2010 is considered as the major Technology event of the year for various IT professionals and developers. This event will feature a comprehensive forum in order to learn, connect, explore, and evolve the current technologies we have today. I would recommend this event to you since here you will learn about today’s cutting-edge trends, thereby enhancing your work profile and getting ahead of the rest. But, the most important benefit of all might be the networking opportunity that that you can attain by attending the forum. You can build personal connections with various Microsoft experts and peers that will last even far beyond this event! It also feels good to let you know that I will be speaking at this year’s event! So, here are the sessions that await you in this mega-forum. Session 1: True Lies of SQL Server – SQL Myth Buster Date: April 12, 2010 Time: 11:15pm – 11:45pm In this 30-minute demo session, I am going to briefly demonstrate few SQL Server Myth and their resolution backing up with some demo. This demo session is a must-attend for all developers and administrators who would come to the event. This is going to be a very quick yet fun session. Session 2: Master Data Services in Microsoft SQL Server 2008 R2 Date: April 12, 2010 Time: 2:30pm-3:30pm SQL Server Master Data Services will ship with SQL Server 2008 R2 and will improve Microsoft’s platform appeal. This session provides an in depth demonstration of MDS features and highlights important usage scenarios. Master Data Services enables consistent decision making by allowing you to create, manage and propagate changes from single master view of your business entities. Also with MDS – Master Data-hub which is the vital component helps ensure reporting consistency across systems and deliver faster more accurate results across the enterprise. We will talk about establishing the basis for a centralized approach to defining, deploying, and managing master data in the enterprise. Session 3: Developing with SQL Server Spatial and Deep Dive into Spatial Indexing Date: April 14, 2010 Time: 5:00pm-6:00pm Microsoft SQL Server 2008 delivers new spatial data types that enable you to consume, use, and extend location-based data through spatial-enabled applications. Attend this session to learn how to use spatial functionality in next version of SQL Server to build and optimize spatial queries. This session outlines the new geography data type to store geodetic spatial data and perform operations on it, use the new geometry data type to store planar spatial data and perform operations on it, take advantage of new spatial indexes for high performance queries, use the new spatial results tab to quickly and easily view spatial query results directly from within Management Studio, extend spatial data capabilities by building or integrating location-enabled applications through support for spatial standards and specifications and much more. Panel Discussion: Harness the power of Web – SEO and Technical Blogging Date: April 12, 2010 Time: 5:00pm-6:00pm Here you will learn lots of tricks and tips about SEO and Technical Blogging from various Industry Technical Blogging Experts. This event will surely be one of the most important Tech conventions of 2010. TechEd is going to be a very busy time for Tech developers and enthusiasts, since every evening there will be a fun session to attend. If you are interested in any of the above topics for every session, I suggest that you visit each of them as you will learn so many things about the topic to be discussed. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: MVP, Pinal Dave, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQLAuthority Author Visit, SQLAuthority News, T SQL, Technology Tagged: TechEd, TechEdIn

Read the article

SQL SERVER – What is Spatial Database? – Developing with SQL Server Spatial and Deep Dive into Spati

- by pinaldave

What is Spatial Database? A spatial database is a database that is optimized to store and query data related to objects in space, including points, lines and polygons. While typical databases can understand various numeric and character types of data, additional functionality needs to be added for databases to process spatial data types. (Source: Wikipedia) Today I will be talking about the same subject at Microsoft TechEd India. If you want to learn about how to spatial aspect of data and how to integrate them with SQL Server this is the perfect session for you. Spatial is very special concept of SQL Server and I really like how it is implemented in SQL Server. In general Performance Tuning and Query Optimization is something I always have enjoyed in my professional life. Index are my best friends and many time, by implementing and many time by removing I have improved the performance of the system. In this session, I will be talking about Index along with Spatial Data. As Spatial Database is very interesting concept, I will cover super short but very interesting 10 quick slides about this subject. I will make sure in very first 20 mins, you will understand following topics Introduction to Spatial Database One line definition Understanding Spatial Indexing Index Internals Query/Performance Tuning Query Hinting/Cost Analysis Spatial Index Catalog Views Performance Troubleshooting Finding Optimal Index using Spatial Index SP Common Errors Index Maintenance This slides decks will be followed by around 30 mins demo which will have story of geometry, geography, index internals and performance tuning. If you are interested in learning how GIS works and how SQL Server out of the box supports this wonderful tools, you will really like how the story is told. I am sure all people who attend the event will know how the Bangalore is positioned on the map of India. I will take example of Bangalore and Hyderabad and demonstrate how index can improve the performance. Well there are lots of story to tell in the session, and I will be opening this session with the beautiful script of Botticelli’s Birth of Venus created by Michael J. Swart. I will also demonstrate few real life scenario where I will be talking about Spatial Database and its usage. Do not miss this session. At the end of session there will be book awarded to best participant. My session details: Session 3: Developing with SQL Server Spatial and Deep Dive into Spatial Indexing Date: April 14, 2010 Time: 5:00pm-6:00pm Microsoft SQL Server 2008 delivers new spatial data types that enable you to consume, use, and extend location-based data through spatial-enabled applications. Attend this session to learn how to use spatial functionality in next version of SQL Server to build and optimize spatial queries. This session outlines the new geography data type to store geodetic spatial data and perform operations on it, use the new geometry data type to store planar spatial data and perform operations on it, take advantage of new spatial indexes for high performance queries, use the new spatial results tab to quickly and easily view spatial query results directly from within Management Studio, extend spatial data capabilities by building or integrating location-enabled applications through support for spatial standards and specifications and much more. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, SQL, SQL Authority, SQL Index, SQL Optimization, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, SQLAuthority Author Visit, T SQL, Technology Tagged: Spatial Database

Read the article

Continuous Integration for SQL Server Part II – Integration Testing

- by Ben Rees

My previous post, on setting up Continuous Integration for SQL Server databases using GitHub, Bamboo and Red Gate’s tools, covered the first two parts of a simple Database Continuous Delivery process: Putting your database in to a source control system, and, Running a continuous integration process, each time changes are checked in. However there is, of course, a lot more to to Continuous Delivery than that. Specifically, in addition to the above: Putting some actual integration tests in to the CI process (otherwise, they don’t really do much, do they!?), Deploying the database changes with a managed, automated approach, Monitoring what you’ve just put live, to make sure you haven’t broken anything. This post will detail how to set up a very simple pipeline for implementing the first of these (continuous integration testing). NB: A lot of the setup in this post is built on top of the configuration from before, so it might be difficult to implement this post without running through part I first. There’ll then be a third post on automated database deployment followed by a final post dealing with the last item – monitoring changes on the live system. In the previous post, I used a mixture of Red Gate products and other 3rd party software – GitHub and Atlassian Bamboo specifically. This was partly because I believe most people work in an heterogeneous environment, using software from different vendors to suit their purposes and I wanted to show how this could work for this process. For example, you could easily substitute Atlassian’s BitBucket or Stash for GitHub, depending on your needs, or use an alternative CI server such as TeamCity, TFS or Jenkins. However, in this, post, I’ll be mostly using Red Gate products only (other than tSQLt). I would do this, firstly because I work for Red Gate. However, I also think that in the area of Database Delivery processes, nobody else has the offerings to implement this process fully – so I didn’t have any choice! Background on Continuous Delivery For me, a great source of information on what makes a proper Continuous Delivery process is the Jez Humble and David Farley classic: Continuous Delivery – Reliable Software Releases through Build, Test, and Deployment Automation This book is not of course, primarily about databases, and the process I outline here and in the previous article is a gross simplification of what Jez and David describe (not least because it’s that much harder for databases!). However, a lot of the principles that they describe can be equally applied to database development and, I would argue, should be. As I say however, what I describe here is a very simple version of what would be required for a full production process. A couple of useful resources on handling some of these complexities can be found in the following two references: Refactoring Databases – Evolutionary Database Design, by Scott J Ambler and Pramod J. Sadalage Versioning Databases – Branching and Merging, by Scott Allen In particular, I don’t deal at all with the issues of multiple branches and merging of those branches, an issue made particularly acute by the use of GitHub. The other point worth making is that, in the words of Martin Fowler: Continuous Delivery is about keeping your application in a state where it is always able to deploy into production. I.e. we are not talking about continuously delivery updates to the production database every time someone checks in an amendment to a stored procedure. That is possible (and what Martin calls Continuous Deployment). However, again, that’s more than I describe in this article. And I doubt I need to remind DBAs or Developers to Proceed with Caution! Integration Testing Back to something practical. The next stage, building on our set up from the previous article, is to add in some integration tests to the process. As I say, the CI process, though interesting, isn’t enormously useful without some sort of test process running. For this we’ll use the tSQLt framework, an open source framework designed specifically for running SQL Server tests. tSQLt is part of Red Gate’s SQL Test found on http://www.red-gate.com/products/sql-development/sql-test/ or can be downloaded separately from www.tsqlt.org - though I’ll provide a step-by-step guide below for setting this up. Getting tSQLt set up via SQL Test Click on the link http://www.red-gate.com/products/sql-development/sql-test/ and click on the blue Download button to download the Red Gate SQL Test product, if not already installed. Follow the install process for SQL Test to install the SQL Server Management Studio (SSMS) plugin on to your machine, if not already installed. Open SSMS. You should now see SQL Test under the Tools menu: Clicking this link will give you the basic SQL Test dialogue: As yet, though we’ve installed the SQL Test product we haven’t yet installed the tSQLt test framework on to any particular database. To do this, we need to add our RedGateApp database using this dialogue, by clicking on the + Add Database to SQL Test… link, selecting the RedGateApp database and clicking the Add Database link: In the next screen, SQL Test describes what will be installed on the database for the tSQLt framework. Also in this dialogue, uncheck the “Add SQL Cop tests” option (shown below). SQL Cop is a great set of pre-defined tests that work within the tSQLt framework to check the general health of your SQL Server database. However, we won’t be using them in this particular simple example: Once you’ve clicked on the OK button, the changes described in the dialogue will be made to your database. Some of these are shown in the left-hand-side below: We’ve now installed the framework. However, we haven’t actually created any tests, so this will be the next step. But, before we proceed, we’ve made an update to our database so should, again check this in to source control, adding comments as required: Also worth a quick check that your build still runs with the new additions!: (And a quick check of the RedGateAppCI database shows that the changes have been made). Creating and Testing a Unit Test There are, of course, a lot of very interesting unit tests that you could and should set up for a database. The great thing about the tSQLt framework is that you can write these in SQL. The example I’m going to use here is pretty Mickey Mouse – our database table is going to include some email addresses as reference data and I want to check whether these are all in a correct email format. Nothing clever but it illustrates the process and hopefully shows the method by which more interesting tests could be set up. Adding Reference Data to our Database To start, I want to add some reference data to my database, and have this source controlled (as well as the schema). First of all I need to add some data in to my solitary table – this can be done a number of ways, but I’ll do this in SSMS for simplicity: I then add some reference data to my table: Currently this reference data just exists in the database. For proper integration testing, this needs to form part of the source-controlled version of the database – and so needs to be added to the Git repository. This can be done via SQL Source Control, though first a Primary Key needs to be added to the table. Right click the table, select Design, then right-click on the first “id” row. Then click on “Set Primary Key”: NB: once this change is made, click Save to save the change to the table. Then, to source control this reference data, right click on the table (dbo.Email) and selecting the following option: In the next screen, link the data in the Email table, by selecting it from the list and clicking “save and close”: We should at this point re-commit the changes (both the addition of the Primary Key, and the data) to the Git repo. NB: From here on, I won’t show screenshots for the GitHub side of things – it’s the same each time: whenever a change is made in SQL Source Control and committed to your local folder, you then need to sync this in the GitHub Windows client (as this is where the build server, Bamboo is taking it from). An interesting point to note here, when these changes are committed in SQL Source Control (right-click database and select “Commit Changes to Source Control..”): The display gives a warning about possibly needing a migration script for the “Add Primary Key” step of the changes. This isn’t actually necessary in this case, but this mechanism would allow you to create override scripts to replace the default change scripts created by the SQL Compare engine (which runs underneath SQL Source Control). Ignoring this message (!), we add a comment and commit the changes to Git. I then sync these, run a build (or the build gets run automatically), and check that the data is being deployed over to the target RedGateAppCI database: Creating and Running the Test As I mention, the test I’m going to use here is a very simple one - are the email addresses in my reference table valid? This isn’t of course, a full test of email validation (I expect the email addresses I’ve chosen here aren’t really the those of the Fab Four) – but just a very basic check of format used. I’ve taken the relevant SQL from this Stack Overflow article. In SSMS select “SQL Test” from the Tools menu, then click on + New Test: In the next screen, give your new test a name, and also enter a name in the Test Class box (test classes are schemas that help you keep things organised). Also check that the database in which the test is going to be created is correct – RedGateApp in this example: Click “Create Test”. After closing a couple of subsequent dialogues, you’ll see a dummy script for the test, that needs filling in: We now need to define the SQL for our test. As mentioned before, tSQLt allows you to write your unit tests in T-SQL, and the code I’m going to use here is as below. This needs to be copied and pasted in to the query window, to replace the default given by tSQLt: – Basic email check test ALTER PROCEDURE [MyChecks].[test Check Email Addresses] AS BEGIN SET NOCOUNT ON Declare @Output VarChar(max) Set @Output = ” SELECT @Output = @Output + Email +Char(13) + Char(10) FROM dbo.Email WHERE email NOT LIKE ‘%_@__%.__%’ If @Output > ” Begin Set @Output = Char(13) + Char(10) + @Output EXEC tSQLt.Fail@Output End END; Once this script is entered, hit execute to add the Stored Procedure to the database. Before committing the test to source control, it’s worth just checking that it works! For a positive test, click on “SQL Test” from the Tools menu, then click Run Tests. You should see output like the following: - a green tick to indicate success! But of course, what we also need to do is test that this is actually doing something by showing a failed test. Edit one of the email addresses in your table to an incorrect format: Now, re-run the same SQL Test as before and you’ll see the following: Great – we now know that our test is really doing something! You’ll also see a useful error message at the bottom of SSMS: (leave the email address as invalid for now, for the next steps). The next stage is to check this new test in to source control again, by right-clicking on the database and checking in the changes with a commit message (and not forgetting to sync in the GitHub client): Checking that the Tests are Running as Integration Tests After the changes above are made, and after a build has run on Bamboo (manual or automatic), looking at the Stored Procedures for the RedGateAppCI, the SPROC for the new test has been moved over to the database. However this is not exactly what we were after. We didn’t want to just copy objects from one database to another, but actually run the tests as part of the build/integration test process. I.e. we’re continuously checking any changes we make (in this case, to the reference data emails), to ensure we’re not breaking a test that we’ve set up. The behaviour we want to see is that, if we check in static data that is incorrect (as we did in step 9 above) and we have the tSQLt test set up, then our build in Bamboo should fail. However, re-running the build shows the following: - sadly, a successful build! To make sure the tSQLt tests are run as part of the integration test, we need to amend a switch in the Red Gate CI config file. First, navigate to file sqlCI.targets in your working folder: Edit this document, make the following change, save the document, then commit and sync this change in the GitHub client:    <enableTsqlt>true</enableTsqlt> Now, if we re-run the build in Bamboo (NB: I’ve moved to a new server here, hence different address and build number): - superb, a broken build!! The error message isn’t great here, so to get more detailed info, click on the full build log link on this page (below the fold). The interesting part of the log shown is towards the bottom. Pulling out this part: 21-Jun-2013 11:35:19 Build FAILED. 21-Jun-2013 11:35:19 21-Jun-2013 11:35:19 "C:\Users\Administrator\bamboo-home\xml-data\build-dir\RGA-RGP-JOB1\sqlCI.proj" (default target) (1) -> 21-Jun-2013 11:35:19 (sqlCI target) -> 21-Jun-2013 11:35:19 EXEC : sqlCI error occurred: RedGate.Deploy.SqlServerDbPackage.Shared.Exceptions.InvalidSqlException: Test Case Summary: 1 test case(s) executed, 0 succeeded, 1 failed, 0 errored. [C:\Users\Administrator\bamboo-home\xml-data\build-dir\RGA-RGP-JOB1\sqlCI.proj] 21-Jun-2013 11:35:19 EXEC : sqlCI error occurred: [MyChecks].[test Check Email Addresses] failed: [C:\Users\Administrator\bamboo-home\xml-data\build-dir\RGA-RGP-JOB1\sqlCI.proj] 21-Jun-2013 11:35:19 EXEC : sqlCI error occurred: ringo.starr@beatles [C:\Users\Administrator\bamboo-home\xml-data\build-dir\RGA-RGP-JOB1\sqlCI.proj] 21-Jun-2013 11:35:19 EXEC : sqlCI error occurred: [C:\Users\Administrator\bamboo-home\xml-data\build-dir\RGA-RGP-JOB1\sqlCI.proj] 21-Jun-2013 11:35:19 EXEC : sqlCI error occurred: +----------------------+ [C:\Users\Administrator\bamboo-home\xml-data\build-dir\RGA-RGP-JOB1\sqlCI.proj] 21-Jun-2013 11:35:19 EXEC : sqlCI error occurred: |Test Execution Summary| [C:\Users\Administrator\bamboo-home\xml-data\build-dir\RGA-RGP-JOB1\sqlCI.proj] As a final check, we should make sure that, if we now fix this error, the build succeeds. So in SSMS, I’m going to correct the invalid email address, then check this change in to SQL Source Control (with a comment), commit to GitHub, and re-run the build: This should have fixed the build: It worked! Summary This has been a very quick run through the implementation of CI for databases, including tSQLt tests to test whether your database updates are working. The next post in this series will focus on automated deployment – we’ve tested our database changes, how can we now deploy these to target sites?

Read the article

Flow-Design Cheat Sheet – Part I, Notation

- by Ralf Westphal

You want to avoid the pitfalls of object oriented design? Then this is the right place to start. Use Flow-Oriented Analysis (FOA) and –Design (FOD or just FD for Flow-Design) to understand a problem domain and design a software solution. Flow-Orientation as described here is related to Flow-Based Programming, Event-Based Programming, Business Process Modelling, and even Event-Driven Architectures. But even though “thinking in flows” is not new, I found it helpful to deviate from those precursors for several reasons. Some aim at too big systems for the average programmer, some are concerned with only asynchronous processing, some are even not very much concerned with programming at all. What I was looking for was a design method to help in software projects of any size, be they large or tiny, involing synchronous or asynchronous processing, being local or distributed, running on the web or on the desktop or on a smartphone. That´s why I took ideas from all of the above sources and some additional and came up with Event-Based Components which later got repositioned and renamed to Flow-Design. In the meantime this has generated some discussion (in the German developer community) and several teams have started to work with Flow-Design. Also I´ve conducted quite some trainings using Flow-Orientation for design. The results are very promising. Developers find it much easier to design software using Flow-Orientation than OOAD-based object orientation. Since Flow-Orientation is moving fast and is not covered completely by a single source like a book, demand has increased for at least an overview of the current state of its notation. This page is trying to answer this demand by briefly introducing/describing every notational element as well as their translation into C# source code. Take this as a cheat sheet to put next to your whiteboard when designing software. However, please do not expect any explanation as to the reasons behind Flow-Design elements. Details on why Flow-Design at all and why in this specific way you´ll find in the literature covering the topic. Here´s a resource page on Flow-Design/Event-Based Components, if you´re able to read German. Notation Connected Functional Units The basic element of any FOD are functional units (FU): Think of FUs as some kind of software code block processing data. For the moment forget about classes, methods, “components”, assemblies or whatever. See a FU as an abstract piece of code. Software then consists of just collaborating FUs. I´m using circles/ellipses to draw FUs. But if you like, use rectangles. Whatever suites your whiteboard needs best. The purpose of FUs is to process input and produce output. FUs are transformational. However, FUs are not called and do not call other FUs. There is no dependency between FUs. Data just flows into a FU (input) and out of it (output). From where and where to is of no concern to a FU. This way FUs can be concatenated in arbitrary ways: Each FU can accept input from many sources and produce output for many sinks: Flows Connected FUs form a flow with a start and an end. Data is entering a flow at a source, and it´s leaving it through a sink. Think of sources and sinks as special FUs which conntect wires to the environment of a network of FUs. Wiring Details Data is flowing into/out of FUs through wires. This is to allude to electrical engineering which since long has been working with composable parts. Wires are attached to FUs usings pins. They are the entry/exit points for the data flowing along the wires. Input-/output pins currently need not be drawn explicitly. This is to keep designing on a whiteboard simple and quick. Data flowing is of some type, so wires have a type attached to them. And pins have names. If there is only one input pin and output pin on a FU, though, you don´t need to mention them. The default is Process for a single input pin, and Result for a single output pin. But you´re free to give even single pins different names. There is a shortcut in use to address a certain pin on a destination FU: The type of the wire is put in parantheses for two reasons. 1. This way a “no-type” wire can be easily denoted, 2. this is a natural way to describe tuples of data. To describe how much data is flowing, a star can be put next to the wire type: Nesting – Boards and Parts If more than 5 to 10 FUs need to be put in a flow a FD starts to become hard to understand. To keep diagrams clutter free they can be nested. You can turn any FU into a flow: This leads to Flow-Designs with different levels of abstraction. A in the above illustration is a high level functional unit, A.1 and A.2 are lower level functional units. One of the purposes of Flow-Design is to be able to describe systems on different levels of abstraction and thus make it easier to understand them. Humans use abstraction/decomposition to get a grip on complexity. Flow-Design strives to support this and make levels of abstraction first class citizens for programming. You can read the above illustration like this: Functional units A.1 and A.2 detail what A is supposed to do. The whole of A´s responsibility is decomposed into smaller responsibilities A.1 and A.2. FU A thus does not do anything itself anymore! All A is responsible for is actually accomplished by the collaboration between A.1 and A.2. Since A now is not doing anything anymore except containing A.1 and A.2 functional units are devided into two categories: boards and parts. Boards are just containing other functional units; their sole responsibility is to wire them up. A is a board. Boards thus depend on the functional units nested within them. This dependency is not of a functional nature, though. Boards are not dependent on services provided by nested functional units. They are just concerned with their interface to be able to plug them together. Parts are the workhorses of flows. They contain the real domain logic. They actually transform input into output. However, they do not depend on other functional units. Please note the usage of source and sink in boards. They correspond to input-pins and output-pins of the board. Implicit Dependencies Nesting functional units leads to a dependency tree. Boards depend on nested functional units, they are the inner nodes of the tree. Parts are independent, they are the leafs: Even though dependencies are the bane of software development, Flow-Design does not usually draw these dependencies. They are implicitly created by visually nesting functional units. And they are harmless. Boards are so simple in their functionality, they are little affected by changes in functional units they are depending on. But functional units are implicitly dependent on more than nested functional units. They are also dependent on the data types of the wires attached to them: This is also natural and thus does not need to be made explicit. And it pertains mainly to parts being dependent. Since boards don´t do anything with regard to a problem domain, they don´t care much about data types. Their infrastructural purpose just needs types of input/output-pins to match. Explicit Dependencies You could say, Flow-Orientation is about tackling complexity at its root cause: that´s dependencies. “Natural” dependencies are depicted naturally, i.e. implicitly. And whereever possible dependencies are not even created. Functional units don´t know their collaborators within a flow. This is core to Flow-Orientation. That makes for high composability of functional units. A part is as independent of other functional units as a motor is from the rest of the car. And a board is as dependend on nested functional units as a motor is on a spark plug or a crank shaft. With Flow-Design software development moves closer to how hardware is constructed. Implicit dependencies are not enough, though. Sometimes explicit dependencies make designs easier – as counterintuitive this might sound. So FD notation needs a ways to denote explicit dependencies: Data flows along wires. But data does not flow along dependency relations. Instead dependency relations represent service calls. Functional unit C is depending on/calling services on functional unit S. If you want to be more specific, name the services next to the dependency relation: Although you should try to stay clear of explicit dependencies, they are fundamentally ok. See them as a way to add another dimension to a flow. Usually the functionality of the independent FU (“Customer repository” above) is orthogonal to the domain of the flow it is referenced by. If you like emphasize this by using different shapes for dependent and independent FUs like above. Such dependencies can be used to link in resources like databases or shared in-memory state. FUs can not only produce output but also can have side effects. A common pattern for using such explizit dependencies is to hook a GUI into a flow as the source and/or the sink of data: Which can be shortened to: Treat FUs others depend on as boards (with a special non-FD API the dependent part is connected to), but do not embed them in a flow in the diagram they are depended upon. Attributes of Functional Units Creation and usage of functional units can be modified with attributes. So far the following have shown to be helpful: Singleton: FUs are by default multitons. FUs in the same of different flows with the same name refer to the same functionality, but to different instances. Think of functional units as objects that get instanciated anew whereever they appear in a design. Sometimes though it´s helpful to reuse the same instance of a functional unit; this is always due to valuable state it holds. Signify this by annotating the FU with a “(S)”. Multiton: FUs on which others depend are singletons by default. This is, because they usually are introduced where shared state comes into play. If you want to change them to be a singletons mark them with a “(M)”. Configurable: Some parts need to be configured before the can do they work in a flow. Annotate them with a “(C)” to have them initialized before any data items to be processed by them arrive. Do not assume any order in which FUs are configured. How such configuration is happening is an implementation detail. Entry point: In each design there needs to be a single part where “it all starts”. That´s the entry point for all processing. It´s like Program.Main() in C# programs. Mark the entry point part with an “(E)”. Quite often this will be the GUI part. How the entry point is started is an implementation detail. Just consider it the first FU to start do its job. Patterns / Standard Parts If more than a single wire is attached to an output-pin that´s called a split (or fork). The same data is flowing on all of the wires. Remember: Flow-Designs are synchronous by default. So a split does not mean data is processed in parallel afterwards. Processing still happens synchronously and thus one branch after another. Do not assume any specific order of the processing on the different branches after the split. It is common to do a split and let only parts of the original data flow on through the branches. This effectively means a map is needed after a split. This map can be implicit or explicit. Although FUs can have multiple input-pins it is preferrable in most cases to combine input data from different branches using an explicit join: The default output of a join is a tuple of its input values. The default behavior of a join is to output a value whenever a new input is received. However, to produce its first output a join needs an input for all its input-pins. Other join behaviors can be: reset all inputs after an output only produce output if data arrives on certain input-pins

Read the article

Database Mirroring on SQL Server Express Edition

- by Most Valuable Yak (Rob Volk)

Like most SQL Server users I'm rather frustrated by Microsoft's insistence on making the really cool features only available in Enterprise Edition. And it really doesn't help that they changed the licensing for SQL 2012 to be core-based, so now it's like 4 times as expensive! It almost makes you want to go with Oracle. That, and a desire to have Larry Ellison do things to your orifices. And since they've introduced Availability Groups, and marked database mirroring as deprecated, you'd think they'd make make mirroring available in all editions. Alas…they don't…officially anyway. Thanks to my constant poking around in places I'm not "supposed" to, I've discovered the low-level code that implements database mirroring, and found that it's available in all editions! It turns out that the query processor in all SQL Server editions prepends a simple check before every edition-specific DDL statement: IF CAST(SERVERPROPERTY('Edition') as nvarchar(max)) NOT LIKE '%e%e%e% Edition%' print 'Lame' else print 'Cool' If that statement returns true, it fails. (the print statements are just placeholders) Go ahead and test it on Standard, Workgroup, and Express editions compared to an Enterprise or Developer edition instance (which support everything). Once again thanks to Argenis Fernandez (b | t) and his awesome sessions on using Sysinternals, I was able to watch the exact process SQL Server performs when setting up a mirror. Surprisingly, it's not actually implemented in SQL Server! Some of it is, but that's something of a smokescreen, the real meat of it is simple filesystem primitives. The NTFS filesystem supports links, both hard links and symbolic, so that you can create two entries for the same file in different directories and/or different names. You can create them using the MKLINK command in a command prompt: mklink /D D:\SkyDrive\Data D:\Data mklink /D D:\SkyDrive\Log D:\Log This creates a symbolic link from my data and log folders to my Skydrive folder. Any file saved in either location will instantly appear in the other. And since my Skydrive will be automatically synchronized with the cloud, any changes I make will be copied instantly (depending on my internet bandwidth of course). So what does this have to do with database mirroring? Well, it seems that the mirroring endpoint that you have to create between mirror and principal servers is really nothing more than a Skydrive link. Although it doesn't actually use Skydrive, it performs the same function. So in effect, the following statement: ALTER DATABASE Mir SET PARTNER='TCP://MyOtherServer.domain.com:5022' Is turned into: mklink /D "D:\Data" "\\MyOtherServer.domain.com\5022$" The 5022$ "port" is actually a hidden system directory on the principal and mirror servers. I haven't quite figured out how the log files are included in this, or why you have to SET PARTNER on both principal and mirror servers, except maybe that mklink has to do something special when linking across servers. I couldn't get the above statement to work correctly, but found that doing mklink to a local Skydrive folder gave me similar functionality. To wrap this up, all you have to do is the following: Install Skydrive on both SQL Servers (principal and mirror) and set the local Skydrive folder (D:\SkyDrive in these examples) On the principal server, run mklink /D on the data and log folders to point to SkyDrive: mklink /D D:\SkyDrive\Data D:\Data On the mirror server, run the complementary linking: mklink /D D:\Data D:\SkyDrive\Data Create your database and make sure the files map to the principal data and log folders (D:\Data and D:\Log) Viola! Your databases are kept in sync on multiple servers! One wrinkle you will encounter is that the mirror server will show the data and log files, but you won't be able to attach them to the mirror SQL instance while they are attached to the principal. I think this is a bug in the Skydrive, but as it turns out that's fine: you can't access a mirror while it's hosted on the principal either. So you don't quite get automatic failover, but you can attach the files to the mirror if the principal goes offline. It's also not exactly synchronous, but it's better than nothing, and easier than either replication or log shipping with a lot less latency. I will end this with the obvious "not supported by Microsoft" and "Don't do this in production without an updated resume" spiel that you should by now assume with every one of my blog posts, especially considering the date.

Read the article

Problem with AssetManager while loading a Model type

- by user1204548

Today I've tried the AssetManager for the first time with .g3db files and I'm having some problems. Exception in thread "LWJGL Application" com.badlogic.gdx.utils.GdxRuntimeException: com.badlogic.gdx.utils.GdxRuntimeException: Couldn't load dependencies of asset: data/data at com.badlogic.gdx.assets.AssetManager.handleTaskError(AssetManager.java:508) at com.badlogic.gdx.assets.AssetManager.update(AssetManager.java:342) at com.lostchg.martagdx3d.MartaGame.render(MartaGame.java:78) at com.badlogic.gdx.Game.render(Game.java:46) at com.badlogic.gdx.backends.lwjgl.LwjglApplication.mainLoop(LwjglApplication.java:207) at com.badlogic.gdx.backends.lwjgl.LwjglApplication$1.run(LwjglApplication.java:114) Caused by: com.badlogic.gdx.utils.GdxRuntimeException: Couldn't load dependencies of asset: data/data at com.badlogic.gdx.assets.AssetLoadingTask.handleAsyncLoader(AssetLoadingTask.java:119) at com.badlogic.gdx.assets.AssetLoadingTask.update(AssetLoadingTask.java:89) at com.badlogic.gdx.assets.AssetManager.updateTask(AssetManager.java:445) at com.badlogic.gdx.assets.AssetManager.update(AssetManager.java:340) ... 4 more Caused by: com.badlogic.gdx.utils.GdxRuntimeException: com.badlogic.gdx.utils.GdxRuntimeException: Couldn't load file: data/data at com.badlogic.gdx.utils.async.AsyncResult.get(AsyncResult.java:31) at com.badlogic.gdx.assets.AssetLoadingTask.handleAsyncLoader(AssetLoadingTask.java:117) ... 7 more Caused by: com.badlogic.gdx.utils.GdxRuntimeException: Couldn't load file: data/data at com.badlogic.gdx.graphics.Pixmap.<init>(Pixmap.java:140) at com.badlogic.gdx.assets.loaders.TextureLoader.loadAsync(TextureLoader.java:72) at com.badlogic.gdx.assets.loaders.TextureLoader.loadAsync(TextureLoader.java:41) at com.badlogic.gdx.assets.AssetLoadingTask.call(AssetLoadingTask.java:69) at com.badlogic.gdx.assets.AssetLoadingTask.call(AssetLoadingTask.java:34) at com.badlogic.gdx.utils.async.AsyncExecutor$2.call(AsyncExecutor.java:49) at java.util.concurrent.FutureTask.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Caused by: com.badlogic.gdx.utils.GdxRuntimeException: File not found: data\data (Internal) at com.badlogic.gdx.files.FileHandle.read(FileHandle.java:132) at com.badlogic.gdx.files.FileHandle.length(FileHandle.java:586) at com.badlogic.gdx.files.FileHandle.readBytes(FileHandle.java:220) at com.badlogic.gdx.graphics.Pixmap.<init>(Pixmap.java:137) ... 9 more Why it tries to load that unexisting file? It seems that the AssetManager manages to load my .g3db file at first, because earlier the java console threw some errors related to the textures associated to the 3D scene having to be a power of 2. Relevant code: public void show() { ... assets = new AssetManager(); assets.load("data/levelprueba2.g3db", Model.class); loading = true; ... } private void doneLoading() { Model model = assets.get("data/levelprueba2.g3db", Model.class); for (int i = 0; i < model.nodes.size; i++) { String id = model.nodes.get(i).id; ModelInstance instance = new ModelInstance(model, id); Node node = instance.getNode(id); instance.transform.set(node.globalTransform); node.translation.set(0,0,0); node.scale.set(1,1,1); node.rotation.idt(); instance.calculateTransforms(); instances.add(instance); } loading = false; } public void render(float delta) { super.render(delta); if (loading && assets.update()) doneLoading(); ... } The error points to the line with the assets.update() method. Please, help! Sorry for my bad English and my amateurish doubts.

Read the article

Introducing MySQL for Excel

- by Javier Treviño

As part of the new product initiatives of the MySQL on Windows group we released a tool that makes the task of getting data in and out of a MySQL Database very friendly and intuitive, and we paired it with one of the preferred applications for data analysis and manipulation in Windows platforms, MS Excel. Welcome to MySQL for Excel, an add-in that is installed and accessed from within the MS Excel’s Data tab offering a wizard-like interface arranged in an elegant yet simple way to help users browse MySQL Schemas, Tables, Views and Procedures and perform data operations against them using MS Excel as the vehicle to drive the data in and out MySQL Databases. One of the coolest features we had in mind designing MySQL for Excel is simplicity. MS Excel is simple and easy to work with, thus liked by many Windows users because they don’t have to be software gurus to use it. We applied the same principle by targeting MySQL for Excel to any kind of user, so if you are already familiarized with Excel’s interface you will find yourself working with MySQL data in no time. MySQL for Excel is shipped within the MySQL Installer as one of the tools in the suite; if prerequisites are already installed (.NET Framework 4.0, Visual Studio Tools for Office 4.0 and of course MS Office), installing the add-in involves a very few clicks and no further setup to use it. Being an Excel Add-In there is no executable file involved after the installation, running MS Excel and opening the add-in from its Data tab is all that is required. MySQL for Excel automatically integrates with MySQL Workbench (if installed) to share the same connections to MySQL Server installations, that way connections are defined just once in either product saving time. Opening the Add-In brings the Welcome Panel at the right side of the Excel main window from which connections to MySQL Servers are shown grouped by Local VS Remote connections; then users can open any of those connections by double-clicking it and entering the password of the used account. Additionally a user can create a connection by clicking on the New Connection action label or edit connections through MySQL Workbench (if installed) by clicking on the Manage Connections action label. Once a connection is opened, the Schema Selection panel is shown, at the top of it the selected connection (connection name, hostname/IP and username). Just below, a list of schemas is displayed where User Schemas are grouped first followed by System Schemas; users can double-click any selected schema to go to the next panel or select a schema and clicking the Next > button. Users can alternatively click on the < Back button to go back to the Welcome Panel to close the current connection and open a new one; also by clicking the Create New Schema action label they can create an empty new schema. Once a schema is opened the DB Object Selection panel is shown, this is actually the place where the fun stuff happens; from here users are able to perform actions against MySQL Tables, Views and Procedures. ">The actions available here are about importing data from a MySQL Table, View or Procedure to Excel, exporting Excel data to a new MySQL Table, appending Excel data to an existing MySQL Table or editing a MySQL Table’s data by using an Excel Worksheet as a user interface to update data in any row/column, insert new rows or delete existing rows in a very easy and friendly way. More blog posts will follow describing all of these actions, so stay tuned! Remember that your feedback is very important for us, so drop us a message: · MySQL on Windows (this) Blog - https://blogs.oracle.com/MySqlOnWindows/ · Forum - http://forums.mysql.com/list.php?172 · Facebook - http://www.facebook.com/mysql Cheers!

Read the article

Windows Azure Recipe: Consumer Portal

- by Clint Edmonson

Nearly every company on the internet has a web presence. Many are merely using theirs for informational purposes. More sophisticated portals allow customers to register their contact information and provide some level of interaction or customer support. But as our understanding of how consumers use the web increases, the more progressive companies are taking advantage of social web and rich media delivery to connect at a deeper level with the consumers of their goods and services. Drivers Cost reduction Scalability Global distribution Time to market Solution Here’s a sketch of how a Windows Azure Consumer Portal might be built out: Ingredients Web Role – this will host the core of the solution. Each web role is a virtual machine hosting an application written in ASP.NET (or optionally php, or node.js). The number of web roles can be scaled up or down as needed to handle peak and non-peak traffic loads. Database – every modern web application needs to store data. SQL Azure databases look and act exactly like their on-premise siblings but are fault tolerant and have data redundancy built in. Access Control (optional) – if identity needs to be tracked within the solution, the access control service combined with the Windows Identity Foundation framework provides out-of-the-box support for several social media platforms including Windows LiveID, Google, Yahoo!, Facebook. It also has a provider model to allow integration with other platforms as well. Caching (optional) – for sites with high traffic with lots of read-only data and lists, the distributed in-memory caching service can be used to cache and serve up static data at higher scale and speed than direct database requests. It can also be used to manage user session state. Blob Storage (optional) – for sites that serve up unstructured data such as documents, video, audio, device drivers, and more. The data is highly available and stored redundantly across data centers. Each entry in blob storage is provided with it’s own unique URL for direct access by the browser. Content Delivery Network (CDN) (optional) – for sites that service users around the globe, the CDN is an extension to blob storage that, when enabled, will automatically cache frequently accessed blobs and static site content at edge data centers around the world. The data can be delivered statically or streamed in the case of rich media content. Training Labs These links point to online Windows Azure training labs where you can learn more about the individual ingredients described above. (Note: The entire Windows Azure Training Kit can also be downloaded for offline use.) Windows Azure (16 labs) Windows Azure is an internet-scale cloud computing and services platform hosted in Microsoft data centers, which provides an operating system and a set of developer services which can be used individually or together. It gives developers the choice to build web applications; applications running on connected devices, PCs, or servers; or hybrid solutions offering the best of both worlds. New or enhanced applications can be built using existing skills with the Visual Studio development environment and the .NET Framework. With its standards-based and interoperable approach, the services platform supports multiple internet protocols, including HTTP, REST, SOAP, and plain XML SQL Azure (7 labs) Microsoft SQL Azure delivers on the Microsoft Data Platform vision of extending the SQL Server capabilities to the cloud as web-based services, enabling you to store structured, semi-structured, and unstructured data. Windows Azure Services (9 labs) As applications collaborate across organizational boundaries, ensuring secure transactions across disparate security domains is crucial but difficult to implement. Windows Azure Services provides hosted authentication and access control using powerful, secure, standards-based infrastructure. See my Windows Azure Resource Guide for more guidance on how to get started, including links web portals, training kits, samples, and blogs related to Windows Azure.

Read the article

Analytics in an Omni-Channel World

- by David Dorf

Retail has been around ever since mankind started bartering. The earliest transactions were very specific to the individuals buying and selling, then someone had the bright idea to open a store. Those transactions were a little more generic, but the store owner still knew his customers and what they wanted. As the chains rolled out, customer intimacy was sacrificed for scale, and retailers began to rely on segments and clusters. But thanks to the widespread availability of data and the technology to convert said data into information, retailers are getting back to details. The retail industry is following a maturity model for analytics that is has progressed through five stages, each delivering more value than the previous. Store Analytics Brick-and-mortar retailers (and pure-play catalogers as well) that collect anonymous basket-level data are able to get some sense of demand to help with allocation decisions. Promotions and foot-traffic can be measured to understand marketing effectiveness and perhaps focus groups can help test ideas. But decisions are influenced by the majority, using faceless customer segments and aggregated industry data points. Loyalty programs help a little, but in many cases the cost outweighs the benefits. Web Analytics The Web made it much easier to collect data on specific, yet still anonymous consumers using cookies to track visits. Clickstreams and product searches are analyzed to understand the purchase journey, gauge demand, and better understand up-selling opportunities. Personalization begins to allow retailers target market consumers with recommendations. Cross-Channel Analytics This phase is a minor one, but where most retailers probably sit today. They are able to use information from one channel to bolster activities in another. However, there are technical challenges combining data silos so its not an easy task. But for those retailers that are able to perform analytics on both sources of data, the pay-off is pretty nice. Revenue per customer begins to go up as customers have a better brand experience. Mobile & Social Analytics Big data technologies are enabling a 360-degree view of the customer by incorporating psychographic data from social sites alongside traditional demographic data. Retailers can track individual preferences, opinions, hobbies, etc. in order to understand a consumer's motivations. Using mobile devices, consumers can interact with brands anywhere, anytime, accessing deep product information and reviews. Mobile, combined with a loyalty program, presents an opportunity to put shopping into geographic context, understanding paths to the store, patterns within the store, and be an always-on advertising conduit. Omni-Channel Analytics All this data along with the proper technology represents a new paradigm in which the clock is turned back and retail becomes very personal once again. Rich, individualized data better illuminates demand, allows for highly localized assortments, and helps tailor up-selling. Interactions with all channels help build an accurate profile of each consumer, and allows retailers to tailor the retail experience to meet the heightened expectations of today's sophisticated shopper. And of course this culminates in greater customer satisfaction and business profitability.

Read the article

How to remove synaptic without installing all the unwanted packages?

- by Jay

I am trying to uninstall synaptic. I prefer using apt-get and other command line tools to manage my packages. So I do not need synaptic and the software manager. I'm trying to remove both of them using apt-get. Its a new box. Recently installed Linux Mint mate 15. After installation, the only thing I did was, sudo apt-get update and sudo apt-get dist-upgrade After that, I did this command for removing synaptic, sudo apt-get remove --purge synaptic But this gives me a very weird output, Reading package lists... Done Building dependency tree Reading state information... Done The following packages were automatically installed and are no longer required: apturl-kde icoutils kate-data katepart kde-runtime kde-runtime-data kdelibs-bin kdelibs5-data kdelibs5-plugins kdesudo kdoctools kubuntu-debug-installer libattica0.4 libdlrestrictions1 libkactivities-bin libkactivities-models1 libkactivities6 libkatepartinterfaces4 libkcmutils4 libkde3support4 libkdeclarative5 libkdecore5 libkdesu5 libkdeui5 libkdewebkit5 libkdnssd4 libkemoticons4 libkfile4 libkhtml5 libkidletime4 libkio5 libkjsapi4 libkjsembed4 libkmediaplayer4 libknewstuff3-4 libknotifyconfig4 libkntlm4 libkparts4 libkpty4 libkrosscore4 libktexteditor4 libkxmlrpcclient4 libnepomuk4 libnepomukcore4abi1 libnepomukquery4a libnepomukutils4 libntrack-qt4-1 libntrack0 libphonon4 libplasma3 libpolkit-qt-1-1 libpoppler-qt4-4 libqapt2 libqapt2-runtime libqca2 libqt4-qt3support libsolid4 libsoprano4 libstreamanalyzer0 libstreams0 libthreadweaver4 libvirtodbc0 nepomuk-core nepomuk-core-data ntrack-module-libnl-0 odbcinst odbcinst1debian2 oxygen-icon-theme phonon phonon-backend-gstreamer plasma-scriptengine-javascript qapt-batch shared-desktop-ontologies soprano-daemon virtuoso-minimal virtuoso-opensource-6.1-bin virtuoso-opensource-6.1-common Use 'apt-get autoremove' to remove them. The following extra packages will be installed: apturl-kde icoutils kate-data katepart kde-runtime kde-runtime-data kdelibs-bin kdelibs5-data kdelibs5-plugins kdesudo kdoctools kubuntu-debug-installer libattica0.4 libdlrestrictions1 libkactivities-bin libkactivities-models1 libkactivities6 libkatepartinterfaces4 libkcmutils4 libkde3support4 libkdeclarative5 libkdecore5 libkdesu5 libkdeui5 libkdewebkit5 libkdnssd4 libkemoticons4 libkfile4 libkhtml5 libkidletime4 libkio5 libkjsapi4 libkjsembed4 libkmediaplayer4 libknewstuff3-4 libknotifyconfig4 libkntlm4 libkparts4 libkpty4 libkrosscore4 libktexteditor4 libkxmlrpcclient4 libnepomuk4 libnepomukcore4abi1 libnepomukquery4a libnepomukutils4 libntrack-qt4-1 libntrack0 libphonon4 libplasma3 libpolkit-qt-1-1 libpoppler-qt4-4 libqapt2 libqapt2-runtime libqca2 libqt4-qt3support libsolid4 libsoprano4 libstreamanalyzer0 libstreams0 libthreadweaver4 libvirtodbc0 libxml2-utils nepomuk-core nepomuk-core-data ntrack-module-libnl-0 odbcinst odbcinst1debian2 oxygen-icon-theme phonon phonon-backend-gstreamer plasma-scriptengine-javascript qapt-batch shared-desktop-ontologies soprano-daemon virtuoso-minimal virtuoso-opensource-6.1-bin virtuoso-opensource-6.1-common Suggested packages: libterm-readline-gnu-perl libterm-readline-perl-perl djvulibre-bin finger hspell libqca2-plugin-cyrus-sasl libqca2-plugin-gnupg libqca2-plugin-ossl phonon-backend-vlc phonon-backend-xine phonon-backend-mplayer The following packages will be REMOVED: aptoncd* apturl* mintupdate* mintwelcome* synaptic* The following NEW packages will be installed: apturl-kde icoutils kate-data katepart kde-runtime kde-runtime-data kdelibs-bin kdelibs5-data kdelibs5-plugins kdesudo kdoctools kubuntu-debug-installer libattica0.4 libdlrestrictions1 libkactivities-bin libkactivities-models1 libkactivities6 libkatepartinterfaces4 libkcmutils4 libkde3support4 libkdeclarative5 libkdecore5 libkdesu5 libkdeui5 libkdewebkit5 libkdnssd4 libkemoticons4 libkfile4 libkhtml5 libkidletime4 libkio5 libkjsapi4 libkjsembed4 libkmediaplayer4 libknewstuff3-4 libknotifyconfig4 libkntlm4 libkparts4 libkpty4 libkrosscore4 libktexteditor4 libkxmlrpcclient4 libnepomuk4 libnepomukcore4abi1 libnepomukquery4a libnepomukutils4 libntrack-qt4-1 libntrack0 libphonon4 libplasma3 libpolkit-qt-1-1 libpoppler-qt4-4 libqapt2 libqapt2-runtime libqca2 libqt4-qt3support libsolid4 libsoprano4 libstreamanalyzer0 libstreams0 libthreadweaver4 libvirtodbc0 libxml2-utils nepomuk-core nepomuk-core-data ntrack-module-libnl-0 odbcinst odbcinst1debian2 oxygen-icon-theme phonon phonon-backend-gstreamer plasma-scriptengine-javascript qapt-batch shared-desktop-ontologies soprano-daemon virtuoso-minimal virtuoso-opensource-6.1-bin virtuoso-opensource-6.1-common 0 upgraded, 78 newly installed, 5 to remove and 0 not upgraded. Need to get 60.9 MB of archives. After this operation, 146 MB of additional disk space will be used. Do you want to continue [Y/n]? n Abort. As you can see, apt-get is trying to install the same packages that it is asking me to autoremove. Could someone please tell me, how to uninstall synaptic properly? Or am I missing something? Just for the record, I also did, sudo apt-get autoremove --purge like it asked me to ... and this is what I got, Reading package lists... Done Building dependency tree Reading state information... Done 0 upgraded, 0 newly installed, 0 to remove and 6 not upgraded.

Read the article

Organization & Architecture UNISA Studies – Chap 6

- by MarkPearl

Learning Outcomes Discuss the physical characteristics of magnetic disks Describe how data is organized and accessed on a magnetic disk Discuss the parameters that play a role in the performance of magnetic disks Describe different optical memory devices Magnetic Disk The way data is stored on and retried from magnetic disks Data is recorded on and later retrieved form the disk via a conducting coil named the head (in many systems there are two heads) The writ mechanism exploits the fact that electricity flowing through a coil produces a magnetic field. Electric pulses are sent to the write head, and the resulting magnetic patterns are recorded on the surface below with different patterns for positive and negative currents The physical characteristics of a magnetic disk Summarize from book The factors that play a role in the performance of a disk Seek time – the time it takes to position the head at the track Rotational delay / latency – the time it takes for the beginning of the sector to reach the head Access time – the sum of the seek time and rotational delay Transfer time – the time it takes to transfer data RAID The rate of improvement in secondary storage performance has been considerably less than the rate for processors and main memory. Thus secondary storage has become a bit of a bottleneck. RAID works on the concept that if one disk can be pushed so far, additional gains in performance are to be had by using multiple parallel components. Points to note about RAID… RAID is a set of physical disk drives viewed by the operating system as a single logical drive Data is distributed across the physical drives of an array in a scheme known as striping Redundant disk capacity is used to store parity information, which guarantees data recoverability in case of a disk failure (not supported by RAID 0 or RAID 1) Interesting to note that the increase in the number of drives, increases the probability of failure. To compensate for this decreased reliability RAID makes use of stored parity information that enables the recovery of data lost due to a disk failure. The RAID scheme consists of 7 levels… Category Level Description Disks Required Data Availability Large I/O Data Transfer Capacity Small I/O Request Rate Striping 0 Non Redundant N Lower than single disk Very high Very high for both read and write Mirroring 1 Mirrored 2N Higher than RAID 2 – 5 but lower than RAID 6 Higher than single disk Up to twice that of a signle disk for read Parallel Access 2 Redundant via Hamming Code N + m Much higher than single disk Highest of all listed alternatives Approximately twice that of a single disk Parallel Access 3 Bit interleaved parity N + 1 Much higher than single disk Highest of all listed alternatives Approximately twice that of a single disk Independent Access 4 Block interleaved parity N + 1 Much higher than single disk Similar to RAID 0 for read, significantly lower than single disk for write Similar to RAID 0 for read, significantly lower than single disk for write Independent Access 5 Block interleaved parity N + 1 Much higher than single disk Similar to RAID 0 for read, lower than single disk for write Similar to RAID 0 for read, generally lower than single disk for write Independent Access 6 Block interleaved parity N + 2 Highest of all listed alternatives Similar to RAID 0 for read; lower than RAID 5 for write Similar to RAID 0 for read, significantly lower than RAID 5 for write Read page 215 – 221 for detailed explanation on RAID levels Optical Memory There are a variety of optical-disk systems available. Read through the table on page 222 – 223 Some of the devices include… CD CD-ROM CD-R CD-RW DVD DVD-R DVD-RW Blue-Ray DVD Magnetic Tape Most modern systems use serial recording – data is lade out as a sequence of bits along each track. The typical recording used in serial is referred to as serpentine recording. In this technique when data is being recorded, the first set of bits is recorded along the whole length of the tape. When the end of the tape is reached the heads are repostioned to record a new track, and the tape is again recorded on its whole length, this time in the opposite direction. That process continued back and forth until the tape is full. To increase speed, the read-write head is capable of reading and writing a number of adjacent tracks simultaneously. Data is still recorded serially along individual tracks, but blocks in sequence are stored on adjacent tracks as suggested. A tape drive is a sequential access device. Magnetic tape was the first kind of secondary memory. It is still widely used as the lowest-cost, slowest speed member of the memory hierarchy.

Read the article

Non-blocking I/O using Servlet 3.1: Scalable applications using Java EE 7 (TOTD #188)

- by arungupta

Servlet 3.0 allowed asynchronous request processing but only traditional I/O was permitted. This can restrict scalability of your applications. In a typical application, ServletInputStream is read in a while loop. public class TestServlet extends HttpServlet { protected void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException, ServletException { ServletInputStream input = request.getInputStream(); byte[] b = new byte[1024]; int len = -1; while ((len = input.read(b)) != -1) { . . . } }} If the incoming data is blocking or streamed slower than the server can read then the server thread is waiting for that data. The same can happen if the data is written to ServletOutputStream. This is resolved in Servet 3.1 (JSR 340, to be released as part Java EE 7) by adding event listeners - ReadListener and WriteListener interfaces. These are then registered using ServletInputStream.setReadListener and ServletOutputStream.setWriteListener. The listeners have callback methods that are invoked when the content is available to be read or can be written without blocking. The updated doGet in our case will look like: AsyncContext context = request.startAsync();ServletInputStream input = request.getInputStream();input.setReadListener(new MyReadListener(input, context)); Invoking setXXXListener methods indicate that non-blocking I/O is used instead of the traditional I/O. At most one ReadListener can be registered on ServletIntputStream and similarly at most one WriteListener can be registered on ServletOutputStream. ServletInputStream.isReady and ServletInputStream.isFinished are new methods to check the status of non-blocking I/O read. ServletOutputStream.canWrite is a new method to check if data can be written without blocking. MyReadListener implementation looks like: @Overridepublic void onDataAvailable() { try { StringBuilder sb = new StringBuilder(); int len = -1; byte b[] = new byte[1024]; while (input.isReady() && (len = input.read(b)) != -1) { String data = new String(b, 0, len); System.out.println("--> " + data); } } catch (IOException ex) { Logger.getLogger(MyReadListener.class.getName()).log(Level.SEVERE, null, ex); }}@Overridepublic void onAllDataRead() { System.out.println("onAllDataRead"); context.complete();}@Overridepublic void onError(Throwable t) { t.printStackTrace(); context.complete();} This implementation has three callbacks: onDataAvailable callback method is called whenever data can be read without blocking onAllDataRead callback method is invoked data for the current request is completely read. onError callback is invoked if there is an error processing the request. Notice, context.complete() is called in onAllDataRead and onError to signal the completion of data read. For now, the first chunk of available data need to be read in the doGet or service method of the Servlet. Rest of the data can be read in a non-blocking way using ReadListener after that. This is going to get cleaned up where all data read can happen in ReadListener only. The sample explained above can be downloaded from here and works with GlassFish 4.0 build 64 and onwards. The slides and a complete re-run of What's new in Servlet 3.1: An Overview session at JavaOne is available here. Here are some more references for you: Java EE 7 Specification Status Servlet Specification Project JSR Expert Group Discussion Archive Servlet 3.1 Javadocs

Read the article

How can I best implement 'cache until further notice' with memcache in multiple tiers?

- by ajreal

the term "client" used here is not referring to client's browser, but client server Before cache workflow 1. client make a HTTP request --> 2. server process --> 3. store parsed results into memcache for next use (cache indefinitely) --> 4. return results to client --> 5. client get the result, store into client's local memcache with TTL After cache workflow 1. another client make a HTTP request --> 2. memcache found return memcache results to client --> 3. client get the result, store into client's local memcache with TTL TTL = time to live Is possible for me to know when the data was updated, and to expire relevant memcache(s) accordingly. However, the pitfalls on client site cache TTL Any data update before the TTL is not pick-up by client memcache. In reverse manner, where there is no update, client memcache still expire after the TTL First request (or concurrent requests) after cache TTL will get throttle as it need to repeat the "Before cache workflow" In the event where client require several HTTP requests on a single web page, it could be very bad in performance. Ideal solution should be client to cache indefinitely until further notice. Here are the three proposals about futher notice Proposal 1 : Make use on HTTP header (current implementation) 1. client sent HTTP request last modified time header 2. server check if last data modified time=last cache time return status 304 3. client based on header to decide further processing GOOD? ---- - save some parsing for client - lesser data transfer BAD? ---- - fire a HTTP request is still slow - server end still need to process lots of requests Proposal 2 : Consistently issue a HTTP request to check all data group last modified time 1. client fire a HTTP request 2. server to return last modified time for all data group 3. client compare local last cache time with the result 4. if data group last cache time < server last modified time then request again for that data group only GOOD? ---- - only fetch what is no up-to-date - less requests for server BAD? ---- - every web page require a HTTP request Proposal 3 : Tell client when new data is available (Push) 1. when server end notice there is a change on a data group 2. notify clients on the changes 3. help clients to fetch again data 4. then reset client local memcache after data is parsed GOOD? ---- - let the cache act/behave like a true cache BAD? ---- - encourage race condition My preference is on proposal 3, and something like Gearman could be ideal Where there is a change, Gearman server to sent the task to multiple clients (workers). Am I crazy? (I know my first question is a bit crazy)

Read the article

Clarification On Write-Caching Policy, Its Underlying Options And How It Applies To Hard Drives And Solid-State Drives

- by Boris_yo

In last week after doing more research on subject matter, I have been wondering about what I have been neglecting all those years to understand write-caching policy, always leaving it on default setting. Write-caching policy improves writing performance and consists of write-back caching and write-cache buffer flushing. This is how I understand all the above, but correct me if I erred somewhere: Write-through cache / Write-through caching itself is not a part of write caching policy per se and it's when data is written to both cache and storage device so if Windows will need that data later again, it is retrieved from cache and not from storage device which means only improved read performance as there is no need for waiting for storage device to read required data again. Since data is still written to storage device, write performance isn't improved and represents no risk of data loss or corruption in case of power failure or system crash while only data in cache gets lost. This option seems to be enabled by default and is recommended for removable devices with no need to use function of "Safely Remove Hardware" on user's part. Write-back caching is similar to above but without writing data to storage device, periodically releasing data from cache and writing to storage device when it is idle. In my opinion this option improves both read and write performance but represents risk if power failure or system crash occurs with the outcome of not only losing data eventually to be written to storage device, but causing file inconsistencies or corrupted file system. Write-back caching cannot be enabled together with write-through caching and it is not recommended to be enabled if no backup power supply is availabe. Write-cache buffer flushing I reckon is similar to write-back caching but enables immediate release and writing of data from cache to storage device right before power outage occurs but I don't know if it applies also to occasional system crash. This option seem to be complementary to write-back cache reducing or potentially eliminating risk of data loss or corruption of file system. I have questions about relevance of last 2 options to today's modern SSDs in order to get best performance and with less wear on SSDs: I know that traditional hard drives come with onboard cache (I wonder what type of cache that is), but do SSDs also come with cache? Assuming they do, is this cache faster than their NAND flash and system RAM and worth taking the risk of utilizing it by enabling write-back cache? I read somewhere that generally storage device's cache is faster than RAM, but I want to be sure. Additionally I read that write-caching should be enabled since current data that is to be written later to NAND flash is kept for a while in cache and provided there is data that gets modified a lot before finally being written, holding of this data and its periodic release reduces its write times to SSD thereby reducing its wearing. Now regarding to write-cache buffer flushing, I heard that SSD controllers are so fast by themselves that enabling this option is not required, because they manage flushing. However, once again, I don't know if SSDs have their own onboard cache and whether or not it is faster than their NAND flash and system RAM because if it is, keeping this option enabled would make sense. Recently I have posted question about issue with my Intel 330 SSD 120GB which was main reason to do deeper research having suspicion of write-caching policy being the culprit of SSD's freezing issue assuming data being released is what causes freezes. Currently I have write-cache enabled and write-cache buffer flushing disabled because I believe SSD controller's management of write-cache flushing and Windows write-cache buffer flushing are conflicting with each other: Since I want to troubleshoot in small steps to finally determine the source of issue, I have decided to start with write-caching policy and the move to drivers, switching to AHCI later on and finally disabling DIPM (device initiated power management) through registry modification thanks to @TomWijsman

Read the article

DSOFramer closing Excel doc in another window. If unsaved data in file, dsoframer fails to open with

- by Steve

I'm using Microsoft's DSOFramer control to allow me to embed an Excel file in my dialog so the user can choose his sheet, then select his range of cells; it's used with an import button on my dialog. The problem is that when I call the DSOFramer's OPEN function, if I have Excel open in another window, it closes the Excel document (but leaves Excel running). If the document it tries to close has unsaved data, I get a dialog boxclosing Excel doc in another window. If unsaved data in file, dsoframer fails to open with a messagebox: "Attempt to access invalid address". I built the source, and stepped through, and its making a call in its CDsoDocObject::CreateFromFile function, calling BindToObject on an object of class IMoniker. The HR is 0x8001010a "The message filter indicated that the application is busy". On that failure, it tries to InstantiateDocObjectServer by classid of CLSID Microsoft Excel Worksheet... this fails with an HRESULT of 0x80040154 "Class not registered". The InstantiateDocObjectServer just calls CoCreateInstance on the classid, first with CLSCTX_LOCAL_SERVER, then (if that fails) with CLSCTX_INPROC_SERVER. I know DSOFramer is a popular sample project for embedding Office apps in various dialogs and forms. I'm hoping someone else has had this problem and might have some insight on how I can solve this. I really don't want it to close any other open Excel documents, and I really don't want it to error-out if it can't close the document due to unsaved data. Update 1: I've tried changing the classid that's passed in to "Excel.Application" (I know that class will resolve), but that didn't work. In CDsoDocObject, it tries to open key "HKEY_CLASSES_ROOT\CLSID{00024500-0000-0000-C000-000000000046}\DocObject", but fails. I've visually confirmed that the key is not present in my registry; The key is present for the guid, but there's no DocObject subkey. It then produces an error message box: "The associated COM server does not support ActiveX document embedding". I get similar (different key, of course) results when I try to use the Excel.Workbook programid. Update 2: I tried starting a 2nd instance of Excel, hoping that my automation would bind to it (being the most recently invoked) instead of the problem Excel instance, but it didn't seem to do that. Results were the same. My problem seems to have boiled down to this: I'm calling the BindToObject on an object of class IMoniker, and receiving 0x8001010A (RPC_E_SERVERCALL_RETRYLATER) "The message filter indicated that the application is busy". I've tried playing with the flags passed to the BindToObject (via the SetBindOptions), but nothing seems to make any difference. Update 3: It first tries to bind using an IMoniker class. If that fails, it calls CoCreateInstance for the clsid as a "fallback" method. This may work for other MS Office objects, but when it's Excel, the class is for the Worksheet. I modified the sample to CoCreateInstance _Application, then got the workbooks, then called the Workbooks::Open for the target file, which returns a Worksheet object. I then returned that pointer and merged back with the original sample code path. All working now.

Read the article

Improving the performance of XSL

- by Rachel

In the below XSL for the variable "insert-data", I have an input param with the structure, <insert-data> <data compareIndex="4" nodeName="d1e1"> <a/> </data> <data compareIndex="5" nodeName="d1e1"> <b/> </data> <data compareIndex="7" nodeName="d1e2"> <a/> </data> <data compareIndex="9" nodeName="d1e2"> <b/> </data> </insert-data> where "nodeName" is the id of a node and "compareIndex" is the position of the text content relative to the node having id "$nodeName". I am using the below XSL to select all the text nodes(generate-id) that satisfy the above condition and construct a data xml. The below implementation works perfectly but the time taken for the execution is in min. Is there a better way of implementing or is there any in-efficient operation being used. From my observation the code where the preceding text length is calculated consumes the major time. Please share your thoughts to improve the performance of the XSL. I am using Java SAXON XSL transformer. <xsl:variable name="insert-data" as="element()*"> <xsl:for-each select="$insert-file/insert-data/data"> <xsl:sort select="xsd:integer(@index)"/> <xsl:variable name="compareIndex" select="xsd:integer(@compareIndex)" /> <xsl:variable name="nodeName" select="@nodeName" /> <xsl:variable name="nodeContent" as="node()"> <xsl:copy-of select="node()"/> </xsl:variable> <xsl:for-each select="$main-root/*//text()[ancestor::*[@id = $nodeName]]"> <xsl:variable name="preTextLength" as="xsd:integer" select="sum((preceding::text())[. ancestor::*[@id = $nodeName]]/string-length(.))" /> <xsl:variable name="currentTextLength" as="xsd:integer" select="string-length(.)" /> <xsl:variable name="sum" select="$preTextLength + $currentTextLength" as="xsd:integer"></xsl:variable> <xsl:variable name="split-index" select="$compareIndex - $preTextLength" as="xsd:integer"></xsl:variable> <xsl:if test="($sum ge $compareIndex) and ($compareIndex gt $preTextLength)"> <data split-index="{$split-index}" text-id="{generate-id(.)}"> <xsl:copy-of select="$nodeContent"/> </data> </xsl:if> </xsl:for-each> </xsl:for-each> </xsl:variable>

Read the article

How do I get a Java to call data from the Internet? Where to even start??

- by cdg

Hello oh great wizards of all things android. I really need your help. Mostly because my little brain just doesn't know were to start. I am trying to pull data from the internet to make a widget for the home screen. I have the layout built: <?xml version="1.0" encoding="utf-8"?> <RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android" android:id="@+id/Layout" android:layout_width="fill_parent" android:layout_height="fill_parent" android:background="@drawable/widget_bg_normal" android:clipChildren="false" > <TextView android:id="@+id/text_view" android:layout_width="100px" android:layout_height="wrap_content" android:paddingTop="18px" android:layout_centerHorizontal="true" android:textSize="8px" android:text="158x154 Image downloaded from the internet goes here. Needs to be updated every evening at midnight or unless the button below is pressed. Now if I could only figure out exactly how to do this, life would be good." /> <Button android:id="@+id/new_button" android:layout_width="fill_parent" android:layout_height="wrap_content" android:text="Get New" android:layout_below="@+id/scroll_image" android:layout_centerHorizontal="true" android:padding="0px" android:textSize="10px" android:height="8px" android:includeFontPadding="false" /> </RelativeLayout> Got the provider xml bulit: <?xml version="1.0" encoding="utf-8"?> <appwidget-provider xmlns:android="http://schemas.android.com/apk/res/android" android:minWidth="150dip" android:minHeight="150dip" android:updatePeriodMillis="10000" android:initialLayout="@layout/widget" /> The Manifest works great. <?xml version="1.0" encoding="utf-8"?> <manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.dge.myandroid" android:versionCode="1" android:versionName="1.0"> <application android:icon="@drawable/icon" android:label="@string/app_name"> <activity android:name=".myactivty" android:label="@string/app_name"> <intent-filter> <action android:name="android.intent.action.MAIN" /> <category android:name="android.intent.category.LAUNCHER" /> </intent-filter> </activity>  <receiver android:name=".mywidget" android:label="@string/app_name" > <intent-filter> <action android:name="android.appwidget.action.APPWIDGET_UPDATE" /> </intent-filter> <meta-data android:name="android.appwidget.widgetprovider" android:resource="@xml/widgetprovider" /> </receiver>  </application> <uses-permission android:name="android.permission.INTERNET" /> <uses-sdk android:minSdkVersion="7" /> </manifest> The data it is calling looks something like this when it is called. It basically goes to a website that uses php to random the image: <html><body bgcolor="#000000">center> <a href="http://www.website.com" target="_blank"> <img border="0" src="http://www.webiste.com//0.gif"></a> <img src="http://www.webiste.com" style="border:none;" /> </center></body></html> But here is were I am stuck. I just don't know where to start at all. The java is so far beyond my little head that I don't know what to do. package com.dge.myandroid; import android.appwidget.AppWidgetProvider; public class mywidget extends AppWidgetProvider { } The wiki example just confused me more. I just don't know where to begin. Please help.

Read the article

how can udp data can passed through RS232 in ansi c?

- by moon

i want to transmit and receive data on RS232 using udp and i want to know about techniques which allow me to transmit and receive data on a faster rate and also no lose of data is there? thanx in advance. i have tried but need improvements if possible #include <stdio.h> #include <dos.h> #include<string.h> #include<conio.h> #include<iostream.h> #include<stdlib.h> #define PORT1 0x3f8 void main() { int c,ch,choice,i,a=0; char filename[30],filename2[30],buf; FILE *in,*out; clrscr(); while(1){ outportb(PORT1+0,0x03); outportb(PORT1+1,0); outportb(PORT1+3,0x03); outportb(PORT1+2,0xc7); outportb(PORT1+4,0x0b); cout<<"\n==============================================================="; cout<<"\n\t*****Serial Communication By BADR-U-ZAMAN******\nCommunication between two computers By serial port"; cout<<"\nPlease select\n[1]\tFor sending file \n[2]\tFor receiving file \n[3]\tTo exit\n"; cout<<"=================================================================\n"; cin>>choice; if(choice==1) { strcpy(filename,"C:\\TC\\BIN\\badr.cpp"); cout<<filename; for(i=0;i<=strlen(filename);i++) outportb(PORT1,filename[i]); in=fopen(filename,"r"); if (in==NULL) { cout<<"cannot open a file"; a=1; } if(a!=1) cout<<"\n\nFile sending.....\n\n"; while(!feof(in)) { buf=fgetc(in); cout<<buf; outportb(PORT1,buf); delay(5); } } else { if(choice==3) exit(0); i=0; buf='a'; while(buf!=NULL) { c=inportb(PORT1+5); if(c&1) { buf=inportb(PORT1); filename2[i]=buf; i++; } } out=fopen(filename2,"t"); cout<<"\n Filename received:"<<filename[2]; cout<<"\nReading from the port..."; cout<<"writing to file"<<filename2; do { c=inportb(PORT1+5); if(c&1) { buf=inportb(PORT1); cout<<buf; fputc(buf,out); delay(5); } if(kbhit()) { ch=getch(); } }while(ch!=27); } getch(); } }

Read the article

What's the fastest lookup algorithm for a pair data structure (i.e, a map)?

- by truncheon

In the following example a std::map structure is filled with 26 values from A - Z (for key) and 0 – 26 for value. The time taken (on my system) to lookup the last entry (10000000 times) is roughly 250 ms for the vector, and 125 ms for the map. (I compiled using release mode, with O3 option turned on for g++ 4.4) But if for some odd reason I wanted better performance than the std::map, what data structures and functions would I need to consider using? I apologize if the answer seems obvious to you, but I haven't had much experience in the performance critical aspects of C++ programming. UPDATE: This example is rather trivial and hides the true complexity of what I'm trying to achieve. My real world project is a simple scripting language that uses a parser, data tree, and interpreter (instead of a VM stack system). I need to use some kind of data structure (perhaps map) to store the variables names created by script programmers. These are likely to be pretty randomly named, so I need a lookup method that can quickly find a particular key within a (probably) fairly large list of names. #include <ctime> #include <map> #include <vector> #include <iostream> struct mystruct { char key; int value; mystruct(char k = 0, int v = 0) : key(k), value(v) { } }; int find(const std::vector<mystruct>& ref, char key) { for (std::vector<mystruct>::const_iterator i = ref.begin(); i != ref.end(); ++i) if (i->key == key) return i->value; return -1; } int main() { std::map<char, int> mymap; std::vector<mystruct> myvec; for (int i = 'a'; i < 'a' + 26; ++i) { mymap[i] = i - 'a'; myvec.push_back(mystruct(i, i - 'a')); } int pre = clock(); for (int i = 0; i < 10000000; ++i) { find(myvec, 'z'); } std::cout << "linear scan: milli " << clock() - pre << "\n"; pre = clock(); for (int i = 0; i < 10000000; ++i) { mymap['z']; } std::cout << "map scan: milli " << clock() - pre << "\n"; return 0; }

Read the article

Scala puts precedence on implicit conversion over "natural" operations... Why? Is this a bug? Or am

- by Alex R

This simple test, of course, works as expected: scala var b = 2 b: Int = 2 scala b += 1 scala b res3: Int = 3 Now I bring this into scope: class A(var x: Int) { def +=(y:Int) { this.x += y } } implicit def int2A(i:Int) : A = new A(i) I'm defining a new class and a += operation on it. I never expected this would affect the way my regular Ints behave. But it does: scala var b:Int = 0 b: Int = 0 scala b += 1 scala b res29: Int = 0 scala b += 2 scala b res31: Int = 0 Scala seems to prefer the implicit conversion over the natural += that is already defined to Ints. That leads to several questions... Why? Is this a bug? Is it by design? Is there a work-around (other than not using "+=")? Thanks

Read the article

C++ function will not return

- by Mike

I have a function that I am calling that runs all the way up to where it should return but doesn't return. If I cout something for debugging at the very end of the function, it gets displayed but the function does not return. fetchData is the function I am referring to. It gets called by outputFile. cout displays "done here" but not "data fetched" I know this code is messy but can anyone help me figure this out? Thanks //Given an inode return all data of i_block data char* fetchData(iNode tempInode){ char* data; data = new char[tempInode.i_size]; this->currentInodeSize = tempInode.i_size; //Loop through blocks to retrieve data vector<unsigned int> i_blocks; i_blocks.reserve(tempInode.i_blocks); this->currentDataPosition = 0; cout << "currentDataPosition set to 0" << std::endl; cout << "i_blocks:" << tempInode.i_blocks << std::endl; int i = 0; for(i = 0; i < 12; i++){ if(tempInode.i_block[i] == 0) break; i_blocks.push_back(tempInode.i_block[i]); } appendIndirectData(tempInode.i_block[12], &i_blocks); appendDoubleIndirectData(tempInode.i_block[13], &i_blocks); appendTripleIndirectData(tempInode.i_block[14], &i_blocks); //Loop through all the block addresses to get the actual data for(i=0; i < i_blocks.size(); i++){ appendData(i_blocks[i], data); } cout << "done here" << std::endl; return data; } void appendData(int block, char* data){ char* tempBuffer; tempBuffer = new char[this->blockSize]; ifstream file (this->filename, std::ios::binary); int entryLocation = block*this->blockSize; file.seekg (entryLocation, ios::beg); file.read(tempBuffer, this->blockSize); //Append this block to data for(int i=0; i < this->blockSize; i++){ data[this->currentDataPosition] = tempBuffer[i]; this->currentDataPosition++; } data[this->currentDataPosition] = '\0'; } void outputFile(iNode file, string filename){ char* data; cout << "File Transfer Started" << std::endl; data = this->fetchData(file); cout << "data fetched" << std::endl; char *outputFile = (char*)filename.c_str(); ofstream myfile; myfile.open (outputFile,ios::out|ios::binary); int i = 0; for(i=0; i < file.i_size; i++){ myfile << data[i]; } myfile.close(); cout << "File Transfer Completed" << std::endl; return; }

Read the article

Upload File to Windows Azure Blob in Chunks through ASP.NET MVC, JavaScript and HTML5

- by Shaun

Originally posted on: http://geekswithblogs.net/shaunxu/archive/2013/07/01/upload-file-to-windows-azure-blob-in-chunks-through-asp.net.aspxMany people are using Windows Azure Blob Storage to store their data in the cloud. Blob storage provides 99.9% availability with easy-to-use API through .NET SDK and HTTP REST. For example, we can store JavaScript files, images, documents in blob storage when we are building an ASP.NET web application on a Web Role in Windows Azure. Or we can store our VHD files in blob and mount it as a hard drive in our cloud service. If you are familiar with Windows Azure, you should know that there are two kinds of blob: page blob and block blob. The page blob is optimized for random read and write, which is very useful when you need to store VHD files. The block blob is optimized for sequential/chunk read and write, which has more common usage. Since we can upload block blob in blocks through BlockBlob.PutBlock, and them commit them as a whole blob with invoking the BlockBlob.PutBlockList, it is very powerful to upload large files, as we can upload blocks in parallel, and provide pause-resume feature. There are many documents, articles and blog posts described on how to upload a block blob. Most of them are focus on the server side, which means when you had received a big file, stream or binaries, how to upload them into blob storage in blocks through .NET SDK. But the problem is, how can we upload these large files from client side, for example, a browser. This questioned to me when I was working with a Chinese customer to help them build a network disk production on top of azure. The end users upload their files from the web portal, and then the files will be stored in blob storage from the Web Role. My goal is to find the best way to transform the file from client (end user’s machine) to the server (Web Role) through browser. In this post I will demonstrate and describe what I had done, to upload large file in chunks with high speed, and save them as blocks into Windows Azure Blob Storage. Traditional Upload, Works with Limitation The simplest way to implement this requirement is to create a web page with a form that contains a file input element and a submit button. 1: @using (Html.BeginForm("About", "Index", FormMethod.Post, new { enctype = "multipart/form-data" })) 2: { 3: <input type="file" name="file" /> 4: <input type="submit" value="upload" /> 5: } And then in the backend controller, we retrieve the whole content of this file and upload it in to the blob storage through .NET SDK. We can split the file in blocks and upload them in parallel and commit. The code had been well blogged in the community. 1: [HttpPost] 2: public ActionResult About(HttpPostedFileBase file) 3: { 4: var container = _client.GetContainerReference("test"); 5: container.CreateIfNotExists(); 6: var blob = container.GetBlockBlobReference(file.FileName); 7: var blockDataList = new Dictionary<string, byte[]>(); 8: using (var stream = file.InputStream) 9: { 10: var blockSizeInKB = 1024; 11: var offset = 0; 12: var index = 0; 13: while (offset < stream.Length) 14: { 15: var readLength = Math.Min(1024 * blockSizeInKB, (int)stream.Length - offset); 16: var blockData = new byte[readLength]; 17: offset += stream.Read(blockData, 0, readLength); 18: blockDataList.Add(Convert.ToBase64String(BitConverter.GetBytes(index)), blockData); 19: 20: index++; 21: } 22: } 23: 24: Parallel.ForEach(blockDataList, (bi) => 25: { 26: blob.PutBlock(bi.Key, new MemoryStream(bi.Value), null); 27: }); 28: blob.PutBlockList(blockDataList.Select(b => b.Key).ToArray()); 29: 30: return RedirectToAction("About"); 31: } This works perfect if we selected an image, a music or a small video to upload. But if I selected a large file, let’s say a 6GB HD-movie, after upload for about few minutes the page will be shown as below and the upload will be terminated. In ASP.NET there is a limitation of request length and the maximized request length is defined in the web.config file. It’s a number which less than about 4GB. So if we want to upload a really big file, we cannot simply implement in this way. Also, in Windows Azure, a cloud service network load balancer will terminate the connection if exceed the timeout period. From my test the timeout looks like 2 - 3 minutes. Hence, when we need to upload a large file we cannot just use the basic HTML elements. Besides the limitation mentioned above, the simple HTML file upload cannot provide rich upload experience such as chunk upload, pause and pause-resume. So we need to find a better way to upload large file from the client to the server. Upload in Chunks through HTML5 and JavaScript In order to break those limitation mentioned above we will try to upload the large file in chunks. This takes some benefit to us such as - No request size limitation: Since we upload in chunks, we can define the request size for each chunks regardless how big the entire file is. - No timeout problem: The size of chunks are controlled by us, which means we should be able to make sure request for each chunk upload will not exceed the timeout period of both ASP.NET and Windows Azure load balancer. It was a big challenge to upload big file in chunks until we have HTML5. There are some new features and improvements introduced in HTML5 and we will use them to implement our solution. In HTML5, the File interface had been improved with a new method called “slice”. It can be used to read part of the file by specifying the start byte index and the end byte index. For example if the entire file was 1024 bytes, file.slice(512, 768) will read the part of this file from the 512nd byte to 768th byte, and return a new object of interface called "Blob”, which you can treat as an array of bytes. In fact, a Blob object represents a file-like object of immutable, raw data. The File interface is based on Blob, inheriting blob functionality and expanding it to support files on the user's system. For more information about the Blob please refer here. File and Blob is very useful to implement the chunk upload. We will use File interface to represent the file the user selected from the browser and then use File.slice to read the file in chunks in the size we wanted. For example, if we wanted to upload a 10MB file with 512KB chunks, then we can read it in 512KB blobs by using File.slice in a loop. Assuming we have a web page as below. User can select a file, an input box to specify the block size in KB and a button to start upload. 1: <div> 2: <input type="file" id="upload_files" name="files[]" /><br /> 3: Block Size: <input type="number" id="block_size" value="512" name="block_size" />KB<br /> 4: <input type="button" id="upload_button_blob" name="upload" value="upload (blob)" /> 5: </div> Then we can have the JavaScript function to upload the file in chunks when user clicked the button. 1: <script type="text/javascript"> 1: 2: $(function () { 3: $("#upload_button_blob").click(function () { 4: }); 5: });</script> Firstly we need to ensure the client browser supports the interfaces we are going to use. Just try to invoke the File, Blob and FormData from the “window” object. If any of them is “undefined” the condition result will be “false” which means your browser doesn’t support these premium feature and it’s time for you to get your browser updated. FormData is another new feature we are going to use in the future. It could generate a temporary form for us. We will use this interface to create a form with chunk and associated metadata when invoked the service through ajax. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: if (window.File && window.Blob && window.FormData) { 4: alert("Your brwoser is awesome, let's rock!"); 5: } 6: else { 7: alert("Oh man plz update to a modern browser before try is cool stuff out."); 8: return; 9: } 10: }); Each browser supports these interfaces by their own implementation and currently the Blob, File and File.slice are supported by Chrome 21, FireFox 13, IE 10, Opera 12 and Safari 5.1 or higher. After that we worked on the files the user selected one by one since in HTML5, user can select multiple files in one file input box. 1: var files = $("#upload_files")[0].files; 2: for (var i = 0; i < files.length; i++) { 3: var file = files[i]; 4: var fileSize = file.size; 5: var fileName = file.name; 6: } Next, we calculated the start index and end index for each chunks based on the size the user specified from the browser. We put them into an array with the file name and the index, which will be used when we upload chunks into Windows Azure Blob Storage as blocks since we need to specify the target blob name and the block index. At the same time we will store the list of all indexes into another variant which will be used to commit blocks into blob in Azure Storage once all chunks had been uploaded successfully. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: 11: // calculate the start and end byte index for each blocks(chunks) 12: // with the index, file name and index list for future using 13: var blockSizeInKB = $("#block_size").val(); 14: var blockSize = blockSizeInKB * 1024; 15: var blocks = []; 16: var offset = 0; 17: var index = 0; 18: var list = ""; 19: while (offset < fileSize) { 20: var start = offset; 21: var end = Math.min(offset + blockSize, fileSize); 22: 23: blocks.push({ 24: name: fileName, 25: index: index, 26: start: start, 27: end: end 28: }); 29: list += index + ","; 30: 31: offset = end; 32: index++; 33: } 34: } 35: }); Now we have all chunks’ information ready. The next step should be upload them one by one to the server side, and at the server side when received a chunk it will upload as a block into Blob Storage, and finally commit them with the index list through BlockBlobClient.PutBlockList. But since all these invokes are ajax calling, which means not synchronized call. So we need to introduce a new JavaScript library to help us coordinate the asynchronize operation, which named “async.js”. You can download this JavaScript library here, and you can find the document here. I will not explain this library too much in this post. We will put all procedures we want to execute as a function array, and pass into the proper function defined in async.js to let it help us to control the execution sequence, in series or in parallel. Hence we will define an array and put the function for chunk upload into this array. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: 5: // start to upload each files in chunks 6: var files = $("#upload_files")[0].files; 7: for (var i = 0; i < files.length; i++) { 8: var file = files[i]; 9: var fileSize = file.size; 10: var fileName = file.name; 11: // calculate the start and end byte index for each blocks(chunks) 12: // with the index, file name and index list for future using 13: ... ... 14: 15: // define the function array and push all chunk upload operation into this array 16: blocks.forEach(function (block) { 17: putBlocks.push(function (callback) { 18: }); 19: }); 20: } 21: }); 22: }); As you can see, I used File.slice method to read each chunks based on the start and end byte index we calculated previously, and constructed a temporary HTML form with the file name, chunk index and chunk data through another new feature in HTML5 named FormData. Then post this form to the backend server through jQuery.ajax. This is the key part of our solution. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: blocks.forEach(function (block) { 15: putBlocks.push(function (callback) { 16: // load blob based on the start and end index for each chunks 17: var blob = file.slice(block.start, block.end); 18: // put the file name, index and blob into a temporary from 19: var fd = new FormData(); 20: fd.append("name", block.name); 21: fd.append("index", block.index); 22: fd.append("file", blob); 23: // post the form to backend service (asp.net mvc controller action) 24: $.ajax({ 25: url: "/Home/UploadInFormData", 26: data: fd, 27: processData: false, 28: contentType: "multipart/form-data", 29: type: "POST", 30: success: function (result) { 31: if (!result.success) { 32: alert(result.error); 33: } 34: callback(null, block.index); 35: } 36: }); 37: }); 38: }); 39: } 40: }); Then we will invoke these functions one by one by using the async.js. And once all functions had been executed successfully I invoked another ajax call to the backend service to commit all these chunks (blocks) as the blob in Windows Azure Storage. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: ... ... 15: // invoke the functions one by one 16: // then invoke the commit ajax call to put blocks into blob in azure storage 17: async.series(putBlocks, function (error, result) { 18: var data = { 19: name: fileName, 20: list: list 21: }; 22: $.post("/Home/Commit", data, function (result) { 23: if (!result.success) { 24: alert(result.error); 25: } 26: else { 27: alert("done!"); 28: } 29: }); 30: }); 31: } 32: }); That’s all in the client side. The outline of our logic would be - Calculate the start and end byte index for each chunks based on the block size. - Defined the functions of reading the chunk form file and upload the content to the backend service through ajax. - Execute the functions defined in previous step with “async.js”. - Commit the chunks by invoking the backend service in Windows Azure Storage finally. Save Chunks as Blocks into Blob Storage In above we finished the client size JavaScript code. It uploaded the file in chunks to the backend service which we are going to implement in this step. We will use ASP.NET MVC as our backend service, and it will receive the chunks, upload into Windows Azure Bob Storage in blocks, then finally commit as one blob. As in the client side we uploaded chunks by invoking the ajax call to the URL "/Home/UploadInFormData", I created a new action under the Index controller and it only accepts HTTP POST request. 1: [HttpPost] 2: public JsonResult UploadInFormData() 3: { 4: var error = string.Empty; 5: try 6: { 7: } 8: catch (Exception e) 9: { 10: error = e.ToString(); 11: } 12: 13: return new JsonResult() 14: { 15: Data = new 16: { 17: success = string.IsNullOrWhiteSpace(error), 18: error = error 19: } 20: }; 21: } Then I retrieved the file name, index and the chunk content from the Request.Form object, which was passed from our client side. And then, used the Windows Azure SDK to create a blob container (in this case we will use the container named “test”.) and create a blob reference with the blob name (same as the file name). Then uploaded the chunk as a block of this blob with the index, since in Blob Storage each block must have an index (ID) associated with so that finally we can put all blocks as one blob by specifying their block ID list. 1: [HttpPost] 2: public JsonResult UploadInFormData() 3: { 4: var error = string.Empty; 5: try 6: { 7: var name = Request.Form["name"]; 8: var index = int.Parse(Request.Form["index"]); 9: var file = Request.Files[0]; 10: var id = Convert.ToBase64String(BitConverter.GetBytes(index)); 11: 12: var container = _client.GetContainerReference("test"); 13: container.CreateIfNotExists(); 14: var blob = container.GetBlockBlobReference(name); 15: blob.PutBlock(id, file.InputStream, null); 16: } 17: catch (Exception e) 18: { 19: error = e.ToString(); 20: } 21: 22: return new JsonResult() 23: { 24: Data = new 25: { 26: success = string.IsNullOrWhiteSpace(error), 27: error = error 28: } 29: }; 30: } Next, I created another action to commit the blocks into blob once all chunks had been uploaded. Similarly, I retrieved the blob name from the Request.Form. I also retrieved the chunks ID list, which is the block ID list from the Request.Form in a string format, split them as a list, then invoked the BlockBlob.PutBlockList method. After that our blob will be shown in the container and ready to be download. 1: [HttpPost] 2: public JsonResult Commit() 3: { 4: var error = string.Empty; 5: try 6: { 7: var name = Request.Form["name"]; 8: var list = Request.Form["list"]; 9: var ids = list 10: .Split(',') 11: .Where(id => !string.IsNullOrWhiteSpace(id)) 12: .Select(id => Convert.ToBase64String(BitConverter.GetBytes(int.Parse(id)))) 13: .ToArray(); 14: 15: var container = _client.GetContainerReference("test"); 16: container.CreateIfNotExists(); 17: var blob = container.GetBlockBlobReference(name); 18: blob.PutBlockList(ids); 19: } 20: catch (Exception e) 21: { 22: error = e.ToString(); 23: } 24: 25: return new JsonResult() 26: { 27: Data = new 28: { 29: success = string.IsNullOrWhiteSpace(error), 30: error = error 31: } 32: }; 33: } Now we finished all code we need. The whole process of uploading would be like this below. Below is the full client side JavaScript code. 1: <script type="text/javascript" src="~/Scripts/async.js"></script> 2: <script type="text/javascript"> 3: $(function () { 4: $("#upload_button_blob").click(function () { 5: // assert the browser support html5 6: if (window.File && window.Blob && window.FormData) { 7: alert("Your brwoser is awesome, let's rock!"); 8: } 9: else { 10: alert("Oh man plz update to a modern browser before try is cool stuff out."); 11: return; 12: } 13: 14: // start to upload each files in chunks 15: var files = $("#upload_files")[0].files; 16: for (var i = 0; i < files.length; i++) { 17: var file = files[i]; 18: var fileSize = file.size; 19: var fileName = file.name; 20: 21: // calculate the start and end byte index for each blocks(chunks) 22: // with the index, file name and index list for future using 23: var blockSizeInKB = $("#block_size").val(); 24: var blockSize = blockSizeInKB * 1024; 25: var blocks = []; 26: var offset = 0; 27: var index = 0; 28: var list = ""; 29: while (offset < fileSize) { 30: var start = offset; 31: var end = Math.min(offset + blockSize, fileSize); 32: 33: blocks.push({ 34: name: fileName, 35: index: index, 36: start: start, 37: end: end 38: }); 39: list += index + ","; 40: 41: offset = end; 42: index++; 43: } 44: 45: // define the function array and push all chunk upload operation into this array 46: var putBlocks = []; 47: blocks.forEach(function (block) { 48: putBlocks.push(function (callback) { 49: // load blob based on the start and end index for each chunks 50: var blob = file.slice(block.start, block.end); 51: // put the file name, index and blob into a temporary from 52: var fd = new FormData(); 53: fd.append("name", block.name); 54: fd.append("index", block.index); 55: fd.append("file", blob); 56: // post the form to backend service (asp.net mvc controller action) 57: $.ajax({ 58: url: "/Home/UploadInFormData", 59: data: fd, 60: processData: false, 61: contentType: "multipart/form-data", 62: type: "POST", 63: success: function (result) { 64: if (!result.success) { 65: alert(result.error); 66: } 67: callback(null, block.index); 68: } 69: }); 70: }); 71: }); 72: 73: // invoke the functions one by one 74: // then invoke the commit ajax call to put blocks into blob in azure storage 75: async.series(putBlocks, function (error, result) { 76: var data = { 77: name: fileName, 78: list: list 79: }; 80: $.post("/Home/Commit", data, function (result) { 81: if (!result.success) { 82: alert(result.error); 83: } 84: else { 85: alert("done!"); 86: } 87: }); 88: }); 89: } 90: }); 91: }); 92: </script> And below is the full ASP.NET MVC controller code. 1: public class HomeController : Controller 2: { 3: private CloudStorageAccount _account; 4: private CloudBlobClient _client; 5: 6: public HomeController() 7: : base() 8: { 9: _account = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("DataConnectionString")); 10: _client = _account.CreateCloudBlobClient(); 11: } 12: 13: public ActionResult Index() 14: { 15: ViewBag.Message = "Modify this template to jump-start your ASP.NET MVC application."; 16: 17: return View(); 18: } 19: 20: [HttpPost] 21: public JsonResult UploadInFormData() 22: { 23: var error = string.Empty; 24: try 25: { 26: var name = Request.Form["name"]; 27: var index = int.Parse(Request.Form["index"]); 28: var file = Request.Files[0]; 29: var id = Convert.ToBase64String(BitConverter.GetBytes(index)); 30: 31: var container = _client.GetContainerReference("test"); 32: container.CreateIfNotExists(); 33: var blob = container.GetBlockBlobReference(name); 34: blob.PutBlock(id, file.InputStream, null); 35: } 36: catch (Exception e) 37: { 38: error = e.ToString(); 39: } 40: 41: return new JsonResult() 42: { 43: Data = new 44: { 45: success = string.IsNullOrWhiteSpace(error), 46: error = error 47: } 48: }; 49: } 50: 51: [HttpPost] 52: public JsonResult Commit() 53: { 54: var error = string.Empty; 55: try 56: { 57: var name = Request.Form["name"]; 58: var list = Request.Form["list"]; 59: var ids = list 60: .Split(',') 61: .Where(id => !string.IsNullOrWhiteSpace(id)) 62: .Select(id => Convert.ToBase64String(BitConverter.GetBytes(int.Parse(id)))) 63: .ToArray(); 64: 65: var container = _client.GetContainerReference("test"); 66: container.CreateIfNotExists(); 67: var blob = container.GetBlockBlobReference(name); 68: blob.PutBlockList(ids); 69: } 70: catch (Exception e) 71: { 72: error = e.ToString(); 73: } 74: 75: return new JsonResult() 76: { 77: Data = new 78: { 79: success = string.IsNullOrWhiteSpace(error), 80: error = error 81: } 82: }; 83: } 84: } And if we selected a file from the browser we will see our application will upload chunks in the size we specified to the server through ajax call in background, and then commit all chunks in one blob. Then we can find the blob in our Windows Azure Blob Storage. Optimized by Parallel Upload In previous example we just uploaded our file in chunks. This solved the problem that ASP.NET MVC request content size limitation as well as the Windows Azure load balancer timeout. But it might introduce the performance problem since we uploaded chunks in sequence. In order to improve the upload performance we could modify our client side code a bit to make the upload operation invoked in parallel. The good news is that, “async.js” library provides the parallel execution function. If you remembered the code we invoke the service to upload chunks, it utilized “async.series” which means all functions will be executed in sequence. Now we will change this code to “async.parallel”. This will invoke all functions in parallel. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: ... ... 15: // invoke the functions one by one 16: // then invoke the commit ajax call to put blocks into blob in azure storage 17: async.parallel(putBlocks, function (error, result) { 18: var data = { 19: name: fileName, 20: list: list 21: }; 22: $.post("/Home/Commit", data, function (result) { 23: if (!result.success) { 24: alert(result.error); 25: } 26: else { 27: alert("done!"); 28: } 29: }); 30: }); 31: } 32: }); In this way all chunks will be uploaded to the server side at the same time to maximize the bandwidth usage. This should work if the file was not very large and the chunk size was not very small. But for large file this might introduce another problem that too many ajax calls are sent to the server at the same time. So the best solution should be, upload the chunks in parallel with maximum concurrency limitation. The code below specified the concurrency limitation to 4, which means at the most only 4 ajax calls could be invoked at the same time. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: ... ... 15: // invoke the functions one by one 16: // then invoke the commit ajax call to put blocks into blob in azure storage 17: async.parallelLimit(putBlocks, 4, function (error, result) { 18: var data = { 19: name: fileName, 20: list: list 21: }; 22: $.post("/Home/Commit", data, function (result) { 23: if (!result.success) { 24: alert(result.error); 25: } 26: else { 27: alert("done!"); 28: } 29: }); 30: }); 31: } 32: }); Summary In this post we discussed how to upload files in chunks to the backend service and then upload them into Windows Azure Blob Storage in blocks. We focused on the frontend side and leverage three new feature introduced in HTML 5 which are - File.slice: Read part of the file by specifying the start and end byte index. - Blob: File-like interface which contains the part of the file content. - FormData: Temporary form element that we can pass the chunk alone with some metadata to the backend service. Then we discussed the performance consideration of chunk uploading. Sequence upload cannot provide maximized upload speed, but the unlimited parallel upload might crash the browser and server if too many chunks. So we finally came up with the solution to upload chunks in parallel with the concurrency limitation. We also demonstrated how to utilize “async.js” JavaScript library to help us control the asynchronize call and the parallel limitation. Regarding the chunk size and the parallel limitation value there is no “best” value. You need to test vary composition and find out the best one for your particular scenario. It depends on the local bandwidth, client machine cores and the server side (Windows Azure Cloud Service Virtual Machine) cores, memory and bandwidth. Below is one of my performance test result. The client machine was Windows 8 IE 10 with 4 cores. I was using Microsoft Cooperation Network. The web site was hosted on Windows Azure China North data center (in Beijing) with one small web role (1.7GB 1 core CPU, 1.75GB memory with 100Mbps bandwidth). The test cases were - Chunk size: 512KB, 1MB, 2MB, 4MB. - Upload Mode: Sequence, parallel (unlimited), parallel with limit (4 threads, 8 threads). - Chunk Format: base64 string, binaries. - Target file: 100MB. - Each case was tested 3 times. Below is the test result chart. Some thoughts, but not guidance or best practice: - Parallel gets better performance than series. - No significant performance improvement between parallel 4 threads and 8 threads. - Transform with binaries provides better performance than base64. - In all cases, chunk size in 1MB - 2MB gets better performance. Hope this helps, Shaun All documents and related graphics, codes are provided "AS IS" without warranty of any kind. Copyright © Shaun Ziyan Xu. This work is licensed under the Creative Commons License.

Read the article

OWB 11gR2 - Early Arriving Facts

- by Dawei Sun

A common challenge when building ETL components for a data warehouse is how to handle early arriving facts. OWB 11gR2 introduced a new feature to address this for dimensional objects entitled Orphan Management. An orphan record is one that does not have a corresponding existing parent record. Orphan management automates the process of handling source rows that do not meet the requirements necessary to form a valid dimension or cube record. In this article, a simple example will be provided to show you how to use Orphan Management in OWB. We first import a sample MDL file that contains all the objects we need. Then we take some time to examine all the objects. After that, we prepare the source data, deploy the target table and dimension/cube loading map. Finally, we run the loading maps, and check the data in target dimension/cube tables. OK, let’s start… 1. Import MDL file and examine sample project First, download zip file from here, which includes a MDL file and three source data files. Then we open OWB design center, import orphan_management.mdl by using the menu File->Import->Warehouse Builder Metadata. Now we have several objects in BI_DEMO project as below: Mapping LOAD_CHANNELS_OM: The mapping for dimension loading. Mapping LOAD_SALES_OM: The mapping for cube loading. Dimension CHANNELS_OM: The dimension that contains channels data. Cube SALES_OM: The cube that contains sales data. Table CHANNELS_OM: The star implementation table of dimension CHANNELS_OM. Table SALES_OM: The star implementation table of cube SALES_OM. Table SRC_CHANNELS: The source table of channels data, that will be loaded into dimension CHANNELS_OM. Table SRC_ORDERS and SRC_ORDER_ITEMS: The source tables of sales data that will be loaded into cube SALES_OM. Sequence CLASS_OM_DIM_SEQ: The sequence used for loading dimension CHANNELS_OM. Dimension CHANNELS_OM This dimension has a hierarchy with three levels: TOTAL, CLASS and CHANNEL. Each level has three attributes: ID (surrogate key), NAME and SOURCE_ID (business key). It has a standard star implementation. The orphan management policy and the default parent setting are shown in the following screenshots: The orphan management policy options that you can set for loading are: Reject Orphan: The record is not inserted. Default Parent: You can specify a default parent record. This default record is used as the parent record for any record that does not have an existing parent record. If the default parent record does not exist, Warehouse Builder creates the default parent record. You specify the attribute values of the default parent record at the time of defining the dimensional object. If any ancestor of the default parent does not exist, Warehouse Builder also creates this record. No Maintenance: This is the default behavior. Warehouse Builder does not actively detect, reject, or fix orphan records. While removing data from a dimension, you can select one of the following orphan management policies: Reject Removal: Warehouse Builder does not allow you to delete the record if it has existing child records. No Maintenance: This is the default behavior. Warehouse Builder does not actively detect, reject, or fix orphan records. (More details are at http://download.oracle.com/docs/cd/E11882_01/owb.112/e10935/dim_objects.htm#insertedID1) Cube SALES_OM This cube is references to dimension CHANNELS_OM. It has three measures: AMOUNT, QUANTITY and COST. The orphan management policy setting are shown as following screenshot: The orphan management policy options that you can set for loading are: No Maintenance: Warehouse Builder does not actively detect, reject, or fix orphan rows. Default Dimension Record: Warehouse Builder assigns a default dimension record for any row that has an invalid or null dimension key value. Use the Settings button to define the default parent row. Reject Orphan: Warehouse Builder does not insert the row if it does not have an existing dimension record. (More details are at http://download.oracle.com/docs/cd/E11882_01/owb.112/e10935/dim_objects.htm#BABEACDG) Mapping LOAD_CHANNELS_OM This mapping loads source data from table SRC_CHANNELS to dimension CHANNELS_OM. The operator CHANNELS_IN is bound to table SRC_CHANNELS; CHANNELS_OUT is bound to dimension CHANNELS_OM. The TOTALS operator is used for generating a constant value for the top level in the dimension. The CLASS_FILTER operator is used to filter out the “invalid” class name, so then we can see what will happen when those channel records with an “invalid” parent are loading into dimension. Some properties of the dimension operator in this mapping are important to orphan management. See the screenshot below: Create Default Level Records: If YES, then default level records will be created. This property must be set to YES for dimensions and cubes if one of their orphan management policies is “Default Parent” or “Default Dimension Record”. This property is set to NO by default, so the user may need to set this to YES manually. LOAD policy for INVALID keys/ LOAD policy for NULL keys: These two properties have the same meaning as in the dimension editor. The values are set to the same as the dimension value when user drops the dimension into the mapping. The user does not need to modify these properties. Record Error Rows: If YES, error rows will be inserted into error table when loading the dimension. REMOVE Orphan Policy: This property is used when removing data from a dimension. Since the dimension loading type is set to LOAD in this example, this property is disabled. Mapping LOAD_SALES_OM This mapping loads source data from table SRC_ORDERS and SRC_ORDER_ITEMS to cube SALES_OM. This mapping seems a little bit complicated, but operators in the red rectangle are used to filter out and generate the records with “invalid” or “null” dimension keys. Some properties of the cube operator in a mapping are important to orphan management. See the screenshot below: Enable Source Aggregation: Should be checked in this example. If the default dimension record orphan policy is set for the cube operator, then it is recommended that source aggregation also be enabled. Otherwise, the orphan management processing may produce multiple fact rows with the same default dimension references, which will cause an “unstable rowset” execution error in the database, since the dimension refs are used as update match attributes for updating the fact table. LOAD policy for INVALID keys/ LOAD policy for NULL keys: These two properties have the same meaning as in the cube editor. The values are set to the same as in the cube editor when the user drops the cube into the mapping. The user does not need to modify these properties. Record Error Rows: If YES, error rows will be inserted into error table when loading the cube. 2. Deploy objects and mappings We now can deploy the objects. First, make sure location SALES_WH_LOCAL has been correctly configured. Then open Control Center Manager by using the menu Tools->Control Center Manager. Expand BI_DEMO->SALES_WH_LOCAL, click SALES_WH node on the project tree. We can see the following objects: Deploy all the objects in the following order: Sequence CLASS_OM_DIM_SEQ Table CHANNELS_OM, SALES_OM, SRC_CHANNELS, SRC_ORDERS, SRC_ORDER_ITEMS Dimension CHANNELS_OM Cube SALES_OM Mapping LOAD_CHANNELS_OM, LOAD_SALES_OM Note that we deployed source tables as well. Normally, we import source table from database instead of deploying them to target schema. However, in this example, we designed the source tables in OWB and deployed them to database for the purpose of this demonstration. 3. Prepare and examine source data Before running the mappings, we need to populate and examine the source data first. Run SRC_CHANNELS.sql, SRC_ORDERS.sql and SRC_ORDER_ITEMS.sql as target user. Then we check the data in these three tables. Table SRC_CHANNELS SQL> select rownum, id, class, name from src_channels; Records 1~5 are correct; they should be loaded into dimension without error. Records 6,7 and 8 have null parents; they should be loaded into dimension with a default parent value, and should be inserted into error table at the same time. Records 9, 10 and 11 have “invalid” parents; they should be rejected by dimension, and inserted into error table. Table SRC_ORDERS and SRC_ORDER_ITEMS SQL> select rownum, a.id, a.channel, b.amount, b.quantity, b.cost from src_orders a, src_order_items b where a.id = b.order_id; Record 178 has null dimension reference; it should be loaded into cube with a default dimension reference, and should be inserted into error table at the same time. Record 179 has “invalid” dimension reference; it should be rejected by cube, and inserted into error table. Other records should be aggregated and loaded into cube correctly. 4. Run the mappings and examine the target data In the Control Center Manager, expand BI_DEMO-> SALES_WH_LOCAL-> SALES_WH-> Mappings, right click on LOAD_CHANNELS_OM node, click Start. Use the same way to run mapping LOAD_SALES_OM. When they successfully finished, we can check the data in target tables. Table CHANNELS_OM SQL> select rownum, total_id, total_name, total_source_id, class_id,class_name, class_source_id, channel_id, channel_name,channel_source_id from channels_om order by abs(dimension_key); Records 1,2 and 3 are the default dimension records for the three levels. Records 8, 10 and 15 are the loaded records that originally have null parents. We see their parents name (class_name) is set to DEF_CLASS_NAME. Those records whose CHANNEL_NAME are Special_4, Special_5 and Special_6 are not loaded to this table because of the invalid parent. Error Table CHANNELS_OM_ERR SQL> select rownum, class_source_id, channel_id, channel_name,channel_source_id, err$$$_error_reason from channels_om_err order by channel_name; We can see all the record with null parent or invalid parent are inserted into this error table. Error reason is “Default parent used for record” for the first three records, and “No parent found for record” for the last three. Table SALES_OM SQL> select a.*, b.channel_name from sales_om a, channels_om b where a.channels=b.channel_id; We can see the order record with null channel_name has been loaded into target table with a default channel_name. The one with “invalid” channel_name are not loaded. Error Table SALES_OM_ERR SQL> select a.amount, a.cost, a.quantity, a.channels, b.channel_name, a.err$$$_error_reason from sales_om_err a, channels_om b where a.channels=b.channel_id(+); We can see the order records with null or invalid channel_name are inserted into error table. If the dimension reference column is null, the error reason is “Default dimension record used for fact”. If it is invalid, the error reason is “Dimension record not found for fact”. Summary In summary, this article illustrated the Orphan Management feature in OWB 11gR2. Automated orphan management policies improve ETL developer and administrator productivity by addressing an important cause of cube and dimension load failures, without requiring developers to explicitly build logic to handle these orphan rows.

Search Results

Search found 60932 results on 2438 pages for 'data operations'.

Page 482/2438 | < Previous Page | 478 479 480 481 482 483 484 485 486 487 488 489 | Next Page >

- by meder

- by bob.webster

- by pinaldave

- by pinaldave

- by Ben Rees

- by Ralf Westphal

- by Most Valuable Yak (Rob Volk)

- by user1204548

- by Javier Treviño

- by Clint Edmonson

- by David Dorf

- by Jay

- by MarkPearl

- by arungupta

- by ajreal

- by Boris_yo

- by Steve

- by Rachel

- by cdg

- by moon

- by truncheon

- by Alex R

- by Mike

- by Shaun

- by Dawei Sun

< Previous Page | 478 479 480 481 482 483 484 485 486 487 488 489 | Next Page >