Search Results

Search found 10424 results on 417 pages for 'persisted column'.

Page 114/417 | < Previous Page | 110 111 112 113 114 115 116 117 118 119 120 121 | Next Page >

MasterDetails Loading on Demand problem

- by devnet247

Hi As an exercise to learn wpf and understand how binding works I have an example that works.However when I try to load on demand I fail miserably. I basically have 3 classes Country-City-Hotels If I load ALL in one go it all works if I load on demand it fails miserably. What Am I doing wrong? Works <Window x:Class="MasterDetailCollectionViewSource.CountryCityHotelWindow" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml" Title="CountryCityHotelWindow" Height="300" Width="450"> <Window.Resources> <CollectionViewSource Source="{Binding}" x:Key="cvsCountryList"/> <CollectionViewSource Source="{Binding Source={StaticResource cvsCountryList},Path=Cities}" x:Key="cvsCityList"/> <CollectionViewSource Source="{Binding Source={StaticResource cvsCityList},Path=Hotels}" x:Key="cvsHotelList"/> </Window.Resources> <Grid> <Grid.ColumnDefinitions> <ColumnDefinition/> <ColumnDefinition/> <ColumnDefinition/> </Grid.ColumnDefinitions> <Grid.RowDefinitions> <RowDefinition Height="Auto"/> <RowDefinition/> </Grid.RowDefinitions> <TextBlock Grid.Column="0" Grid.Row="0" Text="Countries"/> <TextBlock Grid.Column="1" Grid.Row="0" Text="Cities"/> <TextBlock Grid.Column="2" Grid.Row="0" Text="Hotels"/> <ListBox Grid.Column="0" Grid.Row="1" Name="lstCountries" ItemsSource="{Binding Source={StaticResource cvsCountryList}}" DisplayMemberPath="Name" SelectionChanged="OnSelectionChanged"/> <ListBox Grid.Column="1" Grid.Row="1" Name="lstCities" ItemsSource="{Binding Source={StaticResource cvsCityList}}" DisplayMemberPath="Name" SelectionChanged="OnSelectionChanged"/> <ListBox Grid.Column="2" Grid.Row="1" Name="lstHotels" ItemsSource="{Binding Source={StaticResource cvsHotelList}}" DisplayMemberPath="Name" SelectionChanged="OnSelectionChanged"/> </Grid> </Window> DOES NOT WORK Xaml is the same as above, however I have added the following that fetches stuff on demand. It loads the countries only as opposed to the other one where it Loads everything at once and not code behind is necessary. public CountryCityHotelWindow() { InitializeComponent(); //Load only country Initially lstCountries.ItemsSource=Repository.GetCountries(); DataContext = lstCountries; } private void OnSelectionChanged(object sender, SelectionChangedEventArgs e) { var lstBox = (ListBox)e.OriginalSource; switch (lstBox.Name) { case "lstCountries": var country = lstBox.SelectedItem as Country; if (country == null) return; lstCities.ItemsSource = Repository.GetCities(country.Name); break; case "lstCities": var city = lstBox.SelectedItem as City; if (city == null) return; lstHotels.ItemsSource = Repository.GetHotels(city.Name); break; case "lstHotels": break; } } What Am I doing Wrong? Thanks

Read the article
SQL SERVER – Enable Identity Insert – Import Expert Wizard

- by pinaldave

I recently got email from old friend who told me that when he tries to execute SSIS package it fails with some identity error. After some debugging and opening his package we figure out that he has following issue. Let us see what kind of set up he had on his package. Source Table with Identity column Destination Table with Identity column Following checkbox was disabled in Import Expert Wizard (as per the image below) What did we do is we enabled the checkbox described as above and we fixed the problem he was having due to insertion in identity column. The reason he was facing this error because his destination table had IDENTITY property which will not allow any insert from user. This value is automatically generated by system when new values are inserted in the table. However, when user manually tries to insert value in the table, it stops them and throws an error. As we enabled the checkbox “Enable Identity Insert”, this feature allowed the values to be insert in the identity field and this way from source database exact identity values were moved to destination table. Let me know if this blog post was easy to understand. Reference: Pinal Dave (http://blog.SQLAuthority.com), Filed under: Pinal Dave, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Utility, T SQL, Technology

Read the article
SQL SERVER – Importing CSV File Into Database – SQL in Sixty Seconds #018 – Video

- by pinaldave

Importing data into database is one of the most important tasks. I often receive questions regarding what is the quickest way to insert CSV data or how to import CSV Data into SQL Server Table. Honestly the process is very simple and the script is even simpler. In today’s SQL in Sixty Seconds Video we will learn how quickly we can insert CSV data into SQL Server. The steps to import CSV are very simple. Create Table Use Bulk Insert to import the data Verify the data Done! Absolutely it is that simple. More on Importing CSV Data: SQL SERVER – Import CSV File Into SQL Server Using Bulk Insert – Load Comma Delimited File Into SQL Server SQL SERVER – Import CSV File into Database Table Using SSIS SQL SERVER – Create a Comma Delimited List Using SELECT Clause From Table Column SQL SERVER – Comma Separated Values (CSV) from Table Column SQL SERVER – Comma Separated Values (CSV) from Table Column – Part 2 I encourage you to submit your ideas for SQL in Sixty Seconds. We will try to accommodate as many as we can. If we like your idea we promise to share with you educational material. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Database, Pinal Dave, PostADay, SQL, SQL Authority, SQL in Sixty Seconds, SQL Query, SQL Scripts, SQL Server, SQL Server Management Studio, SQL Tips and Tricks, T SQL, Technology, Video

Read the article
TOTD #166: Using NoSQL database in your Java EE 6 Applications on GlassFish - MongoDB for now!

- by arungupta

The Java EE 6 platform includes Java Persistence API to work with RDBMS. The JPA specification defines a comprehensive API that includes, but not restricted to, how a database table can be mapped to a POJO and vice versa, provides mechanisms how a PersistenceContext can be injected in a @Stateless bean and then be used for performing different operations on the database table and write typesafe queries. There are several well known advantages of RDBMS but the NoSQL movement has gained traction over past couple of years. The NoSQL databases are not intended to be a replacement for the mainstream RDBMS. As Philosophy of NoSQL explains, NoSQL database was designed for casual use where all the features typically provided by an RDBMS are not required. The name "NoSQL" is more of a category of databases that is more known for what it is not rather than what it is. The basic principles of NoSQL database are: No need to have a pre-defined schema and that makes them a schema-less database. Addition of new properties to existing objects is easy and does not require ALTER TABLE. The unstructured data gives flexibility to change the format of data any time without downtime or reduced service levels. Also there are no joins happening on the server because there is no structure and thus no relation between them. Scalability and performance is more important than the entire set of functionality typically provided by an RDBMS. This set of databases provide eventual consistency and/or transactions restricted to single items but more focus on CRUD. Not be restricted to SQL to access the information stored in the backing database. Designed to scale-out (horizontal) instead of scale-up (vertical). This is important knowing that databases, and everything else as well, is moving into the cloud. RBDMS can scale-out using sharding but requires complex management and not for the faint of heart. Unlike RBDMS which require a separate caching tier, most of the NoSQL databases comes with integrated caching. Designed for less management and simpler data models lead to lower administration as well. There are primarily three types of NoSQL databases: Key-Value stores (e.g. Cassandra and Riak) Document databases (MongoDB or CouchDB) Graph databases (Neo4J) You may think NoSQL is panacea but as I mentioned above they are not meant to replace the mainstream databases and here is why: RDBMS have been around for many years, very stable, and functionally rich. This is something CIOs and CTOs can bet their money on without much worry. There is a reason 98% of Fortune 100 companies run Oracle :-) NoSQL is cutting edge, brings excitement to developers, but enterprises are cautious about them. Commercial databases like Oracle are well supported by the backing enterprises in terms of providing support resources on a global scale. There is a full ecosystem built around these commercial databases providing training, performance tuning, architecture guidance, and everything else. NoSQL is fairly new and typically backed by a single company not able to meet the scale of these big enterprises. NoSQL databases are good for CRUDing operations but business intelligence is extremely important for enterprises to stay competitive. RDBMS provide extensive tooling to generate this data but that was not the original intention of NoSQL databases and is lacking in that area. Generating any meaningful information other than CRUDing require extensive programming. Not suited for complex transactions such as banking systems or other highly transactional applications requiring 2-phase commit. SQL cannot be used with NoSQL databases and writing simple queries can be involving. Enough talking, lets take a look at some code. This blog has published multiple blogs on how to access a RDBMS using JPA in a Java EE 6 application. This Tip Of The Day (TOTD) will show you can use MongoDB (a document-oriented database) with a typical 3-tier Java EE 6 application. Lets get started! The complete source code of this project can be downloaded here. Download MongoDB for your platform from here (1.8.2 as of this writing) and start the server as: arun@ArunUbuntu:~/tools/mongodb-linux-x86_64-1.8.2/bin$./mongod./mongod --help for help and startup optionsSun Jun 26 20:41:11 [initandlisten] MongoDB starting : pid=11210port=27017 dbpath=/data/db/ 64-bit Sun Jun 26 20:41:11 [initandlisten] db version v1.8.2, pdfile version4.5Sun Jun 26 20:41:11 [initandlisten] git version:433bbaa14aaba6860da15bd4de8edf600f56501bSun Jun 26 20:41:11 [initandlisten] build sys info: Linuxbs-linux64.10gen.cc 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 2017:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41Sun Jun 26 20:41:11 [initandlisten] waiting for connections on port 27017Sun Jun 26 20:41:11 [websvr] web admin interface listening on port 28017 The default directory for the database is /data/db and needs to be created as: sudo mkdir -p /data/db/sudo chown `id -u` /data/db You can specify a different directory using "--dbpath" option. Refer to Quickstart for your specific platform. Using NetBeans, create a Java EE 6 project and make sure to enable CDI and add JavaServer Faces framework. Download MongoDB Java Driver (2.6.3 of this writing) and add it to the project library by selecting "Properties", "LIbraries", "Add Library...", creating a new library by specifying the location of the JAR file, and adding the library to the created project. Edit the generated "index.xhtml" such that it looks like: <h1>Add a new movie</h1><h:form> Name: <h:inputText value="#{movie.name}" size="20"/><br/> Year: <h:inputText value="#{movie.year}" size="6"/><br/> Language: <h:inputText value="#{movie.language}" size="20"/><br/> <h:commandButton actionListener="#{movieSessionBean.createMovie}" action="show" title="Add" value="submit"/></h:form> This page has a simple HTML form with three text boxes and a submit button. The text boxes take name, year, and language of a movie and the submit button invokes the "createMovie" method of "movieSessionBean" and then render "show.xhtml". Create "show.xhtml" ("New" -> "Other..." -> "Other" -> "XHTML File") such that it looks like: <head> <title><h1>List of movies</h1></title> </head> <body> <h:form> <h:dataTable value="#{movieSessionBean.movies}" var="m" > <h:column><f:facet name="header">Name</f:facet>#{m.name}</h:column> <h:column><f:facet name="header">Year</f:facet>#{m.year}</h:column> <h:column><f:facet name="header">Language</f:facet>#{m.language}</h:column> </h:dataTable> </h:form> This page shows the name, year, and language of all movies stored in the database so far. The list of movies is returned by "movieSessionBean.movies" property. Now create the "Movie" class such that it looks like: import com.mongodb.BasicDBObject;import com.mongodb.BasicDBObject;import com.mongodb.DBObject;import javax.enterprise.inject.Model;import javax.validation.constraints.Size;/** * @author arun */@Modelpublic class Movie { @Size(min=1, max=20) private String name; @Size(min=1, max=20) private String language; private int year; // getters and setters for "name", "year", "language" public BasicDBObject toDBObject() { BasicDBObject doc = new BasicDBObject(); doc.put("name", name); doc.put("year", year); doc.put("language", language); return doc; } public static Movie fromDBObject(DBObject doc) { Movie m = new Movie(); m.name = (String)doc.get("name"); m.year = (int)doc.get("year"); m.language = (String)doc.get("language"); return m; } @Override public String toString() { return name + ", " + year + ", " + language; }} Other than the usual boilerplate code, the key methods here are "toDBObject" and "fromDBObject". These methods provide a conversion from "Movie" -> "DBObject" and vice versa. The "DBObject" is a MongoDB class that comes as part of the mongo-2.6.3.jar file and which we added to our project earlier. The complete javadoc for 2.6.3 can be seen here. Notice, this class also uses Bean Validation constraints and will be honored by the JSF layer. Finally, create "MovieSessionBean" stateless EJB with all the business logic such that it looks like: package org.glassfish.samples;import com.mongodb.BasicDBObject;import com.mongodb.DB;import com.mongodb.DBCollection;import com.mongodb.DBCursor;import com.mongodb.DBObject;import com.mongodb.Mongo;import java.net.UnknownHostException;import java.util.ArrayList;import java.util.List;import javax.annotation.PostConstruct;import javax.ejb.Stateless;import javax.inject.Inject;import javax.inject.Named;/** * @author arun */@Stateless@Namedpublic class MovieSessionBean { @Inject Movie movie; DBCollection movieColl; @PostConstruct private void initDB() throws UnknownHostException { Mongo m = new Mongo(); DB db = m.getDB("movieDB"); movieColl = db.getCollection("movies"); if (movieColl == null) { movieColl = db.createCollection("movies", null); } } public void createMovie() { BasicDBObject doc = movie.toDBObject(); movieColl.insert(doc); } public List<Movie> getMovies() { List<Movie> movies = new ArrayList(); DBCursor cur = movieColl.find(); System.out.println("getMovies: Found " + cur.size() + " movie(s)"); for (DBObject dbo : cur.toArray()) { movies.add(Movie.fromDBObject(dbo)); } return movies; }} The database is initialized in @PostConstruct. Instead of a working with a database table, NoSQL databases work with a schema-less document. The "Movie" class is the document in our case and stored in the collection "movies". The collection allows us to perform query functions on all movies. The "getMovies" method invokes "find" method on the collection which is equivalent to the SQL query "select * from movies" and then returns a List<Movie>. Also notice that there is no "persistence.xml" in the project. Right-click and run the project to see the output as: Enter some values in the text box and click on enter to see the result as: If you reached here then you've successfully used MongoDB in your Java EE 6 application, congratulations! Some food for thought and further play ... SQL to MongoDB mapping shows mapping between traditional SQL -> Mongo query language. Tutorial shows fun things you can do with MongoDB. Try the interactive online shell The cookbook provides common ways of using MongoDB In terms of this project, here are some tasks that can be tried: Encapsulate database management in a JPA persistence provider. Is it even worth it because the capabilities are going to be very different ? MongoDB uses "BSonObject" class for JSON representation, add @XmlRootElement on a POJO and how a compatible JSON representation can be generated. This will make the fromXXX and toXXX methods redundant.

Read the article
TOTD #166: Using NoSQL database in your Java EE 6 Applications on GlassFish - MongoDB for now!

- by arungupta

The Java EE 6 platform includes Java Persistence API to work with RDBMS. The JPA specification defines a comprehensive API that includes, but not restricted to, how a database table can be mapped to a POJO and vice versa, provides mechanisms how a PersistenceContext can be injected in a @Stateless bean and then be used for performing different operations on the database table and write typesafe queries. There are several well known advantages of RDBMS but the NoSQL movement has gained traction over past couple of years. The NoSQL databases are not intended to be a replacement for the mainstream RDBMS. As Philosophy of NoSQL explains, NoSQL database was designed for casual use where all the features typically provided by an RDBMS are not required. The name "NoSQL" is more of a category of databases that is more known for what it is not rather than what it is. The basic principles of NoSQL database are: No need to have a pre-defined schema and that makes them a schema-less database. Addition of new properties to existing objects is easy and does not require ALTER TABLE. The unstructured data gives flexibility to change the format of data any time without downtime or reduced service levels. Also there are no joins happening on the server because there is no structure and thus no relation between them. Scalability and performance is more important than the entire set of functionality typically provided by an RDBMS. This set of databases provide eventual consistency and/or transactions restricted to single items but more focus on CRUD. Not be restricted to SQL to access the information stored in the backing database. Designed to scale-out (horizontal) instead of scale-up (vertical). This is important knowing that databases, and everything else as well, is moving into the cloud. RBDMS can scale-out using sharding but requires complex management and not for the faint of heart. Unlike RBDMS which require a separate caching tier, most of the NoSQL databases comes with integrated caching. Designed for less management and simpler data models lead to lower administration as well. There are primarily three types of NoSQL databases: Key-Value stores (e.g. Cassandra and Riak) Document databases (MongoDB or CouchDB) Graph databases (Neo4J) You may think NoSQL is panacea but as I mentioned above they are not meant to replace the mainstream databases and here is why: RDBMS have been around for many years, very stable, and functionally rich. This is something CIOs and CTOs can bet their money on without much worry. There is a reason 98% of Fortune 100 companies run Oracle :-) NoSQL is cutting edge, brings excitement to developers, but enterprises are cautious about them. Commercial databases like Oracle are well supported by the backing enterprises in terms of providing support resources on a global scale. There is a full ecosystem built around these commercial databases providing training, performance tuning, architecture guidance, and everything else. NoSQL is fairly new and typically backed by a single company not able to meet the scale of these big enterprises. NoSQL databases are good for CRUDing operations but business intelligence is extremely important for enterprises to stay competitive. RDBMS provide extensive tooling to generate this data but that was not the original intention of NoSQL databases and is lacking in that area. Generating any meaningful information other than CRUDing require extensive programming. Not suited for complex transactions such as banking systems or other highly transactional applications requiring 2-phase commit. SQL cannot be used with NoSQL databases and writing simple queries can be involving. Enough talking, lets take a look at some code. This blog has published multiple blogs on how to access a RDBMS using JPA in a Java EE 6 application. This Tip Of The Day (TOTD) will show you can use MongoDB (a document-oriented database) with a typical 3-tier Java EE 6 application. Lets get started! The complete source code of this project can be downloaded here. Download MongoDB for your platform from here (1.8.2 as of this writing) and start the server as: arun@ArunUbuntu:~/tools/mongodb-linux-x86_64-1.8.2/bin$./mongod./mongod --help for help and startup optionsSun Jun 26 20:41:11 [initandlisten] MongoDB starting : pid=11210port=27017 dbpath=/data/db/ 64-bit Sun Jun 26 20:41:11 [initandlisten] db version v1.8.2, pdfile version4.5Sun Jun 26 20:41:11 [initandlisten] git version:433bbaa14aaba6860da15bd4de8edf600f56501bSun Jun 26 20:41:11 [initandlisten] build sys info: Linuxbs-linux64.10gen.cc 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 2017:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41Sun Jun 26 20:41:11 [initandlisten] waiting for connections on port 27017Sun Jun 26 20:41:11 [websvr] web admin interface listening on port 28017 The default directory for the database is /data/db and needs to be created as: sudo mkdir -p /data/db/sudo chown `id -u` /data/db You can specify a different directory using "--dbpath" option. Refer to Quickstart for your specific platform. Using NetBeans, create a Java EE 6 project and make sure to enable CDI and add JavaServer Faces framework. Download MongoDB Java Driver (2.6.3 of this writing) and add it to the project library by selecting "Properties", "LIbraries", "Add Library...", creating a new library by specifying the location of the JAR file, and adding the library to the created project. Edit the generated "index.xhtml" such that it looks like: <h1>Add a new movie</h1><h:form> Name: <h:inputText value="#{movie.name}" size="20"/><br/> Year: <h:inputText value="#{movie.year}" size="6"/><br/> Language: <h:inputText value="#{movie.language}" size="20"/><br/> <h:commandButton actionListener="#{movieSessionBean.createMovie}" action="show" title="Add" value="submit"/></h:form> This page has a simple HTML form with three text boxes and a submit button. The text boxes take name, year, and language of a movie and the submit button invokes the "createMovie" method of "movieSessionBean" and then render "show.xhtml". Create "show.xhtml" ("New" -> "Other..." -> "Other" -> "XHTML File") such that it looks like: <head> <title><h1>List of movies</h1></title> </head> <body> <h:form> <h:dataTable value="#{movieSessionBean.movies}" var="m" > <h:column><f:facet name="header">Name</f:facet>#{m.name}</h:column> <h:column><f:facet name="header">Year</f:facet>#{m.year}</h:column> <h:column><f:facet name="header">Language</f:facet>#{m.language}</h:column> </h:dataTable> </h:form> This page shows the name, year, and language of all movies stored in the database so far. The list of movies is returned by "movieSessionBean.movies" property. Now create the "Movie" class such that it looks like: import com.mongodb.BasicDBObject;import com.mongodb.BasicDBObject;import com.mongodb.DBObject;import javax.enterprise.inject.Model;import javax.validation.constraints.Size;/** * @author arun */@Modelpublic class Movie { @Size(min=1, max=20) private String name; @Size(min=1, max=20) private String language; private int year; // getters and setters for "name", "year", "language" public BasicDBObject toDBObject() { BasicDBObject doc = new BasicDBObject(); doc.put("name", name); doc.put("year", year); doc.put("language", language); return doc; } public static Movie fromDBObject(DBObject doc) { Movie m = new Movie(); m.name = (String)doc.get("name"); m.year = (int)doc.get("year"); m.language = (String)doc.get("language"); return m; } @Override public String toString() { return name + ", " + year + ", " + language; }} Other than the usual boilerplate code, the key methods here are "toDBObject" and "fromDBObject". These methods provide a conversion from "Movie" -> "DBObject" and vice versa. The "DBObject" is a MongoDB class that comes as part of the mongo-2.6.3.jar file and which we added to our project earlier. The complete javadoc for 2.6.3 can be seen here. Notice, this class also uses Bean Validation constraints and will be honored by the JSF layer. Finally, create "MovieSessionBean" stateless EJB with all the business logic such that it looks like: package org.glassfish.samples;import com.mongodb.BasicDBObject;import com.mongodb.DB;import com.mongodb.DBCollection;import com.mongodb.DBCursor;import com.mongodb.DBObject;import com.mongodb.Mongo;import java.net.UnknownHostException;import java.util.ArrayList;import java.util.List;import javax.annotation.PostConstruct;import javax.ejb.Stateless;import javax.inject.Inject;import javax.inject.Named;/** * @author arun */@Stateless@Namedpublic class MovieSessionBean { @Inject Movie movie; DBCollection movieColl; @PostConstruct private void initDB() throws UnknownHostException { Mongo m = new Mongo(); DB db = m.getDB("movieDB"); movieColl = db.getCollection("movies"); if (movieColl == null) { movieColl = db.createCollection("movies", null); } } public void createMovie() { BasicDBObject doc = movie.toDBObject(); movieColl.insert(doc); } public List<Movie> getMovies() { List<Movie> movies = new ArrayList(); DBCursor cur = movieColl.find(); System.out.println("getMovies: Found " + cur.size() + " movie(s)"); for (DBObject dbo : cur.toArray()) { movies.add(Movie.fromDBObject(dbo)); } return movies; }} The database is initialized in @PostConstruct. Instead of a working with a database table, NoSQL databases work with a schema-less document. The "Movie" class is the document in our case and stored in the collection "movies". The collection allows us to perform query functions on all movies. The "getMovies" method invokes "find" method on the collection which is equivalent to the SQL query "select * from movies" and then returns a List<Movie>. Also notice that there is no "persistence.xml" in the project. Right-click and run the project to see the output as: Enter some values in the text box and click on enter to see the result as: If you reached here then you've successfully used MongoDB in your Java EE 6 application, congratulations! Some food for thought and further play ... SQL to MongoDB mapping shows mapping between traditional SQL -> Mongo query language. Tutorial shows fun things you can do with MongoDB. Try the interactive online shell The cookbook provides common ways of using MongoDB In terms of this project, here are some tasks that can be tried: Encapsulate database management in a JPA persistence provider. Is it even worth it because the capabilities are going to be very different ? MongoDB uses "BSonObject" class for JSON representation, add @XmlRootElement on a POJO and how a compatible JSON representation can be generated. This will make the fromXXX and toXXX methods redundant.

Read the article
Investigation: Can different combinations of components effect Dataflow performance?

- by jamiet

Introduction The Dataflow task is one of the core components (if not the core component) of SQL Server Integration Services (SSIS) and often the most misunderstood. This is not surprising, its an incredibly complicated beast and we’re abstracted away from that complexity via some boxes that go yellow red or green and that have some lines drawn between them. Example dataflow In this blog post I intend to look under that facade and get into some of the nuts and bolts of the Dataflow Task by investigating how the decisions we make when building our packages can affect performance. I will do this by comparing the performance of three dataflows that all have the same input, all produce the same output, but which all operate slightly differently by way of having different transformation components. I also want to use this blog post to challenge a common held opinion that I see perpetuated over and over again on the SSIS forum. That is, that people assume adding components to a dataflow will be detrimental to overall performance. Its not surprising that people think this –it is intuitive to think that more components means more work- however this is not a view that I share. I have always been of the opinion that there are many factors affecting dataflow duration and the number of components is actually one of the less important ones; having said that I have never proven that assertion and that is one reason for this investigation. I have actually seen evidence that some people think dataflow duration is simply a function of number of rows and number of components. I’ll happily call that one out as a myth even without any investigation! The Setup I have a 2GB datafile which is a list of 4731904 (~4.7million) customer records with various attributes against them and it contains 2 columns that I am going to use for categorisation: [YearlyIncome] [BirthDate] The data file is a SSIS raw format file which I chose to use because it is the quickest way of getting data into a dataflow and given that I am testing the transformations, not the source or destination adapters, I want to minimise external influences as much as possible. In the test I will split the customers according to month of birth (12 of those) and whether or not their yearly income is above or below 50000 (2 of those); in other words I will be splitting them into 24 discrete categories and in order to do it I shall be using different combinations of SSIS’ Conditional Split and Derived Column transformation components. The 24 datapaths that occur will each input to a rowcount component, again because this is the least resource intensive means of terminating a datapath. The test is being carried out on a Dell XPS Studio laptop with a quad core (8 logical Procs) Intel Core i7 at 1.73GHz and Samsung SSD hard drive. Its running SQL Server 2008 R2 on Windows 7. The Variables Here are the three combinations of components that I am going to test: One Conditional Split - A single Conditional Split component CSPL Split by Month of Birth and income category that will use expressions on [YearlyIncome] & [BirthDate] to send each row to one of 24 outputs. This next screenshot displays the expression logic in use: Derived Column & Conditional Split - A Derived Column component DER Income Category that adds a new column [IncomeCategory] which will contain one of two possible text values {“LessThan50000”,”GreaterThan50000”} and uses [YearlyIncome] to determine which value each row should get. A Conditional Split component CSPL Split by Month of Birth and Income Category then uses that new column in conjunction with [BirthDate] to determine which of the same 24 outputs to send each row to. Put more simply, I am separating the Conditional Split of #1 into a Derived Column and a Conditional Split. The next screenshots display the expression logic in use: DER Income Category CSPL Split by Month of Birth and Income Category Three Conditional Splits - A Conditional Split component that produces two outputs based on [YearlyIncome], one for each Income Category. Each of those outputs will go to a further Conditional Split that splits the input into 12 outputs, one for each month of birth (identical logic in each). In this case then I am separating the single Conditional Split of #1 into three Conditional Split components. The next screenshots display the expression logic in use: CSPL Split by Income Category CSPL Split by Month of Birth 1& 2 Each of these combinations will provide an input to one of the 24 rowcount components, just the same as before. For illustration here is a screenshot of the dataflow containing three Conditional Split components: As you can these dataflows have a fair bit of work to do and remember that they’re doing that work for 4.7million rows. I will execute each dataflow 10 times and use the average for comparison. I foresee three possible outcomes: The dataflow containing just one Conditional Split (i.e. #1) will be quicker There is no significant difference between any of them One of the two dataflows containing multiple transformation components will be quicker Regardless of which of those outcomes come to pass we will have learnt something and that makes this an interesting test to carry out. Note that I will be executing the dataflows using dtexec.exe rather than hitting F5 within BIDS. The Results and Analysis The table below shows all of the executions, 10 for each dataflow. It also shows the average for each along with a standard deviation. All durations are in seconds. I’m pasting a screenshot because I frankly can’t be bothered with the faffing about needed to make a presentable HTML table. It is plain to see from the average that the dataflow containing three conditional splits is significantly faster, the other two taking 43% and 52% longer respectively. This seems strange though, right? Why does the dataflow containing the most components outperform the other two by such a big margin? The answer is actually quite logical when you put some thought into it and I’ll explain that below. Before progressing, a side note. The standard deviation for the “Three Conditional Splits” dataflow is orders of magnitude smaller – indicating that performance for this dataflow can be predicted with much greater confidence too. The Explanation I refer you to the screenshot above that shows how CSPL Split by Month of Birth and salary category in the first dataflow is setup. Observe that there is a case for each combination of Month Of Date and Income Category – 24 in total. These expressions get evaluated in the order that they appear and hence if we assume that Month of Date and Income Category are uniformly distributed in the dataset we can deduce that the expected number of expression evaluations for each row is 12.5 i.e. 1 (the minimum) + 24 (the maximum) divided by 2 = 12.5. Now take a look at the screenshots for the second dataflow. We are doing one expression evaluation in DER Income Category and we have the same 24 cases in CSPL Split by Month of Birth and Income Category as we had before, only the expression differs slightly. In this case then we have 1 + 12.5 = 13.5 expected evaluations for each row – that would account for the slightly longer average execution time for this dataflow. Now onto the third dataflow, the quick one. CSPL Split by Income Category does a maximum of 2 expression evaluations thus the expected number of evaluations per row is 1.5. CSPL Split by Month of Birth 1 & CSPL Split by Month of Birth 2 both have less work to do than the previous Conditional Split components because they only have 12 cases to test for thus the expected number of expression evaluations is 6.5 There are two of them so total expected number of expression evaluations for this dataflow is 6.5 + 6.5 + 1.5 = 14.5. 14.5 is still more than 12.5 & 13.5 though so why is the third dataflow so much quicker? Simple, the conditional expressions in the first two dataflows have two boolean predicates to evaluate – one for Income Category and one for Month of Birth; the expressions in the Conditional Split in the third dataflow however only have one predicate thus they are doing a lot less work. To sum up, the difference in execution times can be attributed to the difference between: MONTH(BirthDate) == 1 && YearlyIncome <= 50000 and MONTH(BirthDate) == 1 In the first two dataflows YearlyIncome <= 50000 gets evaluated an average of 12.5 times for every row whereas in the third dataflow it is evaluated once and once only. Multiply those 11.5 extra operations by 4.7million rows and you get a significant amount of extra CPU cycles – that’s where our duration difference comes from. The Wrap-up The obvious point here is that adding new components to a dataflow isn’t necessarily going to make it go any slower, moreover you may be able to achieve significant improvements by splitting logic over multiple components rather than one. Performance tuning is all about reducing the amount of work that needs to be done and that doesn’t necessarily mean use less components, indeed sometimes you may be able to reduce workload in ways that aren’t immediately obvious as I think I have proven here. Of course there are many variables in play here and your mileage will most definitely vary. I encourage you to download the package and see if you get similar results – let me know in the comments. The package contains all three dataflows plus a fourth dataflow that will create the 2GB raw file for you (you will also need the [AdventureWorksDW2008] sample database from which to source the data); simply disable all dataflows except the one you want to test before executing the package and remember, execute using dtexec, not within BIDS. If you want to explore dataflow performance tuning in more detail then here are some links you might want to check out: Inequality joins, Asynchronous transformations and Lookups Destination Adapter Comparison Don’t turn the dataflow into a cursor SSIS Dataflow – Designing for performance (webinar) Any comments? Let me know! @Jamiet

Read the article
July, the 31 Days of SQL Server DMO’s – Day 27 (sys.dm_db_file_space_usage)

- by Tamarick Hill

The sys.dm_db_file_space usage DMV returns information about database file space usage. This DMV was enhanced for the 2012 version to include 3 additional columns. Let’s query this DMV against our AdventureWorks2012 database and view the results. SELECT * FROM sys.dm_db_file_space_usage The column returned from this DMV are really self-explanatory, but I will give you a description, paraphrased from books online, below. The first three columns returned from this DMV represent the Database, File, and Filegroup for the current database context that executed the DMV query. The next column is the total_page_count which represents the total number of pages in the file. The allocated_extent_page_count represents the total number of pages in all extents that have been allocated. The unallocated_extent_page_count represents the number of pages in the unallocated extents within the file. The version_store_reserved_page_count column represents the number of pages that are allocated to the version store. The user_object_reserved_page_count represents the number of pages allocated for user objects. The internal_object_reserved_page_count represents the number of pages allocated for internal objects. Lastly is the mixed_extent_page_count which represents the total number of pages that are part of mixed extents. This is a great DMV for retrieving usage space information from your database files. For more information about this DMV, please see the below Books Online link: http://msdn.microsoft.com/en-us/library/ms174412.aspx Follow me on Twitter @PrimeTimeDBA

Read the article
Developers are strange

- by DavidWimbush

Why do developers always use the GUI tools in SQL Server? I've always found this irritating and just vaguely assumed it's because they aren't familiar with SQL syntax. But when you think about it it, it's a genuine puzzle. Developers type code all day - really heavy code too like generics, lamda functions and extension methods. They (thankfully) scorn the Visual Studio stuff where you drag a table onto the class and it pastes in lots of code to query the table into a DataSet or something. But when they want to add a column to a table, without fail they dive into the graphical table designer. And half the time the script it generates does horrible things like copy the table to another one with the new column, delete the old table, and rename the new table. Which is fine if your users don't care about uptime. Is ALTER TABLE ADD <column definition> really that hard? I just don't get it.

Read the article
Sesame Data Browser: filtering, sorting, selecting and linking

- by Fabrice Marguerie

I have deferred the post about how Sesame is built in favor of publishing a new update.This new release offers major features such as the ability to quickly filter and sort data, select columns, and create hyperlinks to OData. Filtering, sorting, selecting In order to filter data, you just have to use the filter row, which becomes available when you click on the funnel button: You can then type some text and select an operator: The data grid will be refreshed immediately after you apply a filter. It works in the same way for sorting. Clicking on a column will immediately update the query and refresh the grid.Note that multi-column sorting is possible by using SHIFT-click: Viewing data is not enough. You can also view and copy the query string that returns that data: One more thing you can to shape data is to select which columns are displayed. Simply use the Column Chooser and you'll be done: Again, this will update the data and query string in real time: Linking to Sesame, linking to OData The other main feature of this release is the ability to create hyperlinks to Sesame. That's right, you can ask Sesame to give you a link you can display on a webpage, send in an email, or type in a chat session. You can get a link to a connection: or to a query: You'll note that you can also decide to embed Sesame in a webpage... Here are some sample links created via Sesame: Netflix movies with high ratings, sorted by release year Netflix horror movies from the 21st century Northwind discontinued products with remaining stock Netflix empty connection I'll give more examples in a post to follow. There are many more minor improvements in this release, but I'll let you find out about them by yourself :-)Please try Sesame Data Browser now and let me know what you think! PS: if you use Sesame from the desktop, please use the "Remove this application" command in the context menu of the destkop app and then "Install on desktop" again in your web browser. I'll activate automatic updates with the next release.

Read the article
quick look at: dm_db_index_physical_stats

- by fatherjack

A quick look at the key data from this dmv that can help a DBA keep databases performing well and systems online as the users need them. When the dynamic management views relating to index statistics became available in SQL Server 2005 there was much hype about how they can help a DBA keep their servers running in better health than ever before. This particular view gives an insight into the physical health of the indexes present in a database. Whether they are use or unused, complete or missing some columns is irrelevant, this is simply the physical stats of all indexes; disabled indexes are ignored however. In it’s simplest form this dmv can be executed as: The results from executing this contain a record for every index in every database but some of the columns will be NULL. The first parameter is there so that you can specify which database you want to gather index details on, rather than scan every database. Simply specifying DB_ID() in place of the first NULL achieves this. In order to avoid the NULLS, or more accurately, in order to choose when to have the NULLS you need to specify a value for the last parameter. It takes one of 4 values – DEFAULT, ‘SAMPLED’, ‘LIMITED’ or ‘DETAILED’. If you execute the dmv with each of these values you can see some interesting details in the times taken to complete each step. DECLARE @Start DATETIME DECLARE @First DATETIME DECLARE @Second DATETIME DECLARE @Third DATETIME DECLARE @Finish DATETIME SET @Start = GETDATE() SELECT * FROM [sys].[dm_db_index_physical_stats](DB_ID(), NULL, NULL, NULL, DEFAULT) AS ddips SET @First = GETDATE() SELECT * FROM [sys].[dm_db_index_physical_stats](DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS ddips SET @Second = GETDATE() SELECT * FROM [sys].[dm_db_index_physical_stats](DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ddips SET @Third = GETDATE() SELECT * FROM [sys].[dm_db_index_physical_stats](DB_ID(), NULL, NULL, NULL, 'DETAILED') AS ddips SET @Finish = GETDATE() SELECT DATEDIFF(ms, @Start, @First) AS [DEFAULT] , DATEDIFF(ms, @First, @Second) AS [SAMPLED] , DATEDIFF(ms, @Second, @Third) AS [LIMITED] , DATEDIFF(ms, @Third, @Finish) AS [DETAILED] Running this code will give you 4 result sets; DEFAULT will have 12 columns full of data and then NULLS in the remainder. SAMPLED will have 21 columns full of data. LIMITED will have 12 columns of data and the NULLS in the remainder. DETAILED will have 21 columns full of data. So, from this we can deduce that the DEFAULT value (the same one that is also applied when you query the view using a NULL parameter) is the same as using LIMITED. Viewing the final result set has some details that are worth noting: Running queries against this view takes significantly longer when using the SAMPLED and DETAILED values in the last parameter. The duration of the query is directly related to the size of the database you are working in so be careful running this on big databases unless you have tried it on a test server first. Let’s look at the data we get back with the DEFAULT value first of all and then progress to the extra information later. We know that the first parameter that we supply has to be a database id and for the purposes of this blog we will be providing that value with the DB_ID function. We could just as easily put a fixed value in there or a function such as DB_ID (‘AnyDatabaseName’). The first columns we get back are database_id and object_id. These are pretty explanatory and we can wrap those in some code to make things a little easier to read: SELECT DB_NAME([ddips].[database_id]) AS [DatabaseName] , OBJECT_NAME([ddips].[object_id]) AS [TableName] … FROM [sys].[dm_db_index_physical_stats](DB_ID(), NULL, NULL, NULL, NULL) AS ddips gives us SELECT DB_NAME([ddips].[database_id]) AS [DatabaseName] , OBJECT_NAME([ddips].[object_id]) AS [TableName], [i].[name] AS [IndexName] , ….. FROM [sys].[dm_db_index_physical_stats](DB_ID(), NULL, NULL, NULL, NULL) AS ddips INNER JOIN [sys].[indexes] AS i ON [ddips].[index_id] = [i].[index_id] AND [ddips].[object_id] = [i].[object_id] These handily tie in with the next parameters in the query on the dmv. If you specify an object_id and an index_id in these then you get results limited to either the table or the specific index. Once again we can place a function in here to make it easier to work with a specific table. eg. SELECT * FROM [sys].[dm_db_index_physical_stats] (DB_ID(), OBJECT_ID(‘AdventureWorks2008.Person.Address’) , 1, NULL, NULL) AS ddips Note: Despite me showing that functions can be placed directly in the parameters for this dmv, best practice recommends that functions are not used directly in the function as it is possible that they will fail to return a valid object ID. To be certain of not passing invalid values to this function, and therefore setting an automated process off on the wrong path, declare variables for the OBJECT_IDs and once they have been validated, use them in the function: DECLARE @db_id SMALLINT; DECLARE @object_id INT; SET @db_id = DB_ID(N’AdventureWorks_2008′); SET @object_id = OBJECT_ID(N’AdventureWorks_2008.Person.Address’); IF @db_id IS NULL BEGINPRINT N’Invalid database’; ENDELSE IF @object_id IS NULL BEGINPRINT N’Invalid object’; ENDELSE BEGINSELECT * FROM sys.dm_db_index_physical_stats (@db_id, @object_id, NULL, NULL , ‘LIMITED’); END; GO In cases where the results of querying this dmv don’t have any effect on other processes (i.e. simply viewing the results in the SSMS results area) then it will be noticed when the results are not consistent with the expected results and in the case of this blog this is the method I have used. So, now we can relate the values in these columns to something that we recognise in the database lets see what those other values in the dmv are all about. The next columns are: We’ll skip partition_number, index_type_desc, alloc_unit_type_desc, index_depth and index_level as this is a quick look at the dmv and they are pretty self explanatory. The final columns revealed by querying this view in the DEFAULT mode are avg_fragmentation_in_percent. This is the amount that the index is logically fragmented. It will show NULL when the dmv is queried in SAMPLED mode. fragment_count. The number of pieces that the index is broken into. It will show NULL when the dmv is queried in SAMPLED mode. avg_fragment_size_in_pages. The average size, in pages, of a single fragment in the leaf level of the IN_ROW_DATA allocation unit. It will show NULL when the dmv is queried in SAMPLED mode. page_count. Total number of index or data pages in use. OK, so what does this give us? Well, there is an obvious correlation between fragment_count, page_count and avg_fragment_size-in_pages. We see that an index that takes up 27 pages and is in 3 fragments has an average fragment size of 9 pages (27/3=9). This means that for this index there are 3 separate places on the hard disk that SQL Server needs to locate and access to gather the data when it is requested by a DML query. If this index was bigger than 72KB then having it’s data in 3 pieces might not be too big an issue as each piece would have a significant piece of data to read and the speed of access would not be too poor. If the number of fragments increases then obviously the amount of data in each piece decreases and that means the amount of work for the disks to do in order to retrieve the data to satisfy the query increases and this would start to decrease performance. This information can be useful to keep in mind when considering the value in the avg_fragmentation_in_percent column. This is arrived at by an internal algorithm that gives a value to the logical fragmentation of the index taking into account the multiple files, type of allocation unit and the previously mentioned characteristics if index size (page_count) and fragment_count. Seeing an index with a high avg_fragmentation_in_percent value will be a call to action for a DBA that is investigating performance issues. It is possible that tables will have indexes that suffer from rapid increases in fragmentation as part of normal daily business and that regular defragmentation work will be needed to keep it in good order. In other cases indexes will rarely become fragmented and therefore not need rebuilding from one end of the year to another. Keeping this in mind DBAs need to use an ‘intelligent’ process that assesses key characteristics of an index and decides on the best, if any, defragmentation method to apply should be used. There is a simple example of this in the sample code found in the Books OnLine content for this dmv, in example D. There are also a couple of very popular solutions created by SQL Server MVPs Michelle Ufford and Ola Hallengren which I would wholly recommend that you review for much further detail on how to care for your SQL Server indexes. Right, let’s get back on track then. Querying the dmv with the fifth parameter value as ‘DETAILED’ takes longer because it goes through the index and refreshes all data from every level of the index. As this blog is only a quick look a we are going to skate right past ghost_record_count and version_ghost_record_count and discuss avg_page_space_used_in_percent, record_count, min_record_size_in_bytes, max_record_size_in_bytes and avg_record_size_in_bytes. We can see from the details below that there is a correlation between the columns marked. Column 1 (Page_Count) is the number of 8KB pages used by the index, column 2 is how full each page is (how much of the 8KB has actual data written on it), column 3 is how many records are recorded in the index and column 4 is the average size of each record. This approximates to: ((Col1*8) * 1024*(Col2/100))/Col3 = Col4*. avg_page_space_used_in_percent is an important column to review as this indicates how much of the disk that has been given over to the storage of the index actually has data on it. This value is affected by the value given for the FILL_FACTOR parameter when creating an index. avg_record_size_in_bytes is important as you can use it to get an idea of how many records are in each page and therefore in each fragment, thus reinforcing how important it is to keep fragmentation under control. min_record_size_in_bytes and max_record_size_in_bytes are exactly as their names set them out to be. A detail of the smallest and largest records in the index. Purely offered as a guide to the DBA to better understand the storage practices taking place. So, keeping an eye on avg_fragmentation_in_percent will ensure that your indexes are helping data access processes take place as efficiently as possible. Where fragmentation recurs frequently then potentially the DBA should consider; the fill_factor of the index in order to leave space at the leaf level so that new records can be inserted without causing fragmentation so rapidly. the columns used in the index should be analysed to avoid new records needing to be inserted in the middle of the index but rather always be added to the end. * – it’s approximate as there are many factors associated with things like the type of data and other database settings that affect this slightly. Another great resource for working with SQL Server DMVs is Performance Tuning with SQL Server Dynamic Management Views by Louis Davidson and Tim Ford – a free ebook or paperback from Simple Talk. Disclaimer – Jonathan is a Friend of Red Gate and as such, whenever they are discussed, will have a generally positive disposition towards Red Gate tools. Other tools are often available and you should always try others before you come back and buy the Red Gate ones. All code in this blog is provided “as is” and no guarantee, warranty or accuracy is applicable or inferred, run the code on a test server and be sure to understand it before you run it on a server that means a lot to you or your manager.

Read the article
Convert ddply {plyr} to Oracle R Enterprise, or use with Embedded R Execution

- by Mark Hornick

The plyr package contains a set of tools for partitioning a problem into smaller sub-problems that can be more easily processed. One function within {plyr} is ddply, which allows you to specify subsets of a data.frame and then apply a function to each subset. The result is gathered into a single data.frame. Such a capability is very convenient. The function ddply also has a parallel option that if TRUE, will apply the function in parallel, using the backend provided by foreach. This type of functionality is available through Oracle R Enterprise using the ore.groupApply function. In this blog post, we show a few examples from Sean Anderson's "A quick introduction to plyr" to illustrate the correpsonding functionality using ore.groupApply. To get started, we'll create a demo data set and load the plyr package. set.seed(1) d <- data.frame(year = rep(2000:2014, each = 3), count = round(runif(45, 0, 20))) dim(d) library(plyr) This first example takes the data frame, partitions it by year, and calculates the coefficient of variation of the count, returning a data frame. # Example 1 res <- ddply(d, "year", function(x) { mean.count <- mean(x$count) sd.count <- sd(x$count) cv <- sd.count/mean.count data.frame(cv.count = cv) }) To illustrate the equivalent functionality in Oracle R Enterprise, using embedded R execution, we use the ore.groupApply function on the same data, but pushed to the database, creating an ore.frame. The function ore.push creates a temporary table in the database, returning a proxy object, the ore.frame. D <- ore.push(d) res <- ore.groupApply (D, D$year, function(x) { mean.count <- mean(x$count) sd.count <- sd(x$count) cv <- sd.count/mean.count data.frame(year=x$year[1], cv.count = cv) }, FUN.VALUE=data.frame(year=1, cv.count=1)) You'll notice the similarities in the first three arguments. With ore.groupApply, we augment the function to return the specific data.frame we want. We also specify the argument FUN.VALUE, which describes the resulting data.frame. From our previous blog posts, you may recall that by default, ore.groupApply returns an ore.list containing the results of each function invocation. To get a data.frame, we specify the structure of the result. The results in both cases are the same, however the ore.groupApply result is an ore.frame. In this case the data stays in the database until it's actually required. This can result in significant memory and time savings whe data is large. R> class(res) [1] "ore.frame" attr(,"package") [1] "OREbase" R> head(res) year cv.count 1 2000 0.3984848 2 2001 0.6062178 3 2002 0.2309401 4 2003 0.5773503 5 2004 0.3069680 6 2005 0.3431743 To make the ore.groupApply execute in parallel, you can specify the argument parallel with either TRUE, to use default database parallelism, or to a specific number, which serves as a hint to the database as to how many parallel R engines should be used. The next ddply example uses the summarise function, which creates a new data.frame. In ore.groupApply, the year column is passed in with the data. Since no automatic creation of columns takes place, we explicitly set the year column in the data.frame result to the value of the first row, since all rows received by the function have the same year. # Example 2 ddply(d, "year", summarise, mean.count = mean(count)) res <- ore.groupApply (D, D$year, function(x) { mean.count <- mean(x$count) data.frame(year=x$year[1], mean.count = mean.count) }, FUN.VALUE=data.frame(year=1, mean.count=1)) R> head(res) year mean.count 1 2000 7.666667 2 2001 13.333333 3 2002 15.000000 4 2003 3.000000 5 2004 12.333333 6 2005 14.666667 Example 3 uses the transform function with ddply, which modifies the existing data.frame. With ore.groupApply, we again construct the data.frame explicilty, which is returned as an ore.frame. # Example 3 ddply(d, "year", transform, total.count = sum(count)) res <- ore.groupApply (D, D$year, function(x) { total.count <- sum(x$count) data.frame(year=x$year[1], count=x$count, total.count = total.count) }, FUN.VALUE=data.frame(year=1, count=1, total.count=1)) > head(res) year count total.count 1 2000 5 23 2 2000 7 23 3 2000 11 23 4 2001 18 40 5 2001 4 40 6 2001 18 40 In Example 4, the mutate function with ddply enables you to define new columns that build on columns just defined. Since the construction of the data.frame using ore.groupApply is explicit, you always have complete control over when and how to use columns. # Example 4 ddply(d, "year", mutate, mu = mean(count), sigma = sd(count), cv = sigma/mu) res <- ore.groupApply (D, D$year, function(x) { mu <- mean(x$count) sigma <- sd(x$count) cv <- sigma/mu data.frame(year=x$year[1], count=x$count, mu=mu, sigma=sigma, cv=cv) }, FUN.VALUE=data.frame(year=1, count=1, mu=1,sigma=1,cv=1)) R> head(res) year count mu sigma cv 1 2000 5 7.666667 3.055050 0.3984848 2 2000 7 7.666667 3.055050 0.3984848 3 2000 11 7.666667 3.055050 0.3984848 4 2001 18 13.333333 8.082904 0.6062178 5 2001 4 13.333333 8.082904 0.6062178 6 2001 18 13.333333 8.082904 0.6062178 In Example 5, ddply is used to partition data on multiple columns before constructing the result. Realizing this with ore.groupApply involves creating an index column out of the concatenation of the columns used for partitioning. This example also allows us to illustrate using the ORE transparency layer to subset the data. # Example 5 baseball.dat <- subset(baseball, year > 2000) # data from the plyr package x <- ddply(baseball.dat, c("year", "team"), summarize, homeruns = sum(hr)) We first push the data set to the database to get an ore.frame. We then add the composite column and perform the subset, using the transparency layer. Since the results from database execution are unordered, we will explicitly sort these results and view the first 6 rows. BB.DAT <- ore.push(baseball) BB.DAT$index <- with(BB.DAT, paste(year, team, sep="+")) BB.DAT2 <- subset(BB.DAT, year > 2000) X <- ore.groupApply (BB.DAT2, BB.DAT2$index, function(x) { data.frame(year=x$year[1], team=x$team[1], homeruns=sum(x$hr)) }, FUN.VALUE=data.frame(year=1, team="A", homeruns=1), parallel=FALSE) res <- ore.sort(X, by=c("year","team")) R> head(res) year team homeruns 1 2001 ANA 4 2 2001 ARI 155 3 2001 ATL 63 4 2001 BAL 58 5 2001 BOS 77 6 2001 CHA 63 Our next example is derived from the ggplot function documentation. This illustrates the use of ddply within using the ggplot2 package. We first create a data.frame with demo data and use ddply to create some statistics for each group (gp). We then use ggplot to produce the graph. We can take this same code, push the data.frame df to the database and invoke this on the database server. The graph will be returned to the client window, as depicted below. # Example 6 with ggplot2 library(ggplot2) df <- data.frame(gp = factor(rep(letters[1:3], each = 10)), y = rnorm(30)) # Compute sample mean and standard deviation in each group library(plyr) ds <- ddply(df, .(gp), summarise, mean = mean(y), sd = sd(y)) # Set up a skeleton ggplot object and add layers: ggplot() + geom_point(data = df, aes(x = gp, y = y)) + geom_point(data = ds, aes(x = gp, y = mean), colour = 'red', size = 3) + geom_errorbar(data = ds, aes(x = gp, y = mean, ymin = mean - sd, ymax = mean + sd), colour = 'red', width = 0.4) DF <- ore.push(df) ore.tableApply(DF, function(df) { library(ggplot2) library(plyr) ds <- ddply(df, .(gp), summarise, mean = mean(y), sd = sd(y)) ggplot() + geom_point(data = df, aes(x = gp, y = y)) + geom_point(data = ds, aes(x = gp, y = mean), colour = 'red', size = 3) + geom_errorbar(data = ds, aes(x = gp, y = mean, ymin = mean - sd, ymax = mean + sd), colour = 'red', width = 0.4) }) But let's take this one step further. Suppose we wanted to produce multiple graphs, partitioned on some index column. We replicate the data three times and add some noise to the y values, just to make the graphs a little different. We also create an index column to form our three partitions. Note that we've also specified that this should be executed in parallel, allowing Oracle Database to control and manage the server-side R engines. The result of ore.groupApply is an ore.list that contains the three graphs. Each graph can be viewed by printing the list element. df2 <- rbind(df,df,df) df2$y <- df2$y + rnorm(nrow(df2)) df2$index <- c(rep(1,300), rep(2,300), rep(3,300)) DF2 <- ore.push(df2) res <- ore.groupApply(DF2, DF2$index, function(df) { df <- df[,1:2] library(ggplot2) library(plyr) ds <- ddply(df, .(gp), summarise, mean = mean(y), sd = sd(y)) ggplot() + geom_point(data = df, aes(x = gp, y = y)) + geom_point(data = ds, aes(x = gp, y = mean), colour = 'red', size = 3) + geom_errorbar(data = ds, aes(x = gp, y = mean, ymin = mean - sd, ymax = mean + sd), colour = 'red', width = 0.4) }, parallel=TRUE) res[[1]] res[[2]] res[[3]] To recap, we've illustrated how various uses of ddply from the plyr package can be realized in ore.groupApply, which affords the user explicit control over the contents of the data.frame result in a straightforward manner. We've also highlighted how ddply can be used within an ore.groupApply call.

Read the article
Row Number Transformation

The Row Number Transformation calculates a row number for each row, and adds this as a new output column to the data flow. The column number is a sequential number, based on a seed value. Each row receives the next number in the sequence, based on the defined increment value. The final row number can be stored in a variable for later analysis, and can be used as part of a process to validate the integrity of the data movement. The Row Number transform has a variety of uses, such as generating surrogate keys, or as the basis for a data partitioning scheme when combined with the Conditional Split transformation. Properties Property Data Type Description Seed Int32 The first row number or seed value. Increment Int32 The value added to the previous row number to make the next row number. OutputVariable String The name of the variable into which the final row number is written post execution. (Optional). The three properties have been configured to support expressions, or they can set directly in the normal manner. Expressions on components are only visible on the hosting Data Flow task, not at the individual component level. Sometimes the data type of the property is incorrectly set when the properties are created, see the Troubleshooting section below for details on how to fix this. Installation The component is provided as an MSI file which you can download and run to install it. This simply places the files on disk in the correct locations and also installs the assemblies in the Global Assembly Cache as per Microsoft’s recommendations. You may need to restart the SQL Server Integration Services service, as this caches information about what components are installed, as well as restarting any open instances of Business Intelligence Development Studio (BIDS) / Visual Studio that you may be using to build your SSIS packages. For 2005/2008 Only - Finally you will have to add the transformation to the Visual Studio toolbox manually. Right-click the toolbox, and select Choose Items.... Select the SSIS Data Flow Items tab, and then check the Row Number transformation in the Choose Toolbox Items window. This process has been described in detail in the related FAQ entry for How do I install a task or transform component? We recommend you follow best practice and apply the current Microsoft SQL Server Service pack to your SQL Server servers and workstations, and this component requires a minimum of SQL Server 2005 Service Pack 1. Downloads The Row Number Transformation is available for SQL Server 2005, SQL Server 2008 (includes R2) and SQL Server 2012. Please choose the version to match your SQL Server version, or you can install multiple versions and use them side by side if you have more than one version of SQL Server installed. Row Number Transformation for SQL Server 2005 Row Number Transformation for SQL Server 2008 Row Number Transformation for SQL Server 2012 Version History SQL Server 2012 Version 3.0.0.6 - SQL Server 2012 release. Includes upgrade support for both 2005 and 2008 packages to 2012. (5 Jun 2012) SQL Server 2008 Version 2.0.0.5 - SQL Server 2008 release. (15 Oct 2008) SQL Server 2005 Version 1.2.0.34 – Updated installer. (25 Jun 2008) Version 1.2.0.7 - SQL Server 2005 RTM Refresh. SP1 Compatibility Testing. Added the ability to reuse an existing column to hold the generated row number, as an alternative to the default of adding a new column to the output. (18 Jun 2006) Version 1.2.0.7 - SQL Server 2005 RTM Refresh. SP1 Compatibility Testing. Added the ability to reuse an existing column to hold the generated row number, as an alternative to the default of adding a new column to the output. (18 Jun 2006) Version 1.0.0.0 - Public Release for SQL Server 2005 IDW 15 June CTP (29 Aug 2005) Screenshot Code Sample The following code sample demonstrates using the Data Generator Source and Row Number Transformation programmatically in a very simple package. Package package = new Package(); package.Name = "Data Generator & Row Number"; // Add the Data Flow Task Executable taskExecutable = package.Executables.Add("STOCK:PipelineTask"); // Get the task host wrapper, and the Data Flow task TaskHost taskHost = taskExecutable as TaskHost; MainPipe dataFlowTask = (MainPipe)taskHost.InnerObject; // Add Data Generator Source IDTSComponentMetaData100 componentSource = dataFlowTask.ComponentMetaDataCollection.New(); componentSource.Name = "Data Generator"; componentSource.ComponentClassID = "Konesans.Dts.Pipeline.DataGenerator.DataGenerator, Konesans.Dts.Pipeline.DataGenerator, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b2ab4a111192992b"; CManagedComponentWrapper instanceSource = componentSource.Instantiate(); instanceSource.ProvideComponentProperties(); instanceSource.SetComponentProperty("RowCount", 10000); // Add Row Number Tx IDTSComponentMetaData100 componentRowNumber = dataFlowTask.ComponentMetaDataCollection.New(); componentRowNumber.Name = "FlatFileDestination"; componentRowNumber.ComponentClassID = "Konesans.Dts.Pipeline.RowNumberTransform.RowNumberTransform, Konesans.Dts.Pipeline.RowNumberTransform, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b2ab4a111192992b"; CManagedComponentWrapper instanceRowNumber = componentRowNumber.Instantiate(); instanceRowNumber.ProvideComponentProperties(); instanceRowNumber.SetComponentProperty("Increment", 10); // Connect the two components together IDTSPath100 path = dataFlowTask.PathCollection.New(); path.AttachPathAndPropagateNotifications(componentSource.OutputCollection[0], componentRowNumber.InputCollection[0]); #if DEBUG // Save package to disk, DEBUG only new Application().SaveToXml(String.Format(@"C:\Temp\{0}.dtsx", package.Name), package, null); #endif package.Execute(); foreach (DtsError error in package.Errors) { Console.WriteLine("ErrorCode : {0}", error.ErrorCode); Console.WriteLine(" SubComponent : {0}", error.SubComponent); Console.WriteLine(" Description : {0}", error.Description); } package.Dispose(); Troubleshooting Make sure you have downloaded the version that matches your version of SQL Server. We offer separate downloads for SQL Server 2005, SQL Server 2008 and SQL Server 2012. If you get an error when you try and use the component along the lines of The component could not be added to the Data Flow task. Please verify that this component is properly installed. ... The data flow object "Konesans ..." is not installed correctly on this computer, this usually indicates that the internal cache of SSIS components needs to be updated. This is held by the SSIS service, so you need restart the the SQL Server Integration Services service. You can do this from the Services applet in Control Panel or Administrative Tools in Windows. You can also restart the computer if you prefer. You may also need to restart any current instances of Business Intelligence Development Studio (BIDS) / Visual Studio that you may be using to build your SSIS packages. Once installation is complete you need to manually add the task to the toolbox before you will see it and to be able add it to packages - How do I install a task or transform component? Please also make sure you have installed a minimum of SP1 for SQL 2005. The IDtsPipelineEnvironmentService was added in SQL Server 2005 Service Pack 1 (SP1) (See http://support.microsoft.com/kb/916940). If you get an error Could not load type 'Microsoft.SqlServer.Dts.Design.IDtsPipelineEnvironmentService' from assembly 'Microsoft.SqlServer.Dts.Design, Version=9.0.242.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91'. when trying to open the user interface, it implies that your development machine has not had SP1 applied. Very occasionally we get a problem to do with the properties not being created with the correct data type. Since there is no way to programmatically to define the data type of a pipeline component property, it can only infer it. Whilst we set an integer value as we create the property, sometimes SSIS decides to define it is a decimal. This is often highlighted when you use a property expression against the property and get an error similar to Cannot convert System.Int32 to System.Decimal. Unfortunately this is beyond our control and there appears to be no pattern as to when this happens. If you do have more information we would be happy to hear it. To fix this issue you can manually edit the package file. In Visual Studio right click the package file from the Solution Explorer and select View Code, which will open the package as raw XML. You can now search for the properties by name or the component name. You can then change the incorrect property data types highlighted below from Decimal to Int32. <component id="37" name="Row Number Transformation" componentClassID="{BF01D463-7089-41EE-8F05-0A6DC17CE633}" … > <properties> <property id="38" name="UserComponentTypeName" …> <property id="41" name="Seed" dataType="System.Int32" ...>10</property> <property id="42" name="Increment" dataType="System.Decimal" ...>10</property> ... If you are still having issues then contact us, but please provide as much detail as possible about error, as well as which version of the the task you are using and details of the SSIS tools installed.

Read the article
SQL SERVER – Weekly Series – Memory Lane – #037

- by Pinal Dave

Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Convert Text to Numbers (Integer) – CAST and CONVERT If table column is VARCHAR and has all the numeric values in it, it can be retrieved as Integer using CAST or CONVERT function. List All Stored Procedure Modified in Last N Days If SQL Server suddenly start behaving in un-expectable behavior and if stored procedure were changed recently, following script can be used to check recently modified stored procedure. If a stored procedure was created but never modified afterwards modified date and create a date for that stored procedure are same. Count Duplicate Records – Rows Validate Field For DATE datatype using function ISDATE() We always checked DATETIME field for incorrect data type. One of the user input date as 30/2/2007. The date was sucessfully inserted in the temp table but while inserting from temp table to final table it crashed with error. We had now task to validate incorrect date value before we insert in final table. Jr. Developer asked me how can he do that? We check for incorrect data type (varchar, int, NULL) but this is incorrect date value. Regular expression works fine with them because of mm/dd/yyyy format. 2008 Find Space Used For Any Particular Table It is very simple to find out the space used by any table in the database. Two Convenient Features Inline Assignment – Inline Operations Here is the script which does both – Inline Assignment and Inline Operation DECLARE @idx INT = 0 SET @idx+=1 SELECT @idx Introduction to SPARSE Columns SPARSE column are better at managing NULL and ZERO values in SQL Server. It does not take any space in database at all. If column is created with SPARSE clause with it and it contains ZERO or NULL it will be take lesser space then regular column (without SPARSE clause). SP_CONFIGURE – Displays or Changes Global Configuration Settings If advanced settings are not enabled at configuration level SQL Server will not let user change the advanced features on server. Authorized user can turn on or turn off advance settings. 2009 Standby Servers and Types of Standby Servers Standby Server is a type of server that can be brought online in a situation when Primary Server goes offline and application needs continuous (high) availability of the server. There is always a need to set up a mechanism where data and objects from primary server are moved to secondary (standby) server. BLOB – Pointer to Image, Image in Database, FILESTREAM Storage When it comes to storing images in database there are two common methods. I had previously blogged about the same subject on my visit to Toronto. With SQL Server 2008, we have a new method of FILESTREAM storage. However, the answer on when to use FILESTREAM and when to use other methods is still vague in community. 2010 Upper Case Shortcut SQL Server Management Studio I select the word and hit CTRL+SHIFT+U and it SSMS immediately changes the case of the selected word. Similar way if one want to convert cases to lower case, another short cut CTRL+SHIFT+L is also available. The Self Join – Inner Join and Outer Join Self Join has always been a noteworthy case. It is interesting to ask questions about self join in a room full of developers. I often ask – if there are three kinds of joins, i.e.- Inner Join, Outer Join and Cross Join; what type of join is Self Join? The usual answer is that it is an Inner Join. However, the reality is very different. Parallelism – Row per Processor – Row per Thread – Thread 0 If you look carefully in the Properties window or XML Plan, there is “Thread 0?. What does this “Thread 0” indicate? Well find out from the blog post. How do I Learn and How do I Teach The blog post has raised three very interesting questions. How do you learn? How do you teach? What are you learning or teaching? Let me try to answer the same. 2011 SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 7 of 31 What are Different Types of Locks? What are Pessimistic Lock and Optimistic Lock? When is the use of UPDATE_STATISTICS command? What is the Difference between a HAVING clause and a WHERE clause? What is Connection Pooling and why it is Used? What are the Properties and Different Types of Sub-Queries? What are the Authentication Modes in SQL Server? How can it be Changed? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 8 of 31 Which Command using Query Analyzer will give you the Version of SQL Server and Operating System? What is an SQL Server Agent? Can a Stored Procedure call itself or a Recursive Stored Procedure? How many levels of SP nesting is possible? What is Log Shipping? Name 3 ways to get an Accurate Count of the Number of Records in a Table? What does it mean to have QUOTED_IDENTIFIER ON? What are the Implications of having it OFF? What is the Difference between a Local and a Global Temporary Table? What is the STUFF Function and How Does it Differ from the REPLACE Function? What is PRIMARY KEY? What is UNIQUE KEY Constraint? What is FOREIGN KEY? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 9 of 31 What is CHECK Constraint? What is NOT NULL Constraint? What is the difference between UNION and UNION ALL? What is B-Tree? How to get @@ERROR and @@ROWCOUNT at the Same Time? What is a Scheduled Job or What is a Scheduled Task? What are the Advantages of Using Stored Procedures? What is a Table Called, if it has neither Cluster nor Non-cluster Index? What is it Used for? Can SQL Servers Linked to other Servers like Oracle? What is BCP? When is it Used? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 10 of 31 What Command do we Use to Rename a db, a Table and a Column? What are sp_configure Commands and SET Commands? How to Implement One-to-One, One-to-Many and Many-to-Many Relationships while Designing Tables? What is Difference between Commit and Rollback when Used in Transactions? What is an Execution Plan? When would you Use it? How would you View the Execution Plan? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 11 of 31 What is Difference between Table Aliases and Column Aliases? Do they Affect Performance? What is the difference between CHAR and VARCHAR Datatypes? What is the Difference between VARCHAR and VARCHAR(MAX) Datatypes? What is the Difference between VARCHAR and NVARCHAR datatypes? Which are the Important Points to Note when Multilanguage Data is Stored in a Table? How to Optimize Stored Procedure Optimization? What is SQL Injection? How to Protect Against SQL Injection Attack? How to Find Out the List Schema Name and Table Name for the Database? What is CHECKPOINT Process in the SQL Server? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 12 of 31 How does Using a Separate Hard Drive for Several Database Objects Improves Performance Right Away? How to Find the List of Fixed Hard Drive and Free Space on Server? Why can there be only one Clustered Index and not more than one? What is Difference between Line Feed (\n) and Carriage Return (\r)? Is It Possible to have Clustered Index on Separate Drive From Original Table Location? What is a Hint? How to Delete Duplicate Rows? Why the Trigger Fires Multiple Times in Single Login? 2012 CTRL+SHIFT+] Shortcut to Select Code Between Two Parenthesis Shortcut key is CTRL+SHIFT+]. This key can be very useful when dealing with multiple subqueries, CTE or query with multiple parentheses. When exercised this shortcut key it selects T-SQL code between two parentheses. Monday Morning Puzzle – Query Returns Results Sometimes but Not Always I am beginner with SQL Server. I have one query, it sometime returns a result and sometime it does not return me the result. Where should I start looking for a solution and what kind of information I should send to you so you can help me with solving. I have no clue, please guide me. Remove Debug Button in SSMS – SQL in Sixty Seconds #020 – Video Effect of Case Sensitive Collation on Resultset Collation is a very interesting concept but I quite often see it is heavily neglected. I have seen developer and DBA looking for a workaround to fix collation error rather than understanding if the side effect of the workaround. Switch Between Two Parenthesis using Shortcut CTRL+] Earlier this week I wrote a blog post about CTRL+SHIFT+] Shortcut to Select Code Between Two Parenthesis, I received quite a lot of positive feedback from readers. If you are a regular reader of the blog post, you must be aware that I appreciate the learning shared by readers. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
Entity Object Based on PL/SQL

- by Manoj Madhusoodanan

This blog describes how to create a PL/SQL based Entity Object.Oracle application has number of APIs and each API will perform numerous number of tasks.We can create PL/SQL based EO which will directly invoke the PL/SQL stored procedure from the EO. Here I am demonstrating using a standard API FND_USER_PKG.CREATEUSER.This API has x_user_name and x_owner as mandatory parameter.My task is to create a user through OAF page which will accept User Name and Password. Following steps needs to be performed to achieve the above scenario. 1) Create FndUserEO as follows Include all the API parameters and WHO columns in the EO. Make UserName and EncryptedUserPassword ( Here I am not using Encrypted Password. The column name is same as table column so I am keeping the same) column as mandatory. Generate VO. 2) Edit FndUserEOImpl and add the following 3) Attach FndUserVO to AM 4) Create the UI 5) Deploy following files to middle tier and restart the server. Entity Object: xxcust.oracle.apps.fnd.user.schema.server.FndUserEO.xml xxcust.oracle.apps.fnd.user.schema.server.FndUserEOImpl.java View Object: xxcust.oracle.apps.fnd.user.server.FndUserVO.xml xxcust.oracle.apps.fnd.user.server.FndUserVOImpl.javaUser Interface: xxcust.oracle.apps.fnd.user.webui.CreateFndUserCO.java xxcust.oracle.apps.fnd.user.webui.CreateFndUserPG.xmlYou can test by giving User Name and Password.

Read the article
PostgreSQL, Ubuntu, NetBeans IDE (Part 3)

- by Geertjan

To complete the picture, let's use the traditional (that is, old) Hibernate mechanism, i.e., via XML files, rather than via the annotations shown yesterday. It's definitely trickier, with many more places where typos can occur, but that's why it's the old mechanism. I do not recommend this approach. I recommend the approach shown yesterday. The other players in this scenario include PostgreSQL, as outlined in the previous blog entries in this series. Here's the structure of the module, replacing the code shown yesterday: Here's the Employee class, notice that it has no annotations: import java.io.Serializable; import java.util.Date; public class Employees implements Serializable { private int employeeId; private String firstName; private String lastName; private Date dateOfBirth; private String phoneNumber; private String junk; public int getEmployeeId() { return employeeId; } public void setEmployeeId(int employeeId) { this.employeeId = employeeId; } public String getFirstName() { return firstName; } public void setFirstName(String firstName) { this.firstName = firstName; } public String getLastName() { return lastName; } public void setLastName(String lastName) { this.lastName = lastName; } public Date getDateOfBirth() { return dateOfBirth; } public void setDateOfBirth(Date dateOfBirth) { this.dateOfBirth = dateOfBirth; } public String getPhoneNumber() { return phoneNumber; } public void setPhoneNumber(String phoneNumber) { this.phoneNumber = phoneNumber; } public String getJunk() { return junk; } public void setJunk(String junk) { this.junk = junk; } } And here's the Hibernate configuration file: <?xml version="1.0"?> <!DOCTYPE hibernate-configuration PUBLIC "-//Hibernate/Hibernate Configuration DTD 3.0//EN" "http://hibernate.sourceforge.net/hibernate-configuration-3.0.dtd"> <hibernate-configuration> <session-factory> <property name="hibernate.connection.driver_class">org.postgresql.Driver</property> <property name="hibernate.connection.url">jdbc:postgresql://localhost:5432/smithdb</property> <property name="hibernate.connection.username">smith</property> <property name="hibernate.connection.password">smith</property> <property name="hibernate.connection.pool_size">1</property> <property name="hibernate.default_schema">public"</property> <property name="hibernate.transaction.factory_class">org.hibernate.transaction.JDBCTransactionFactory</property> <property name="hibernate.current_session_context_class">thread</property> <property name="hibernate.dialect">org.hibernate.dialect.PostgreSQLDialect</property> <property name="hibernate.show_sql">true</property> <mapping resource="org/db/viewer/employees.hbm.xml"/> </session-factory> </hibernate-configuration> Next, the Hibernate mapping file: <?xml version="1.0"?> <!DOCTYPE hibernate-mapping PUBLIC "-//Hibernate/Hibernate Mapping DTD 3.0//EN" "http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd"> <hibernate-mapping> <class name="org.db.viewer.Employees" table="employees" schema="public" catalog="smithdb"> <id name="employeeId" column="employee_id" type="int"> <generator class="increment"/> </id> <property name="firstName" column="first_name" type="string" /> <property name="lastName" column="last_name" type="string" /> <property name="dateOfBirth" column="date_of_birth" type="date" /> <property name="phoneNumber" column="phone_number" type="string" /> <property name="junk" column="junk" type="string" /> </class> </hibernate-mapping> Then, the HibernateUtil file, for providing access to the Hibernate SessionFactory: import java.net.URL; import org.hibernate.cfg.AnnotationConfiguration; import org.hibernate.SessionFactory; public class HibernateUtil { private static final SessionFactory sessionFactory; static { try { // Create the SessionFactory from standard (hibernate.cfg.xml) // config file. String res = "org/db/viewer/employees.cfg.xml"; URL myURL = Thread.currentThread().getContextClassLoader().getResource(res); sessionFactory = new AnnotationConfiguration().configure(myURL).buildSessionFactory(); } catch (Throwable ex) { // Log the exception. System.err.println("Initial SessionFactory creation failed." + ex); throw new ExceptionInInitializerError(ex); } } public static SessionFactory getSessionFactory() { return sessionFactory; } } Finally, the "createKeys" in the ChildFactory: @Override protected boolean createKeys(List list) { Session session = HibernateUtil.getSessionFactory().getCurrentSession(); Transaction transac = null; try { transac = session.beginTransaction(); Query query = session.createQuery("from Employees"); list.addAll(query.list()); } catch (HibernateException he) { Exceptions.printStackTrace(he); if (transac != null){ transac.rollback(); } } finally { session.close(); } return true; } Note that Constantine Drabo has a similar article here. Run the application and the result should be the same as yesterday.

Read the article
External table and preprocessor for loading LOBs

- by David Allan

I was using the COLUMN TRANSFORMS syntax to load LOBs into Oracle using the Oracle external which is a handy way of doing several stuff - from loading LOBs from the filesystem to having constants as fields. In OWB you can use unbound external tables to define an external table using your own arbitrary access parameters - I blogged a while back on this for doing preprocessing before it was added into OWB 11gR2. For loading LOBs using the COLUMN TRANSFORMS syntax have a read through this post on loading CLOB, BLOB or any LOB, the files to load can be specified as a field that is a filename field, the content of this file will be the LOB data. So using the example from the linked post, you can define the columns; Then define the access parameters - if you go the unbound external table route you can can put whatever you want in here (your external table get out of jail free card); This will let you read the LOB files fromn the filesystem and use the external table in a mapping. Pushing the envelope a little further I then thought about marrying together the preprocessor with the COLUMN TRANSFORMS, this would have let me have a shell script for example as the preprocessor which listed the contents of a directory and let me read the files as LOBs via an external table. Unfortunately that doesn't quote work - there is now a bug/enhancement logged, so one day maybe. So I'm afraid my blog title was a little bit of a teaser....

Read the article
Building Queries Systematically

- by Jeremy Smyth

The SQL language is a bit like a toolkit for data. It consists of lots of little fiddly bits of syntax that, taken together, allow you to build complex edifices and return powerful results. For the uninitiated, the many tools can be quite confusing, and it's sometimes difficult to decide how to go about the process of building non-trivial queries, that is, queries that are more than a simple SELECT a, b FROM c; A System for Building Queries When you're building queries, you could use a system like the following: Decide which fields contain the values you want to use in our output, and how you wish to alias those fields Values you want to see in your output Values you want to use in calculations . For example, to calculate margin on a product, you could calculate price - cost and give it the alias margin. Values you want to filter with. For example, you might only want to see products that weigh more than 2Kg or that are blue. The weight or colour columns could contain that information. Values you want to order by. For example you might want the most expensive products first, and the least last. You could use the price column in descending order to achieve that. Assuming the fields you've picked in point 1 are in multiple tables, find the connections between those tables Look for relationships between tables and identify the columns that implement those relationships. For example, The Orders table could have a CustomerID field referencing the same column in the Customers table. Sometimes the problem doesn't use relationships but rests on a different field; sometimes the query is looking for a coincidence of fact rather than a foreign key constraint. For example you might have sales representatives who live in the same state as a customer; this information is normally not used in relationships, but if your query is for organizing events where sales representatives meet customers, it's useful in that query. In such a case you would record the names of columns at either end of such a connection. Sometimes relationships require a bridge, a junction table that wasn't identified in point 1 above but is needed to connect tables you need; these are used in "many-to-many relationships". In these cases you need to record the columns in each table that connect to similar columns in other tables. Construct a join or series of joins using the fields and tables identified in point 2 above. This becomes your FROM clause. Filter using some of the fields in point 1 above. This becomes your WHERE clause. Construct an ORDER BY clause using values from point 1 above that are relevant to the desired order of the output rows. Project the result using the remainder of the fields in point 1 above. This becomes your SELECT clause. A Worked Example Let's say you want to query the world database to find a list of countries (with their capitals) and the change in GNP, using the difference between the GNP and GNPOld columns, and that you only want to see results for countries with a population greater than 100,000,000. Using the system described above, we could do the following: The Country.Name and City.Name columns contain the name of the country and city respectively. The change in GNP comes from the calculation GNP - GNPOld. Both those columns are in the Country table. This calculation is also used to order the output, in descending order To see only countries with a population greater than 100,000,000, you need the Population field of the Country table. There is also a Population field in the City table, so you'll need to specify the table name to disambiguate. You can also represent a number like 100 million as 100e6 instead of 100000000 to make it easier to read. Because the fields come from the Country and City tables, you'll need to join them. There are two relationships between these tables: Each city is hosted within a country, and the city's CountryCode column identifies that country. Also, each country has a capital city, whose ID is contained within the country's Capital column. This latter relationship is the one to use, so the relevant columns and the condition that uses them is represented by the following FROM clause: FROM Country JOIN City ON Country.Capital = City.ID The statement should only return countries with a population greater than 100,000,000. Country.Population is the relevant column, so the WHERE clause becomes: WHERE Country.Population > 100e6 To sort the result set in reverse order of difference in GNP, you could use either the calculation, or the position in the output (it's the third column): ORDER BY GNP - GNPOld or ORDER BY 3 Finally, project the columns you wish to see by constructing the SELECT clause: SELECT Country.Name AS Country, City.Name AS Capital, GNP - GNPOld AS `Difference in GNP` The whole statement ends up looking like this: mysql> SELECT Country.Name AS Country, City.Name AS Capital, -> GNP - GNPOld AS `Difference in GNP` -> FROM Country JOIN City ON Country.Capital = City.ID -> WHERE Country.Population > 100e6 -> ORDER BY 3 DESC; +--------------------+------------+-------------------+ | Country | Capital | Difference in GNP | +--------------------+------------+-------------------+ | United States | Washington | 399800.00 | | China | Peking | 64549.00 | | India | New Delhi | 16542.00 | | Nigeria | Abuja | 7084.00 | | Pakistan | Islamabad | 2740.00 | | Bangladesh | Dhaka | 886.00 | | Brazil | Brasília | -27369.00 | | Indonesia | Jakarta | -130020.00 | | Russian Federation | Moscow | -166381.00 | | Japan | Tokyo | -405596.00 | +--------------------+------------+-------------------+ 10 rows in set (0.00 sec) Queries with Aggregates and GROUP BY While this system might work well for many queries, it doesn't cater for situations where you have complex summaries and aggregation. For aggregation, you'd start with choosing which columns to view in the output, but this time you'd construct them as aggregate expressions. For example, you could look at the average population, or the count of distinct regions.You could also perform more complex aggregations, such as the average of GNP per head of population calculated as AVG(GNP/Population). Having chosen the values to appear in the output, you must choose how to aggregate those values. A useful way to think about this is that every aggregate query is of the form X, Y per Z. The SELECT clause contains the expressions for X and Y, as already described, and Z becomes your GROUP BY clause. Ordinarily you would also include Z in the query so you see how you are grouping, so the output becomes Z, X, Y per Z. As an example, consider the following, which shows a count of countries and the average population per continent: mysql> SELECT Continent, COUNT(Name), AVG(Population) -> FROM Country -> GROUP BY Continent; +---------------+-------------+-----------------+ | Continent | COUNT(Name) | AVG(Population) | +---------------+-------------+-----------------+ | Asia | 51 | 72647562.7451 | | Europe | 46 | 15871186.9565 | | North America | 37 | 13053864.8649 | | Africa | 58 | 13525431.0345 | | Oceania | 28 | 1085755.3571 | | Antarctica | 5 | 0.0000 | | South America | 14 | 24698571.4286 | +---------------+-------------+-----------------+ 7 rows in set (0.00 sec) In this case, X is the number of countries, Y is the average population, and Z is the continent. Of course, you could have more fields in the SELECT clause, and more fields in the GROUP BY clause as you require. You would also normally alias columns to make the output more suited to your requirements. More Complex Queries Queries can get considerably more interesting than this. You could also add joins and other expressions to your aggregate query, as in the earlier part of this post. You could have more complex conditions in the WHERE clause. Similarly, you could use queries such as these in subqueries of yet more complex super-queries. Each technique becomes another tool in your toolbox, until before you know it you're writing queries across 15 tables that take two pages to write out. But that's for another day...

Read the article
Silverlight 4 – Coded UI Framework Video Tutorial

- by mbcrump

With the release of Visual Studio 2010 Feature Pack 2, Microsoft included the Coded UI Test framework. With this release it is possible to create automated test with just a few mouse clicks. This is a very powerful feature that all Silverlight developers need to learn. Instead of my normal blog post, I have created a video tutorial that walks you through it starting from “File” –> New Project. I hope you enjoy and please leave feedback. Video Tutorial (short 9 minute video): Slides from the demo (only 3): Silverlight 4 – Coded UI Testing Code for the MainPage.xaml that was used in the Demo. For the sake of time, I did not go into the AutomationProperties.Name that I used for the TextBox or Button. I added that for each element . <Grid x:Name="LayoutRoot" Background="White" Height="100" Width="350"> <Grid.ColumnDefinitions> <ColumnDefinition/> <ColumnDefinition/> </Grid.ColumnDefinitions> <Grid.RowDefinitions> <RowDefinition/> <RowDefinition/> </Grid.RowDefinitions> <TextBlock Padding="15" Grid.Column="0" TextAlignment="Right">Name</TextBlock> <TextBox AutomationProperties.Name="txtAP" Grid.Column="1" Height="25" TextAlignment="Right" Name="txtName" /> <Button AutomationProperties.Name="btnAP" Grid.Row="1" Grid.Column="1" Content="Click for Name" x:Name="btnMessage" Click="btnMessage_Click" /> </Grid> Subscribe to my feed

Read the article
Where to put business logic in MVC design?

- by BriskLabs Pakistan

I have created a simple MVC java application that adds records through data forms to a database. my app collects data, it also validates it and stores it. This is because the data is being sourced online from different users. the data is mostly numeric in nature. now on the numeric data being stored into database (SQL server) , i wish that my app should be able to perform computations... and display it. the user is not interested in how computations are done so they must be encapsulated. the user must only be able to view the simple computed data which for example A column data - B Column data / C column data etc... and just display it to the user... i know how to write stored procedures for same but i want a 3 tier app I want the data, that I put into the database as a record, worked upon by performing calculations on it. However, the original data should remain unaffected, while the new data, post-calculations, must be stored as a new entity record into the database. Where should I write the code for this background calculation? As it is the rules and business logic... in a new java beans files ?

Read the article
Maximum Length Of IP Address: 15 (IPv4) & 39(IPv6)

- by Gopinath

Problem You are designing a database table for a web application that requires to store IP address of users who visits the site. The IP address is required to be stored a character data in the table. To define size of the character column you need to know maximum length of IP address. So, what is the maximum length of an IP address? Solution The IPv4 version of IP address is in the following format 255.255.255.255 To store IPv4 address we require 15 characters. The IPv6 version of IP address is grouped into sets of 4 hex digits separated by colons, like the below 2001:0db8:85a3:0000:0000:8a2e:0370:7334 To store IPv6 address you require a 39 characters long column. Conclusion As IPv4 and IPv6 are the commonly use protocols, you better define a column with 39 characters length so that both the format address are saved in to the table without any issues. This article titled,Maximum Length Of IP Address: 15 (IPv4) & 39(IPv6), was originally published at Tech Dreams. Grab our rss feed or fan us on Facebook to get updates from us.

Read the article
Using Python to traverse a parent-child data set

- by user132748

I have a dataset of two columns in a csv file. Th purpose of this dataset is to provide a linking between two different id's if they belong to the same person. e.g (2,3,5 belong to 1) e.g COLA COLB 1 2 ; 1 3 ; 1 5 ; 2 6 ; 3 7 ; 9 10 In the above example 1 is linked to 2,3,5 and 2 is the linked to 6 and 3 is linked to 7. What I am trying to achieve is to identify all records which are linked to 1 directly (2,3,5) or indirectly(6,7) and be able to say that these id's in column B belong to same person in column A and then either dedupe or add a new column to the output file which will have 1 populated for all rows that link to 1 e.g of expected output colA colB GroupField 1 2 1; 1 3 1; 1 5 1 ; 2 6 1 ;3 7 1; 9 10 9; 10 11 9 I am a newbie and so am not sure on how to approach this problem.Appreciate any inputs you'll can provide.

Read the article
Displaying a Paged Grid of Data in ASP.NET MVC

This article demonstrates how to display a paged grid of data in an ASP.NET MVC application and builds upon the work done in two earlier articles: Displaying a Grid of Data in ASP.NET MVC and Sorting a Grid of Data in ASP.NET MVC. Displaying a Grid of Data in ASP.NET MVC started with creating a new ASP.NET MVC application in Visual Studio, then added the Northwind database to the project and showed how to use Microsoft's Linq-to-SQL tool to access data from the database. The article then looked at creating a Controller and View for displaying a list of product information (the Model). Sorting a Grid of Data in ASP.NET MVC enhanced the application by adding a view-specific Model (ProductGridModel) that provided the View with the sorted collection of products to display along with sort-related information, such as the name of the database column the products were sorted by and whether the products were sorted in ascending or descending order. The Sorting a Grid of Data in ASP.NET MVC article also walked through creating a partial view to render the grid's header row so that each column header was a link that, when clicked, sorted the grid by that column. In this article we enhance the view-specific Model (ProductGridModel) to include paging-related information to include the current page being viewed, how many records to show per page, and how many total records are being paged through. Next, we create an action in the Controller that efficiently retrieves the appropriate subset of records to display and then complete the exercise by building a View that displays the subset of records and includes a paging interface that allows the user to step to the next or previous page, or to jump to a particular page number, we create and use a partial view that displays a numeric paging interface Like with its predecessors, this article offers step-by-step instructions and includes a complete, working demo available for download at the end of the article. Read on to learn more! Read More >

Read the article
Why is my query soooooo slow?

- by geekrutherford

A stored procedure used in our production environment recently became so slow it cause the calling web service to begin timing out. When running the stored procedure in Query Analyzer it took nearly 3 minutes to complete. The stored procedure itself does little more than create a small bit of dynamic SQL which calls a view with a where clause at the end. At first the thought was that the query used within the view needed to be optimized. The query is quite long and therefore easy to jump to this conclusion. Fortunately, after bringing the issue to the attention of a coworker they asked "is there a where clause, and if so, is there an index on the column(s) in it?" I had no idea and quickly said as much. A quick check on the table/column utilized in the where clause indicated indeed there was no index. Before adding the index, and after admitting I am no SQL wiz, I checked the internet for info on the difference between clustered and non-clustered indexes. I found the following site quite helpful OdeToCode. After adding the non-clustered index on the column, the query that used to take nearly 3 minutes now takes 10 seconds! Ah, if only I'd thought to do this ahead of time!

Read the article
T-SQL select where and group by date

- by bconlon

T-SQL has never been my favorite language, but I need to use it on a fairly regular basis and every time I seem to Google the same things. So if I add it here, it might help others with the same issues, but it will also save me time later as I will know where to look for the answers!! 1. How do I SELECT FROM WHERE to filter on a DateTime column? As it happens this is easy but I always forget. You just put the DATE value in single quotes and in standard format: SELECT StartDate FROM Customer WHERE StartDate >= '2011-01-01' ORDER BY StartDate 2. How do I then GROUP BY and get a count by StartDate? Bit trickier, but you can use the built in DATEADD and DATEDIFF to set the TIME part to midnight, allowing the GROUP BY to have a consistent value to work on: SELECT DATEADD (d, DATEDIFF(d, 0, StartDate),0) [Customer Creation Date], COUNT(*) [Number Of New Customers] FROM Customer WHERE StartDate >= '2011-01-01' GROUP BY DATEADD(d, DATEDIFF(d, 0, StartDate),0) ORDER BY [Customer Creation Date] Note: [Customer Creation Date] and [Number Of New Customers] column alias just provide more readable column headers. 3. Finally, how can you format the DATETIME to only show the DATE part (after all the TIME part is now always midnight)? The built in CONVERT function allows you to convert the DATETIME to a CHAR array using a specific format. The format is a bit arbitrary and needs looking up, but 101 is the U.S. standard mm/dd/yyyy, and 103 is the U.K. standard dd/mm/yyyy. SELECT CONVERT(CHAR(10), DATEADD(d, DATEDIFF(d, 0, StartDate),0), 103) [Customer Creation Date], COUNT(*) [Number Of New Customers] FROM Customer WHERE StartDate >= '2011-01-01' GROUP BY DATEADD(d, DATEDIFF(d, 0, StartDate),0) ORDER BY [Customer Creation Date] #

Read the article
OBIA on Teradata - Part 3 Stats

- by Mohan Ramanuja

Statements to run table stats on W_Party_Per_DS and W_Party_Per_DCOLLECT STATISTICS ON W_PARTY_PER_DS COLUMN ("DEPARTMENT_NAME");COLLECT STATISTICS ON W_PARTY_PER_DS COLUMN ("CONTACT_ID");COLLECT STATISTICS ON W_PARTY_PER_DS COLUMN ("CITY");COLLECT STATISTICS ON W_PARTY_PER_D COLUMN ("ACCNT_FLG");COLLECT STATISTICS ON W_PARTY_PER_D COLUMN ("SUPPLIER_FLG");help statistics w_party_per_d; Date Time Unique Values Column Names10/06/02 15:37:47 5,002,185 ROW_WID10/06/21 14:02:55 0 VIS_PR_POS_ID10/06/02 15:37:48 2 CREATED_BY_WID10/06/02 15:37:49 2 CHANGED_BY_WID10/06/02 15:37:50 2 SRC_EFF_FROM_DT10/06/02 15:37:51 1 SRC_EFF_TO_DT10/06/02 15:37:52 2 EFFECTIVE_FROM_DT10/06/02 15:37:53 2 EFFECTIVE_TO_DT10/06/02 15:37:57 1 DELETE_FLG10/06/21 14:02:54 0 CURRENT_FLG10/06/02 15:37:59 2 DATASOURCE_NUM_ID10/06/02 15:38:02 1 ETL_PROC_WID10/06/10 18:27:21 1,000 INTEGRATION_ID select top 10 * from DBC.TableSize; VprocDataBaseName AccountName TableName CurrentPerm PeakPerm 0 T21_ETL_TEMP_ENT IM IT/IM IT Enterprise region RZ_PENDD_FCLTY_CLM_STG 1024 0 0 SSB_RDS IM IT/IM IT ENTERPRISE REGION RDS_RESP_997_TLR 1024 0 0 T17_EDL IM IT/IM IT Enterprise region SPCMN_ACTN 1024 0 0 T20_ETL_CAPTR_DATA_ENT IM IT/IM IT Enterprise region HZ_CS90_VSGPNTE_S9MGNT14 2048 0 0 T5_ETL_DATA_PBM IM IT/IM IT Enterprise region PRCG_OVRD_BY_RX_NM 1536 0 0 PIP_DB $H&D&H PIPTRGENTSRC 1024 0 0 STest5_ADW0 sysadmin PROV_RGSTRTN 59904 0 0 AEDWSTG1 NEIM/NEIM MEMBERSHIP_LKUP_ETL 1024 0 0 AEDWTST5 dbc cptn_agrmt_xwlk 1024 0 0 VAL_LAG_TEMP $H1$&D&HDBA clm_lag_stg 347136 0 select vproc, CurrentPerm from DBC.TableSize where databasename = 'PRJ_CRM_STGC' and tablename='w_party_per_d' ORDER BY 2 DESC;Vproc DataBaseName AccountName TableName CurrentPerm PeakPerm0 PRJ_CRM_STGC DBA/DBA W_PARTY_PER_D 8704.00 841728.003 PRJ_CRM_STGC DBA/DBA W_PARTY_PER_D 8704.00 782848.00

Read the article

Search Results

Search found 10424 results on 417 pages for 'persisted column'.

Page 114/417 | < Previous Page | 110 111 112 113 114 115 116 117 118 119 120 121 | Next Page >

- by devnet247

- by pinaldave

- by pinaldave

- by arungupta

- by arungupta

- by jamiet

- by Tamarick Hill

- by DavidWimbush

- by Fabrice Marguerie

- by fatherjack

- by Mark Hornick

- by Pinal Dave

- by Manoj Madhusoodanan

- by Geertjan

- by David Allan

- by Jeremy Smyth

- by mbcrump

- by BriskLabs Pakistan

- by Gopinath

- by user132748

- by geekrutherford

- by bconlon

- by Mohan Ramanuja

< Previous Page | 110 111 112 113 114 115 116 117 118 119 120 121 | Next Page >