Search Results

Search found 754 results on 31 pages for 'aggregate'.

Page 28/31 | < Previous Page | 24 25 26 27 28 29 30 31  | Next Page >

  • SPARC T4-4 Delivers World Record Performance on Oracle OLAP Perf Version 2 Benchmark

    - by Brian
    Oracle's SPARC T4-4 server delivered world record performance with subsecond response time on the Oracle OLAP Perf Version 2 benchmark using Oracle Database 11g Release 2 running on Oracle Solaris 11. The SPARC T4-4 server achieved throughput of 430,000 cube-queries/hour with an average response time of 0.85 seconds and the median response time of 0.43 seconds. This was achieved by using only 60% of the available CPU resources leaving plenty of headroom for future growth. The SPARC T4-4 server operated on an Oracle OLAP cube with a 4 billion row fact table of sales data containing 4 dimensions. This represents as many as 90 quintillion aggregate rows (90 followed by 18 zeros). Performance Landscape Oracle OLAP Perf Version 2 Benchmark 4 Billion Fact Table Rows System Queries/hour Users* Response Time (sec) Average Median SPARC T4-4 430,000 7,300 0.85 0.43 * Users - the supported number of users with a given think time of 60 seconds Configuration Summary and Results Hardware Configuration: SPARC T4-4 server with 4 x SPARC T4 processors, 3.0 GHz 1 TB memory Data Storage 1 x Sun Fire X4275 (using COMSTAR) 2 x Sun Storage F5100 Flash Array (each with 80 FMODs) Redo Storage 1 x Sun Fire X4275 (using COMSTAR with 8 HDD) Software Configuration: Oracle Solaris 11 11/11 Oracle Database 11g Release 2 (11.2.0.3) with Oracle OLAP option Benchmark Description The Oracle OLAP Perf Version 2 benchmark is a workload designed to demonstrate and stress the Oracle OLAP product's core features of fast query, fast update, and rich calculations on a multi-dimensional model to support enhanced Data Warehousing. The bulk of the benchmark entails running a number of concurrent users, each issuing typical multidimensional queries against an Oracle OLAP cube consisting of a number of years of sales data with fully pre-computed aggregations. The cube has four dimensions: time, product, customer, and channel. Each query user issues approximately 150 different queries. One query chain may ask for total sales in a particular region (e.g South America) for a particular time period (e.g. Q4 of 2010) followed by additional queries which drill down into sales for individual countries (e.g. Chile, Peru, etc.) with further queries drilling down into individual stores, etc. Another query chain may ask for yearly comparisons of total sales for some product category (e.g. major household appliances) and then issue further queries drilling down into particular products (e.g. refrigerators, stoves. etc.), particular regions, particular customers, etc. Results from version 2 of the benchmark are not comparable with version 1. The primary difference is the type of queries along with the query mix. Key Points and Best Practices Since typical BI users are often likely to issue similar queries, with different constants in the where clauses, setting the init.ora prameter "cursor_sharing" to "force" will provide for additional query throughput and a larger number of potential users. Except for this setting, together with making full use of available memory, out of the box performance for the OLAP Perf workload should provide results similar to what is reported here. For a given number of query users with zero think time, the main measured metrics are the average query response time, the median query response time, and the query throughput. A derived metric is the maximum number of users the system can support achieving the measured response time assuming some non-zero think time. The calculation of the maximum number of users follows from the well-known response-time law N = (rt + tt) * tp where rt is the average response time, tt is the think time and tp is the measured throughput. Setting tt to 60 seconds, rt to 0.85 seconds and tp to 119.44 queries/sec (430,000 queries/hour), the above formula shows that the T4-4 server will support 7,300 concurrent users with a think time of 60 seconds and an average response time of 0.85 seconds. For more information see chapter 3 from the book "Quantitative System Performance" cited below. -- See Also Quantitative System Performance Computer System Analysis Using Queueing Network Models Edward D. Lazowska, John Zahorjan, G. Scott Graham, Kenneth C. Sevcik external local Oracle Database 11g – Oracle OLAP oracle.com OTN SPARC T4-4 Server oracle.com OTN Oracle Solaris oracle.com OTN Oracle Database 11g Release 2 oracle.com OTN Disclosure Statement Copyright 2012, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 11/2/2012.

    Read the article

  • Using Stored Procedures in SSIS

    - by dataintegration
    The SSIS Data Flow components: the source task and the destination task are the easiest way to transfer data in SSIS. Some data transactions do not fit this model, they are procedural tasks modeled as stored procedures. In this article we show how you can call stored procedures available in RSSBus ADO.NET Providers from SSIS. In this article we will use the CreateJob and the CreateBatch stored procedures available in RSSBus ADO.NET Provider for Salesforce, but the same steps can be used to call a stored procedure in any of our data providers. Step 1: Open Visual Studio and create a new Integration Services Project. Step 2: Add a new Data Flow Task to the Control Flow window. Step 3: Open the Data Flow Task and add a Script Component to the data flow pane. A dialog box will pop-up allowing you to select the Script Component Type: pick the source type as we will be outputting columns from our stored procedure. Step 4: Double click the Script Component to open the editor. Step 5: In the "Inputs and Outputs" settings, enter all the columns you want to output to the data flow. Ensure the correct data type has been set for each output. You can check the data type by selecting the output and then changing the "DataType" property from the property editor. In our example, we'll add the column JobID of type String. Step 6: Select the "Script" option in the left-hand pane and click the "Edit Script" button. This will open a new Visual Studio window with some boiler plate code in it. Step 7: In the CreateOutputRows() function you can add code that executes the stored procedures included with the Salesforce Component. In this example we will be using the CreateJob and CreateBatch stored procedures. You can find a list of the available stored procedures along with their inputs and outputs in the product help. //Configure the connection string to your credentials String connectionString = "Offline=False;user=myusername;password=mypassword;access token=mytoken;"; using (SalesforceConnection conn = new SalesforceConnection(connectionString)) { //Create the command to call the stored procedure CreateJob SalesforceCommand cmd = new SalesforceCommand("CreateJob", conn); cmd.CommandType = CommandType.StoredProcedure; cmd.Parameters.Add(new SalesforceParameter("ObjectName", "Contact")); cmd.Parameters.Add(new SalesforceParameter("Action", "insert")); //Execute CreateJob //CreateBatch requires JobID as input so we store this value for later SalesforceDataReader rdr = cmd.ExecuteReader(); String JobID = ""; while (rdr.Read()) { JobID = (String)rdr["JobID"]; } //Create the command for CreateBatch, for this example we are adding two new rows SalesforceCommand batCmd = new SalesforceCommand("CreateBatch", conn); batCmd.CommandType = CommandType.StoredProcedure; batCmd.Parameters.Add(new SalesforceParameter("JobID", JobID)); batCmd.Parameters.Add(new SalesforceParameter("Aggregate", "<Contact><Row><FirstName>Bill</FirstName>" + "<LastName>White</LastName></Row><Row><FirstName>Bob</FirstName><LastName>Black</LastName></Row></Contact>")); //Execute CreateBatch SalesforceDataReader batRdr = batCmd.ExecuteReader(); } Step 7b: If you had specified output columns earlier, you can now add data into them using the UserComponent Output0Buffer. For example, we had set an output column called JobID of type String so now we can set a value for it. We will modify the DataReader that contains the output of CreateJob like so:. while (rdr.Read()) { Output0Buffer.AddRow(); JobID = (String)rdr["JobID"]; Output0Buffer.JobID = JobID; } Step 8: Note: You will need to modify the connection string to include your credentials. Also ensure that the System.Data.RSSBus.Salesforce assembly is referenced and include the following using statements to the top of the class: using System.Data; using System.Data.RSSBus.Salesforce; Step 9: Once you are done editing your script, save it, and close the window. Click OK in the Script Transformation window to go back to the main pane. Step 10: If had any outputs from the Script Component you can use them in your data flow. For example we will use a Flat File Destination. Configure the Flat File Destination to output the results to a file, and you should see the JobId in the file. Step 11: Your project should be ready to run.

    Read the article

  • Real Time BI in the Real World

    - by tobin.gilman(at)oracle.com
    Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Times New Roman","serif";} Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Times New Roman","serif"; mso-fareast-font-family:"Times New Roman";} One of my favorite BI offerings from Oracle is a solution called Oracle Real Time Decisions.  Whenever I mention this product in customer meetings, eyes light up.  There are some fascinating examples of customers using it to up-sell, cross-sell, increase customer retention, and reduce risk in real time, with off the charts return on investment. I plan to share some of those stories in a future blog.  In this post however, I want to share some far more common real time analytics use case scenarios that are being addressed with widely deployed Oracle BI and data integration technologies Not all real time BI applications require continuous learning, predictive modeling, and data mining.  Many simply require the ability to integrate, aggregate, and access information that is current (typically within in few minutes or a few seconds).  The use cases are infinite.  A few I've seen: ·         Purchasing agents need to match demand against available inventory ·         Manufacturing planners need to monitor current parts and material against scheduled build plans ·         Airline agents need to match ticket demand against flight schedules, ·         Human resources managers need to track the status of global hiring requisitions against current headcount authorizations...you get the idea. One way of doing this is to run reports or federated queries directly against transactional systems.  That approach can be viable if you only need to access simple data sets on rare occasions.  High volume and complex queries can quickly bog down performance of mission critical transactional systems.  There is an architecturally simple way of solving the problem, and it's being applied by real companies around the world to solve real needs in real time.    Cbeyond is an Atlanta, GA based  provider of voice, data and mobile business applications delivers.  They deliver real time information to its call center agents  as they are interacting with their customers. The data they need resides in production CRM and other transactional systems, but  instead or reporting directly off the those systems, data is first moved to an operational data store (ODS).  Rather than running data intensive, time consuming, and performance degrading batch ETL routines to populate the ODS, Cbeyond uses Oracle Golden Gate software to incrementally capture and move only the changed records from log files of the transactional systems every few minutes.  There is no impact on transactional system performance, and the information needed by call center representatives is up to date.  Oracle Business Intelligence software presents the information to services reps in a rich, visual, and highly interactive format. Avea is similar to Cbeyond.  They are a telecommunications company who integrates billing and customer information in an ODS that is accessed by their call center agents in real time using Oracle Golden Gate and Oracle Business Intelligence.  They've taken it a step further by using the ODS to feed a data warehouse.  The operational data store provides the current information needed by call center agents during "in flight" customer interactions.  The data warehouse is used for more sophisticated analysis of historical data.  For maximum performance, both the ODS and data warehouse run on the Oracle Exadata Database Machine. These are practical illustrations of companies addressing real time reporting and analysis needs using established business intelligence/data warehousing methodologies and tools common to many IT departments.  If real time BI could benefit your organization, you may be already be closer than you thought to having the pieces in place to solving the problem.    Give us a shout if you are interested in learning more or if you have an interesting use or approach to real-time BI.

    Read the article

  • ROracle support for TimesTen In-Memory Database

    - by Sherry LaMonica
    Today's guest post comes from Jason Feldhaus, a Consulting Member of Technical Staff in the TimesTen Database organization at Oracle.  He shares with us a sample session using ROracle with the TimesTen In-Memory database.  Beginning in version 1.1-4, ROracle includes support for the Oracle Times Ten In-Memory Database, version 11.2.2. TimesTen is a relational database providing very fast and high throughput through its memory-centric architecture.  TimesTen is designed for low latency, high-volume data, and event and transaction management. A TimesTen database resides entirely in memory, so no disk I/O is required for transactions and query operations. TimesTen is used in applications requiring very fast and predictable response time, such as real-time financial services trading applications and large web applications. TimesTen can be used as the database of record or as a relational cache database to Oracle Database. ROracle provides an interface between R and the database, providing the rich functionality of the R statistical programming environment using the SQL query language. ROracle uses the OCI libraries to handle database connections, providing much better performance than standard ODBC.The latest ROracle enhancements include: Support for Oracle TimesTen In-Memory Database Support for Date-Time using R's POSIXct/POSIXlt data types RAW, BLOB and BFILE data type support Option to specify number of rows per fetch operation Option to prefetch LOB data Break support using Ctrl-C Statement caching support Times Ten 11.2.2 contains enhanced support for analytics workloads and complex queries: Analytic functions: AVG, SUM, COUNT, MAX, MIN, DENSE_RANK, RANK, ROW_NUMBER, FIRST_VALUE and LAST_VALUE Analytic clauses: OVER PARTITION BY and OVER ORDER BY Multidimensional grouping operators: Grouping clauses: GROUP BY CUBE, GROUP BY ROLLUP, GROUP BY GROUPING SETS Grouping functions: GROUP, GROUPING_ID, GROUP_ID WITH clause, which allows repeated references to a named subquery block Aggregate expressions over DISTINCT expressions General expressions that return a character string in the source or a pattern within the LIKE predicate Ability to order nulls first or last in a sort result (NULLS FIRST or NULLS LAST in the ORDER BY clause) Note: Some functionality is only available with Oracle Exalytics, refer to the TimesTen product licensing document for details. Connecting to TimesTen is easy with ROracle. Simply install and load the ROracle package and load the driver. > install.packages("ROracle") > library(ROracle) Loading required package: DBI > drv <- dbDriver("Oracle") Once the ROracle package is installed, create a database connection object and connect to a TimesTen direct driver DSN as the OS user. > conn <- dbConnect(drv, username ="", password="", dbname = "localhost/SampleDb_1122:timesten_direct") You have the option to report the server type - Oracle or TimesTen? > print (paste ("Server type =", dbGetInfo (conn)$serverType)) [1] "Server type = TimesTen IMDB" To create tables in the database using R data frame objects, use the function dbWriteTable. In the following example we write the built-in iris data frame to TimesTen. The iris data set is a small example data set containing 150 rows and 5 columns. We include it here not to highlight performance, but so users can easily run this example in their R session. > dbWriteTable (conn, "IRIS", iris, overwrite=TRUE, ora.number=FALSE) [1] TRUE Verify that the newly created IRIS table is available in the database. To list the available tables and table columns in the database, use dbListTables and dbListFields, respectively. > dbListTables (conn) [1] "IRIS" > dbListFields (conn, "IRIS") [1] "SEPAL.LENGTH" "SEPAL.WIDTH" "PETAL.LENGTH" "PETAL.WIDTH" "SPECIES" To retrieve a summary of the data from the database we need to save the results to a local object. The following call saves the results of the query as a local R object, iris.summary. The ROracle function dbGetQuery is used to execute an arbitrary SQL statement against the database. When connected to TimesTen, the SQL statement is processed completely within main memory for the fastest response time. > iris.summary <- dbGetQuery(conn, 'SELECT SPECIES, AVG ("SEPAL.LENGTH") AS AVG_SLENGTH, AVG ("SEPAL.WIDTH") AS AVG_SWIDTH, AVG ("PETAL.LENGTH") AS AVG_PLENGTH, AVG ("PETAL.WIDTH") AS AVG_PWIDTH FROM IRIS GROUP BY ROLLUP (SPECIES)') > iris.summary SPECIES AVG_SLENGTH AVG_SWIDTH AVG_PLENGTH AVG_PWIDTH 1 setosa 5.006000 3.428000 1.462 0.246000 2 versicolor 5.936000 2.770000 4.260 1.326000 3 virginica 6.588000 2.974000 5.552 2.026000 4 <NA> 5.843333 3.057333 3.758 1.199333 Finally, disconnect from the TimesTen Database. > dbCommit (conn) [1] TRUE > dbDisconnect (conn) [1] TRUE We encourage you download Oracle software for evaluation from the Oracle Technology Network. See these links for our software: Times Ten In-Memory Database,  ROracle.  As always, we welcome comments and questions on the TimesTen and  Oracle R technical forums.

    Read the article

  • Impressions and Reactions from Alliance 2012

    - by user739873
    Alliance 2012 has come to a conclusion.  What strikes me about every Alliance conference is the amazing amount of collaboration and cooperation I see across higher education in the sharing of best practices around the entire Oracle PeopleSoft software suite, not just the student information system (Oracle’s PeopleSoft Campus Solutions).  In addition to the vibrant U.S. organization, it's gratifying to see the growth in the international attendance again this year, with an EMEA HEUG organizing to complement the existing groups in the Netherlands, South Africa, and the U.K.  Their first meeting is planned for London in October, and I suspect they'll be surprised at the amount of interest and attendance. In my discussions with higher education IT and functional leadership at Alliance there were a number of instances where concern was expressed about Oracle's commitment to higher education as an industry, primarily because of a lack of perceived innovation in the applications that Oracle develops for this market. Here I think perception and reality are far apart, and I'd like to explain why I believe this to be true. First let me start with what I think drives this perception. Predominately it's in two areas. The first area is the user interface, both for students and faculty that interact with the system as "customers", and for those employees of the institution (faculty, staff, and sometimes students as well) that use the system in some kind of administrative role. Because the UI hasn't changed all that much from the PeopleSoft days, individuals perceive this as a dead product with little innovation and therefore Oracle isn't investing. The second area is around the integration of the higher education suite of applications (PeopleSoft Campus Solutions) and the rest of the Oracle software assets. Whether grown organically or acquired, there is an impressive array of middleware and other software products that could be leveraged much more significantly by the higher education applications than is currently the case today. This is also perceived as lack of investment. Let me address these two points.  First the UI.  More is being done here than ever before, and the PAG and other groups where this was discussed at Alliance 2012 were more numerous than I've seen in any past meeting. Whether it's Oracle development leveraging web services or some extremely early but very promising work leveraging the recent Endeca acquisition (see some cool examples here) there are a lot of resources aimed at this issue.  There are also some amazing prototypes being developed by our UX (user experience team) that will eventually make their way into the higher education applications realm - they had an impressive setup at Alliance.  Hopefully many of you that attended found this group. If not, the senior leader for that team Jeremy Ashley will be a significant contributor of content to our summer Industry Strategy Council meeting in Washington in June. In the area of integration with other elements of the Oracle stack, this is also an area of focus for the company and my team.  We're making this a priority especially in the areas of identity management and security, leveraging WebCenter more effectively for content, imaging, and mobility, and driving towards the ultimate objective of WebLogic Suite as our platform for SOA, links to learning management systems (SAIP), and content. There is also much work around business intelligence centering on OBI applications. But at the end of the day we get enormous value from the HEUG (higher education user group) and the various subgroups formed as a part of this community that help us align and prioritize our investments, whether it's around better integration with other Oracle products or integration with partner offerings.  It's one of the healthiest, mutually beneficial relationships between customers and an Education IT concern that exists on the globe. And I can't avoid mentioning that this kind of relationship between higher education and the corporate IT community that can truly address the problems of efficiency and effectiveness, institutional excellence (which starts with IT) and student success.  It's not (in my opinion) going to be solved through community source - cost and complexity only increase in that model and in the end higher education doesn't ultimately focus on core competencies: educating, developing, and researching.  While I agree with some of what Michael A. McRobbie wrote in his EDUCAUSE Review article (Information Technology: A View from Both Sides of the President’s Desk), I take strong issue with his assertion that the "the IT marketplace is just the opposite of long-term stability...."  Sure there has been healthy, creative destruction in the past 2-3 decades, but this has had the effect of, in the aggregate, benefiting education with greater efficiency, more innovation and increased stability as larger, more financially secure firms acquire and develop integrated solutions. Cole

    Read the article

  • Application Composer Series: Where and When to use Groovy

    - by Richard Bingham
    This brief post is really intended as more of a reference than an article. The table below highlights two things, firstly where you can add you own custom logic via groovy code (end column), and secondly (middle column) when you might use each particular feature. Obviously this applies only where Application Composer exists, namely Fusion CRM and Oracle Sales Cloud, and is based on current (release 8) functionality. Feature Most Common Use Case Groovy Field Triggers React to run-time data changes. Only fired when the field is changed and upon submit. Y Object Triggers To extend the standard processing logic for an object, based on record creation, updates and deletes. There is a split between these firing events, with some related to UI/ADF actions and others originating in the database. UI Trigger Points: After Create - fires when a new object record is created. Commonly used to set default values for fields. Before Modify - Fires when the end-user tries to modify a field value. Could be used for generic warnings or extra security logic. Before Invalidate - Fires on the parent object when one of its child object records is created, updated, or deleted. For building in relationship logic. Before Remove - Fires when an attempt is made to delete an object record. Can be used to create conditions that prevent deletes. Database Trigger Points: Before Insert in Database - Fires before a new object is inserted into the database. Can be used to ensure a dependent record exists or check for duplicates. After Insert in Database - Fires after a new object is inserted into the database. Could be used to create a complementary record. Before Update in Database -Fires before an existing object is modified in the database. Could be used to check dependent record values. After Update in Database - Fires after an existing object is modified in the database. Could be used to update a complementary record. Before Delete in Database - Fires before an existing object is deleted from the database. Could be used to check dependent record values. After Delete in Database - Fires after an existing object is deleted from the database. Could be used to remove dependent records. After Commit in Database - Fires after the change pending for the current object (insert, update, delete) is made permanent in the current transaction. Could be used when committed data that has passed all validation is required. After Changes Posted to Database - Fires after all changes have been posted to the database, but before they are permanently committed. Could be used to make additional changes that will be saved as part of the current transaction. Y Field Validation Displays a user entered error message based groovy logic validating the field value. The message is shown only when the validation logic returns false, and the logic is triggered only when tabbing out of the field on the user interface. Y Object Validation Commonly used where validation is needed across multiple related fields on the object. Triggered on the submit UI action. Y Object Workflows All Object Workflows are fired upon either record creation or update, along with the option of adding a custom groovy firing condition. Y Field Updates - change another field when a specified one changes. Intended as an easy way to set different run-time values (e.g. pick values for LOV's) plus the value field permits groovy logic entry. Y E-Mail Notification - sends an email notification to specified users/roles. Templates support using run-time value tokens and rich text. N Task Creation - for adding standard tasks for use in the worklist functionality. N Outbound Message - will create and send an XML payload of the related object SDO to a specified endpoint. N Business Process Flow - intended for approval using the seeded process, however can also trigger custom BPMN flows. N Global Functions Utility functions that can be called from any groovy code in Application Composer (across applications). Y Object Functions Utility functions that are local to the parent object. Usually triggered from within 'Buttons and Actions' definitions in Application Composer, although can be called from other code for that object (e.g. from a trigger). Y Add Custom Fields When adding custom fields there are a few places you can include groovy logic. Y Default Value - to add logic within setting the default value when new records are entered. Y Conditionally Updateable - to add logic to set the field to read-only or not. Y Conditionally Required - to add logic to set the field to required or not. Y Formula Field - Used to provide a new aggregate field that is entirely based on groovy logic and other field values. Y Simplified UI Layouts - Advanced Expressions Used for creating dynamic layouts for simplified UI pages where fields and regions show/hide based on run-time context values and logic. Also includes support for the depends-on feature as a trigger. Y Related References This Blog: Application Composer Series Extending Sales Guide: Using Groovy Scripts Groovy Scripting Reference Guide

    Read the article

  • ADF Business Components

    - by Arda Eralp
    ADF Business Components and JDeveloper simplify the development, delivery, and customization of business applications for the Java EE platform. With ADF Business Components, developers aren't required to write the application infrastructure code required by the typical Java EE application to: Connect to the database Retrieve data Lock database records Manage transactions   ADF Business Components addresses these tasks through its library of reusable software components and through the supporting design time facilities in JDeveloper. Most importantly, developers save time using ADF Business Components since the JDeveloper design time makes typical development tasks entirely declarative. In particular, JDeveloper supports declarative development with ADF Business Components to: Author and test business logic in components which automatically integrate with databases Reuse business logic through multiple SQL-based views of data, supporting different application tasks Access and update the views from browser, desktop, mobile, and web service clients Customize application functionality in layers without requiring modification of the delivered application The goal of ADF Business Components is to make the business services developer more productive.   ADF Business Components provides a foundation of Java classes that allow your business-tier application components to leverage the functionality provided in the following areas: Simplifying Data Access Design a data model for client displays, including only necessary data Include master-detail hierarchies of any complexity as part of the data model Implement end-user Query-by-Example data filtering without code Automatically coordinate data model changes with business services layer Automatically validate and save any changes to the database   Enforcing Business Domain Validation and Business Logic Declaratively enforce required fields, primary key uniqueness, data precision-scale, and foreign key references Easily capture and enforce both simple and complex business rules, programmatically or declaratively, with multilevel validation support Navigate relationships between business domain objects and enforce constraints related to compound components   Supporting Sophisticated UIs with Multipage Units of Work Automatically reflect changes made by business service application logic in the user interface Retrieve reference information from related tables, and automatically maintain the information when the user changes foreign-key values Simplify multistep web-based business transactions with automatic web-tier state management Handle images, video, sound, and documents without having to use code Synchronize pending data changes across multiple views of data Consistently apply prompts, tooltips, format masks, and error messages in any application Define custom metadata for any business components to support metadata-driven user interface or application functionality Add dynamic attributes at runtime to simplify per-row state management   Implementing High-Performance Service-Oriented Architecture Support highly functional web service interfaces for business integration without writing code Enforce best-practice interface-based programming style Simplify application security with automatic JAAS integration and audit maintenance "Write once, run anywhere": use the same business service as plain Java class, EJB session bean, or web service   Streamlining Application Customization Extend component functionality after delivery without modifying source code Globally substitute delivered components with extended ones without modifying the application   ADF Business Components implements the business service through the following set of cooperating components: Entity object An entity object represents a row in a database table and simplifies modifying its data by handling all data manipulation language (DML) operations for you. These are basically your 1 to 1 representation of a database table. Each table in the database will have 1 and only 1 EO. The EO contains the mapping between columns and attributes. EO's also contain the business logic and validation. These are you core data services. They are responsible for updating, inserting and deleting records. The Attributes tab displays the actual mapping between attributes and columns, the mapping has following fields: Name : contains the name of the attribute we expose in our data model. Type : defines the data type of the attribute in our application. Column : specifies the column to which we want to map the attribute with Column Type : contains the type of the column in the database   View object A view object represents a SQL query. You use the full power of the familiar SQL language to join, filter, sort, and aggregate data into exactly the shape required by the end-user task. The attributes in the View Objects are actually coming from the Entity Object. In the end the VO will generate a query but you basically build a VO by selecting which EO need to participate in the VO and which attributes of those EO you want to use. That's why you have the Entity Usage column so you can see the relation between VO and EO. In the query tab you can clearly see the query that will be generated for the VO. At this stage we don't need it and just use it for information purpose. In later stages we might use it. Application module An application module is the controller of your data layer. It is responsible for keeping hold of the transaction. It exposes the data model to the view layer. You expose the VO's through the Application Module. This is the abstraction of your data layer which you want to show to the outside word.It defines an updatable data model and top-level procedures and functions (called service methods) related to a logical unit of work related to an end-user task. While the base components handle all the common cases through built-in behavior, customization is always possible and the default behavior provided by the base components can be easily overridden or augmented. When you create EO's, a foreign key will be translated into an association in our model. It defines the type of relation and who is the master and child as well as how the visibility of the association looks like. A similar concept exists to identify relations between view objects. These are called view links. These are almost identical as association except that a view link is based upon attributes defined in the view object. It can also be based upon an association. Here's a short summary: Entity Objects: representations of tables Association: Relations between EO's. Representations of foreign keys View Objects: Logical model View Links: Relationships between view objects Application Model: interface to your application  

    Read the article

  • Using Subjects to Deploy Queries Dynamically

    - by Roman Schindlauer
    In the previous blog posting, we showed how to construct and deploy query fragments to a StreamInsight server, and how to re-use them later. In today’s posting we’ll integrate this pattern into a method of dynamically composing a new query with an existing one. The construct that enables this scenario in StreamInsight V2.1 is a Subject. A Subject lets me create a junction element in an existing query that I can tap into while the query is running. To set this up as an end-to-end example, let’s first define a stream simulator as our data source: var generator = myApp.DefineObservable(     (TimeSpan t) => Observable.Interval(t).Select(_ => new SourcePayload())); This ‘generator’ produces a new instance of SourcePayload with a period of t (system time) as an IObservable. SourcePayload happens to have a property of type double as its payload data. Let’s also define a sink for our example—an IObserver of double values that writes to the console: var console = myApp.DefineObserver(     (string label) => Observer.Create<double>(e => Console.WriteLine("{0}: {1}", label, e)))     .Deploy("ConsoleSink"); The observer takes a string as parameter which is used as a label on the console, so that we can distinguish the output of different sink instances. Note that we also deploy this observer, so that we can retrieve it later from the server from a different process. Remember how we defined the aggregation as an IQStreamable function in the previous article? We will use that as well: var avg = myApp     .DefineStreamable((IQStreamable<SourcePayload> s, TimeSpan w) =>         from win in s.TumblingWindow(w)         select win.Avg(e => e.Value))     .Deploy("AverageQuery"); Then we define the Subject, which acts as an observable sequence as well as an observer. Thus, we can feed a single source into the Subject and have multiple consumers—that can come and go at runtime—on the other side: var subject = myApp.CreateSubject("Subject", () => new Subject<SourcePayload>()); Subject are always deployed automatically. Their name is used to retrieve them from a (potentially) different process (see below). Note that the Subject as we defined it here doesn’t know anything about temporal streams. It is merely a sequence of SourcePayloads, without any notion of StreamInsight point events or CTIs. So in order to compose a temporal query on top of the Subject, we need to 'promote' the sequence of SourcePayloads into an IQStreamable of point events, including CTIs: var stream = subject.ToPointStreamable(     e => PointEvent.CreateInsert<SourcePayload>(e.Timestamp, e),     AdvanceTimeSettings.StrictlyIncreasingStartTime); In a later posting we will show how to use Subjects that have more awareness of time and can be used as a junction between QStreamables instead of IQbservables. Having turned the Subject into a temporal stream, we can now define the aggregate on this stream. We will use the IQStreamable entity avg that we defined above: var longAverages = avg(stream, TimeSpan.FromSeconds(5)); In order to run the query, we need to bind it to a sink, and bind the subject to the source: var standardQuery = longAverages     .Bind(console("5sec average"))     .With(generator(TimeSpan.FromMilliseconds(300)).Bind(subject)); Lastly, we start the process: standardQuery.Run("StandardProcess"); Now we have a simple query running end-to-end, producing results. What follows next is the crucial part of tapping into the Subject and adding another query that runs in parallel, using the same query definition (the “AverageQuery”) but with a different window length. We are assuming that we connected to the same StreamInsight server from a different process or even client, and thus have to retrieve the previously deployed entities through their names: // simulate the addition of a 'fast' query from a separate server connection, // by retrieving the aggregation query fragment // (instead of simply using the 'avg' object) var averageQuery = myApp     .GetStreamable<IQStreamable<SourcePayload>, TimeSpan, double>("AverageQuery"); // retrieve the input sequence as a subject var inputSequence = myApp     .GetSubject<SourcePayload, SourcePayload>("Subject"); // retrieve the registered sink var sink = myApp.GetObserver<string, double>("ConsoleSink"); // turn the sequence into a temporal stream var stream2 = inputSequence.ToPointStreamable(     e => PointEvent.CreateInsert<SourcePayload>(e.Timestamp, e),     AdvanceTimeSettings.StrictlyIncreasingStartTime); // apply the query, now with a different window length var shortAverages = averageQuery(stream2, TimeSpan.FromSeconds(1)); // bind new sink to query and run it var fastQuery = shortAverages     .Bind(sink("1sec average"))     .Run("FastProcess"); The attached solution demonstrates the sample end-to-end. Regards, The StreamInsight Team

    Read the article

  • Loading, listing, and using R Modules and Functions in PL/R

    - by Dave Jarvis
    I am having difficulty with: Listing the R packages and functions available to PostgreSQL. Installing a package (such as Kendall) for use with PL/R Calling an R function within PostgreSQL Listing Available R Packages Q.1. How do you find out what R modules have been loaded? SELECT * FROM r_typenames(); That shows the types that are available, but what about checking if Kendall( X, Y ) is loaded? For example, the documentation shows: CREATE TABLE plr_modules ( modseq int4, modsrc text ); That seems to allow inserting records to dictate that Kendall is to be loaded, but the following code doesn't explain, syntactically, how to ensure that it gets loaded: INSERT INTO plr_modules VALUES (0, 'pg.test.module.load <-function(msg) {print(msg)}'); Q.2. What would the above line look like if you were trying to load Kendall? Q.3. Is it applicable? Installing R Packages Using the "synaptic" package manager the following packages have been installed: r-base r-base-core r-base-dev r-base-html r-base-latex r-cran-acepack r-cran-boot r-cran-car r-cran-chron r-cran-cluster r-cran-codetools r-cran-design r-cran-foreign r-cran-hmisc r-cran-kernsmooth r-cran-lattice r-cran-matrix r-cran-mgcv r-cran-nlme r-cran-quadprog r-cran-robustbase r-cran-rpart r-cran-survival r-cran-vr r-recommended Q.4. How do I know if Kendall is in there? Q.5. If it isn't, how do I find out what package it is in? Q.6. If it isn't in a package suitable for installing with apt-get (aptitude, synaptic, dpkg, what have you), how do I go about installing it on Ubuntu? Q.7. Where are the installation steps documented? Calling R Functions I have the following code: EXECUTE 'SELECT ' 'regr_slope( amount, year_taken ),' 'regr_intercept( amount, year_taken ),' 'corr( amount, year_taken ),' 'sum( measurements ) AS total_measurements ' 'FROM temp_regression' INTO STRICT slope, intercept, correlation, total_measurements; This code calls the PostgreSQL function corr to calculate Pearson's correlation over the data. Ideally, I'd like to do the following (by switching corr for plr_kendall): EXECUTE 'SELECT ' 'regr_slope( amount, year_taken ),' 'regr_intercept( amount, year_taken ),' 'plr_kendall( amount, year_taken ),' 'sum( measurements ) AS total_measurements ' 'FROM temp_regression' INTO STRICT slope, intercept, correlation, total_measurements; Q.8. Do I have to write plr_kendall myself? Q.9. Where can I find a simple example that walks through: Loading an R module into PG. Writing a PG wrapper for the desired R function. Calling the PG wrapper from a SELECT. For example, would the last two steps look like: create or replace function plr_kendall( _float8, _float8 ) returns float as ' agg_kendall(arg1, arg2) ' language 'plr'; CREATE AGGREGATE agg_kendall ( sfunc = plr_array_accum, basetype = float8, -- ??? stype = _float8, -- ??? finalfunc = plr_kendall ); And then the SELECT as above? Thank you!

    Read the article

  • SQL CLR Stored Procedure and Web Service

    - by Nathan
    I am current working on a task in which I am needing to call a method in a web service from a CLR stored procedure. A bit of background: Basically, I have a task that requires ALOT of crunching. If done strictly in SQL, it takes somewhere around 30-45 mins to process. If I pull the same process into code, I can get it complete in seconds due to being able to optimize the processing so much more efficiently. The only problem is that I have to have this process set as an automated task in SQL Server. In that vein, I have exposed the process as a web service (I use it for other things as well) and want the SQL CLR sproc to consume the service and execute the code. This allows me to have my automated task. The problem: I have read quite a few different topics regarding how to consume a web service in a CLR Sproc and have done so effectivly. Here is an example of what I have followed. http://blog.hoegaerden.be/2008/11/11/calling-a-web-service-from-sql-server-2005/ I can get this example working without any issues. However, whenever I pair this process w/ a Web Service method that involves a database call, I get the following exceptions (depending upon whether or not I wrap in a try / catch): Msg 10312, Level 16, State 49, Procedure usp_CLRRunDirectSimulationAndWriteResults, Line 0 .NET Framework execution was aborted. The UDP/UDF/UDT did not revert thread token. or Msg 6522, Level 16, State 1, Procedure MyStoredProc , Line 0 A .NET Framework error occurred during execution of user defined routine or aggregate 'MyStoredProc': System.Security.SecurityException: Request for the permission of type 'System.Security.Permissions.EnvironmentPermission, mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089' failed. System.Security.SecurityException: at System.Security.CodeAccessSecurityEngine.Check(Object demand, StackCrawlMark& stackMark, Boolean isPermSet) at System.Security.CodeAccessPermission.Demand() at System.Net.CredentialCache.get_DefaultCredentials() at System.Web.Services.Protocols.WebClientProtocol.set_UseDefaultCredentials(Boolean value) at MyStoredProc.localhost.MPWebService.set_UseDefaultCredentials(Boolean Value) at MyStoredProclocalhost.MPWebService..ctor() at MyStoredProc.StoredProcedures.MyStoredProc(String FromPostCode, String ToPostCode) I am sure this is a permission issue, but I can't, for the life of me get it working. I have attempted using impersonation in the CLR sproc and a few other things. Any suggestions? What am I missing?

    Read the article

  • Data aggregation mongodb vs mysql

    - by Dimitris Stefanidis
    I am currently researching on a backend to use for a project with demanding data aggregation requirements. The main project requirements are the following. Store millions of records for each user. Users might have more than 1 million entries per year so even with 100 users we are talking about 100 million entries per year. Data aggregation on those entries must be performed on the fly. The users need to be able to filter on the entries by a ton of available filters and then present summaries (totals , averages e.t.c) and graphs on the results. Obviously I cannot precalculate any of the aggregation results because the filter combinations (and thus the result sets) are huge. Users are going to have access on their own data only but it would be nice if anonymous stats could be calculated for all the data. The data is going to be most of the time in batch. e.g the user will upload the data every day and it could like 3000 records. In some later version there could be automated programs that upload every few minutes in smaller batches of 100 items for example. I made a simple test of creating a table with 1 million rows and performing a simple sum of 1 column both in mongodb and in mysql and the performance difference was huge. I do not remember the exact numbers but it was something like mysql = 200ms , mongodb = 20 sec. I have also made the test with couchdb and had much worse results. What seems promising speed wise is cassandra which I was very enthusiastic about when I first discovered it. However the documentation is scarce and I haven't found any solid examples on how to perform sums and other aggregate functions on the data. Is that possible ? As it seems from my test (Maybe I have done something wrong) with the current performance its impossible to use mongodb for such a project although the automated sharding functionality seems like a perfect fit for it. Does anybody have experience with data aggregation in mongodb or have any insights that might be of help for the implementation of the project ? Thanks, Dimitris

    Read the article

  • Applying Domain Model on top of Linq2Sql entities

    - by Thomas
    I am trying to practice the model first approach and I am putting together a domain model. My requirement is pretty simple: UserSession can have multiple ShoppingCartItems. I should start off by saying that I am going to apply the domain model interfaces to Linq2Sql generated entities (using partial classes). My requirement translates into three database tables (UserSession, Product, ShoppingCartItem where ProductId and UserSessionId are foreign keys in the ShoppingCartItem table). Linq2Sql generates these entities for me. I know I shouldn't even be dealing with the database at this point but I think it is important to mention. The aggregate root is UserSession as a ShoppingCartItem can not exist without a UserSession but I am unclear on the rest. What about Product? It is defiently an entity but should it be associated to ShoppingCartItem? Here are a few suggestion (they might all be incorrect implementations): public interface IUserSession { public Guid Id { get; set; } public IList<IShoppingCartItem> ShoppingCartItems{ get; set; } } public interface IShoppingCartItem { public Guid UserSessionId { get; set; } public int ProductId { get; set; } } Another one would be: public interface IUserSession { public Guid Id { get; set; } public IList<IShoppingCartItem> ShoppingCartItems{ get; set; } } public interface IShoppingCartItem { public Guid UserSessionId { get; set; } public IProduct Product { get; set; } } A third one is: public interface IUserSession { public Guid Id { get; set; } public IList<IShoppingCartItemColletion> ShoppingCartItems{ get; set; } } public interface IShoppingCartItemColletion { public IUserSession UserSession { get; set; } public IProduct Product { get; set; } } public interface IProduct { public int ProductId { get; set; } } I have a feeling my mind is too tightly coupled with database models and tables which is making this hard to grasp. Anyone care to decouple?

    Read the article

  • How is IObservable<double>.Average supposed to work?

    - by Dan Tao
    Update Looks like Jon Skeet was right (big surprise!) and the issue was with my assumption about the Average extension providing a continuous average (it doesn't). For the behavior I'm after, I wrote a simple ContinuousAverage extension method, the implementation of which I am including here for the benefit of others who may want something similar: public static class ObservableExtensions { private class ContinuousAverager { private double _mean; private long _count; public ContinuousAverager() { _mean = 0.0; _count = 0L; } // undecided whether this method needs to be made thread-safe or not // seems that ought to be the responsibility of the IObservable (?) public double Add(double value) { double delta = value - _mean; _mean += (delta / (double)(++_count)); return _mean; } } public static IObservable<double> ContinousAverage(this IObservable<double> source) { var averager = new ContinuousAverager(); return source.Select(x => averager.Add(x)); } } I'm thinking of going ahead and doing something like the above for the other obvious candidates as well -- so, ContinuousCount, ContinuousSum, ContinuousMin, ContinuousMax ... perhaps ContinuousVariance and ContinuousStandardDeviation as well? Any thoughts on that? Original Question I use Rx Extensions a little bit here and there, and feel I've got the basic ideas down. Now here's something odd: I was under the impression that if I wrote this: var ticks = Observable.FromEvent<QuoteEventArgs>(MarketDataProvider, "MarketTick"); var bids = ticks .Where(e => e.EventArgs.Quote.HasBid) .Select(e => e.EventArgs.Quote.Bid); var bidsSubscription = bids.Subscribe( b => Console.WriteLine("Bid: {0}", b) ); var avgOfBids = bids.Average(); var avgOfBidsSubscription = avgOfBids.Subscribe( b => Console.WriteLine("Avg Bid: {0}", b) ); I would get two IObservable<double> objects (bids and avgOfBids); one would basically be a stream of all the market bids from my MarketDataProvider, the other would be a stream of the average of these bids. So something like this: Bid Avg Bid 1 1 2 1.5 1 1.33 2 1.5 It seems that my avgOfBids object isn't doing anything. What am I missing? I think I've probably misunderstood what Average is actually supposed to do. (This also seems to be the case for all of the aggregate-like extension methods on IObservable<T> -- e.g., Max, Count, etc.)

    Read the article

  • Any simple approaches for managing customer data change requests for global reference files?

    - by Kelly Duke
    For the first time, I am developing in an environment in which there is a central repository for a number of different industry standard reference data tables and many different customers who need to select records from these industry standard reference data tables to fill in foreign key information for their customer specific records. Because these industry standard reference files are utilized by all customers, I want to reserve Create/Update/Delete access to these records for global product administrators. However, I would like to implement a (semi-)automated interface by which specific customers could request record additions, deletions or modifications to any of the industry standard reference files that are shared among all customers. I know I need something like a "data change request" table specifying: user id, user request datetime, request type (insert, modify, delete), a user entered text explanation of the change request, the user request's current status (pending, declined, completed), admin resolution datetime, admin id, an admin entered text description of the resolution, etc. What I can't figure out is how to elegantly handle the fact that these data change requests could apply to dozens of different tables with differing table column definitions. I would like to give the customer users making these data change requests a convenient way to enter their proposed record additions/modifications directly into CRUD screens that look very much like the reference table CRUD screens they don't have write/delete permissions for (with an additional text explanation and perhaps request priority field). I would also like to give the global admins a tool that allows them to view all the outstanding data change requests for the users they oversee sorted by date requested or user/date requested. Upon selecting a data change request record off the list, the admin would be directed to another CRUD screen that would be populated with the fields the customer users requested for the new/modified industry standard reference table record along with customer's text explanation, the request status and the text resolution explanation field. At this point the admin could accept/edit/reject the requested change and if accepted the affected industry standard reference file would be automatically updated with the appropriate fields and the data change request record's status, text resolution explanation and resolution datetime would all also be appropriately updated. However, I want to keep the actual production reference tables as simple as possible and free from these extraneous and typically null customer change request fields. I'd also like the data change request file to aggregate all data change requests across all the reference tables yet somehow "point to" the specific reference table and primary key in question for modification & deletion requests or the specific reference table and associated customer user entered field values in question for record creation requests. Does anybody have any ideas of how to design something like this effectively? Is there a cleaner, simpler way I am missing? Thank you so much for reading.

    Read the article

  • SQL Select - adding field to Select is changing the results

    - by nycdan
    I'm stumped by this SQL problem that I suspect will be easy pickings for someone out there. I have a table that contains rows representing several daily lists of ranked items. The relevent fields are as follows: ID, ListID, ItemID, ItemName, ItemRank, Date. I have a query that returns the items that were on a list yesterday but not today (Items Off List) as follows: Select ItemID, ListID, ItemName, convert(varchar(10),MAX(date),101) as date, COUNT(ItemName) as days_on_list From Table Group By ItemID, ListID, ItemName Having Max(date) = DATEADD("d",-1,convert(varchar(10),getdate(),101)) and ListID = 1 Order By ListID, ItemName, COUNT(ItemName) Basically I'm looking for records where the max date is yesterday. It works fine and shows the number of days each item was previously on the list (although not necessarily consecutively, but that's fine for now). The problem is when I try to add ranking to see what yesterday's rank was. I tried the following: Select ItemID, ListID, ItemName, ranking, convert(varchar(10),MAX(date),101) as date, COUNT(ItemName) as days_on_list From Table Group By ItemID, ListID, ItemName, ranking Having Max(date) = DATEADD("d",-1,convert(varchar(10),getdate(),101)) and ListID = 1 Order By ListID, ItemName, ranking, COUNT(ItemName) This returns a great deal more records than the previous query so something isn't right with it. I want the same number of records, but with the ranking included. I can get the rank by doing a self-join with a subquery and getting records where the ItemID occurs yesterday but not today - but then I don't know how to get the Count any more. Appreciation in advance for any help with this. ======== SOLVED ============== Select ItemID, ListID, ItemName, ranking, convert(varchar(10),MAX(date),101) as date, COUNT(ItemName) as days_on_list from Table T Where date = DATEADD("d",-1,convert(varchar(10),getdate(),101)) and ListID = 1 and T.ItemID Not In (select T.ItemID from Table T join Table T2 on T.ItemID = T2.ItemID and T.ListID = T2.ListID where T.date = DATEADD("d",-1,convert(varchar(10),getdate(),101)) and T2.date = convert (varchar(10),getdate(),101) and T.ListID = 1) Group by ItemID, ListID, ItemName, ranking Basically, what I did was create a subquery that finds all items that appear in both days, and finds items that appeared yesterday but are not in the set of items that appeared both days. Then I was able to do the aggregate function and grouping correctly. I would NOT be surprised if this is more convoluted than necessary but I understand it and can modify it as needed and performance doesn't seem to be an issue. Thanks everyone for the assist.

    Read the article

  • PostgreSQL - fetch the row which has the Max value for a column

    - by Joshua Berry
    I'm dealing with a Postgres table (called "lives") that contains records with columns for time_stamp, usr_id, transaction_id, and lives_remaining. I need a query that will give me the most recent lives_remaining total for each usr_id There are multiple users (distinct usr_id's) time_stamp is not a unique identifier: sometimes user events (one by row in the table) will occur with the same time_stamp. trans_id is unique only for very small time ranges: over time it repeats remaining_lives (for a given user) can both increase and decrease over time example: time_stamp|lives_remaining|usr_id|trans_id ----------------------------------------- 07:00 | 1 | 1 | 1 09:00 | 4 | 2 | 2 10:00 | 2 | 3 | 3 10:00 | 1 | 2 | 4 11:00 | 4 | 1 | 5 11:00 | 3 | 1 | 6 13:00 | 3 | 3 | 1 As I will need to access other columns of the row with the latest data for each given usr_id, I need a query that gives a result like this: time_stamp|lives_remaining|usr_id|trans_id ----------------------------------------- 11:00 | 3 | 1 | 6 10:00 | 1 | 2 | 4 13:00 | 3 | 3 | 1 As mentioned, each usr_id can gain or lose lives, and sometimes these timestamped events occur so close together that they have the same timestamp! Therefore this query won't work: SELECT b.time_stamp,b.lives_remaining,b.usr_id,b.trans_id FROM (SELECT usr_id, max(time_stamp) AS max_timestamp FROM lives GROUP BY usr_id ORDER BY usr_id) a JOIN lives b ON a.max_timestamp = b.time_stamp Instead, I need to use both time_stamp (first) and trans_id (second) to identify the correct row. I also then need to pass that information from the subquery to the main query that will provide the data for the other columns of the appropriate rows. This is the hacked up query that I've gotten to work: SELECT b.time_stamp,b.lives_remaining,b.usr_id,b.trans_id FROM (SELECT usr_id, max(time_stamp || '*' || trans_id) AS max_timestamp_transid FROM lives GROUP BY usr_id ORDER BY usr_id) a JOIN lives b ON a.max_timestamp_transid = b.time_stamp || '*' || b.trans_id ORDER BY b.usr_id Okay, so this works, but I don't like it. It requires a query within a query, a self join, and it seems to me that it could be much simpler by grabbing the row that MAX found to have the largest timestamp and trans_id. The table "lives" has tens of millions of rows to parse, so I'd like this query to be as fast and efficient as possible. I'm new to RDBM and Postgres in particular, so I know that I need to make effective use of the proper indexes. I'm a bit lost on how to optimize. I found a similar discussion here. Can I perform some type of Postgres equivalent to an Oracle analytic function? Any advice on accessing related column information used by an aggregate function (like MAX), creating indexes, and creating better queries would be much appreciated! P.S. You can use the following to create my example case: create TABLE lives (time_stamp timestamp, lives_remaining integer, usr_id integer, trans_id integer); insert into lives values ('2000-01-01 07:00', 1, 1, 1); insert into lives values ('2000-01-01 09:00', 4, 2, 2); insert into lives values ('2000-01-01 10:00', 2, 3, 3); insert into lives values ('2000-01-01 10:00', 1, 2, 4); insert into lives values ('2000-01-01 11:00', 4, 1, 5); insert into lives values ('2000-01-01 11:00', 3, 1, 6); insert into lives values ('2000-01-01 13:00', 3, 3, 1);

    Read the article

  • Generate lags R

    - by Btibert3
    Hi All, I hope this is basic; just need a nudge in the right direction. I have read in a database table from MS Access into a data frame using RODBC. Here is a basic structure of what I read in: PRODID PROD Year Week QTY SALES INVOICE Here is the structure: str(data) 'data.frame': 8270 obs. of 7 variables: $ PRODID : int 20001 20001 20001 100001 100001 100001 100001 100001 100001 100001 ... $ PROD : Factor w/ 1239 levels "1% 20qt Box",..: 335 335 335 128 128 128 128 128 128 128 ... $ Year : int 2010 2010 2010 2009 2009 2009 2009 2009 2009 2010 ... $ Week : int 12 18 19 14 15 16 17 18 19 9 ... $ QTY : num 1 1 0 135 300 270 300 270 315 315 ... $ SALES : num 15.5 0 -13.9 243 540 ... $ INVOICES: num 1 1 2 5 11 11 10 11 11 12 ... Here are the top few rows: head(data, n=10) PRODID PROD Year Week QTY SALES INVOICES 1 20001 Dolie 12" 2010 12 1 15.46 1 2 20001 Dolie 12" 2010 18 1 0.00 1 3 20001 Dolie 12" 2010 19 0 -13.88 2 4 100001 Cage Free Eggs 2009 14 135 243.00 5 5 100001 Cage Free Eggs 2009 15 300 540.00 11 6 100001 Cage Free Eggs 2009 16 270 486.00 11 7 100001 Cage Free Eggs 2009 17 300 540.00 10 8 100001 Cage Free Eggs 2009 18 270 486.00 11 9 100001 Cage Free Eggs 2009 19 315 567.00 11 10 100001 Cage Free Eggs 2010 9 315 569.25 12 I simply want to generate lags for QTY, SALES, INVOICE for each product but I am not sure where to start. I know R is great with Time Series, but I am not sure where to start. I have two questions: 1- I have the raw invoice data but have aggregated it for reporting purposes. Would it be easier if I didn't aggregate the data? 2- Regardless of aggregation or not, what functions will I need to loop over each product and generate the lags as I need them? In short, I want to loop over a set of records, calculate lags for a product (if possible), append the lags (as they apply) to the current record for each product, and write the results back to a table in my database for my reporting software to use. Any help you can provide will be greatly appreciated! Many thanks in advance, Brock

    Read the article

  • Trying to implement a method that can compare any two lists but it always returns false

    - by Tyler Pfaff
    Hello like the title says I'm trying to make a method that can compare any two lists for equality. I'm trying to compare them in a way that validates that every element of one list has the same value as every element of another list. My Equals method below always returns false, can anyone see why that is? Thank you! using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Threading.Tasks; public class IEnumerableComparer<T> : IEqualityComparer<IEnumerable<T>> { public bool Equals(IEnumerable<T> x, IEnumerable<T> y) { for(int i = 0; i<x.Count();i++){ if(!Object.Equals(x.ElementAt(i), y.ElementAt(i))){ return false; } } return true; } public int GetHashCode(IEnumerable<T> obj) { if (obj == null) return 0; return unchecked(obj.Select(e => e.GetHashCode()).Aggregate(0, (a, b) => a + b)); } } Here is my data I'm using to test this Equals method. static void Main(string[] args) { Car car1 = new Car(); car1.make = "Toyota"; car1.model = "xB"; Car car2 = new Car(); car2.make = "Toyota"; car2.model = "xB"; List<Car> l1 = new List<Car>(); List<Car> l2 = new List<Car>(); l1.Add(car1); l2.Add(car2); IEnumerableComparer<Car> seq = new IEnumerableComparer<Car>(); bool b = seq.Equals(l1, l2); Console.Write(b); //always says false Console.Read(); } } Car class class Car { public String make { get; set; } public String model { get; set; } }

    Read the article

  • When does logic belong in the Business Object/Entity, and when does it belong in a Service?

    - by Casey
    In trying to understand Domain Driven Design I keep returning to a question that I can't seem to definitively answer. How do you determine what logic belongs to a Domain entity, and what logic belongs to a Domain Service? Example: We have an Order class for an online store. This class is an entity and an aggregate root (it contains OrderItems). Public Class Order:IOrder { Private List<IOrderItem> OrderItems Public Order(List<IOrderItem>) { OrderItems = List<IOrderItem> } Public Decimal CalculateTotalItemWeight() //This logic seems to belong in the entity. { Decimal TotalWeight = 0 foreach(IOrderItem OrderItem in OrderItems) { TotalWeight += OrderItem.Weight } return TotalWeight } } I think most people would agree that CalculateTotalItemWeight belongs on the entity. However, at some point we have to ship this order to the customer. To accomplish this we need to do two things: 1) Determine the postage rate necessary to ship this order. 2) Print a shipping label after determining the postage rate. Both of these actions will require dependencies that are outside the Order entity, such as an external webservice to retrieve postage rates. How should we accomplish these two things? I see a few options: 1) Code the logic directly in the domain entity, like CalculateTotalItemWeight. We then call: Order.GetPostageRate Order.PrintLabel 2) Put the logic in a service that accepts IOrder. We then call: PostageService.GetPostageRate(Order) PrintService.PrintLabel(Order) 3) Create a class for each action that operates on an Order, and pass an instance of that class to the Order through Constructor Injection (this is a variation of option 1 but allows reuse of the RateRetriever and LabelPrinter classes): Public Class Order:IOrder { Private List<IOrderItem> OrderItems Private RateRetriever _Retriever Private LabelPrinter _Printer Public Order(List<IOrderItem>, RateRetriever Retriever, LabelPrinter Printer) { OrderItems = List<IOrderItem> _Retriever = Retriever _Printer = Printer } Public Decimal GetPostageRate { _Retriever.GetPostageRate(this) } Public void PrintLabel { _Printer.PrintLabel(this) } } Which one of these methods do you choose for this logic, if any? What is the reasoning behind your choice? Most importantly, is there a set of guidelines that led you to your choice?

    Read the article

  • TFS 2010 Build Custom Activity for Merging Assemblies

    - by Jakob Ehn
    *** The sample build process template discussed in this post is available for download from here: http://cid-ee034c9f620cd58d.office.live.com/self.aspx/BlogSamples/ILMerge.xaml ***   In my previous post I talked about library builds that we use to build and replicate dependencies between applications in TFS. This is typically used for common libraries and tools that several other application need to reference. When the libraries grow in size over time, so does the number of assemblies. So all solutions that uses the common library must reference all the necessary assemblies that they need, and if we for example do a refactoring and extract some code into a new assembly, all the clients must update their references to reflect these changes, otherwise it won’t compile. To improve on this, we use a tool from Microsoft Research called ILMerge (Download from here). It can be used to merge several assemblies into one assembly that contains all types. If you haven’t used this tool before, you should check it out. Previously I have implemented this in builds using a simple batch file that contains the full command, something like this: "%ProgramFiles(x86)%\microsoft\ilmerge\ilmerge.exe" /target:library /attr:ClassLibrary1.bl.dll /out:MyNewLibrary.dll ClassLibrary1.dll ClassLibrar2.dll ClassLibrary3.dll This merges 3 assemblies (ClassLibrary1, 2 and 3) into a new assembly called MyNewLibrary.dll. It will copy the attributes (file version, product version etc..) from ClassLibrary1.dll, using the /attr switch. For more info on ILMerge command line tool, see the above link. This approach works, but requires a little bit too much knowledge for the developers creating builds, therefor I have implemented a custom activity that wraps the use of ILMerge. This makes it much simpler to setup a new build definition and have the build automatically do the merging. The usage of the activity is then implemented as part of the Library Build process template mentioned in the previous post. For this article I have just created a simple build process template that only performs the ILMerge operation.   Below is the code for the custom activity. To make it compile, you need to reference the ILMerge.exe assembly. /// <summary> /// Activity for merging a list of assembies into one, using ILMerge /// </summary> public sealed class ILMergeActivity : BaseCodeActivity { /// <summary> /// A list of file paths to the assemblies that should be merged /// </summary> [RequiredArgument] public InArgument<IEnumerable<string>> InputAssemblies { get; set; } /// <summary> /// Full path to the generated assembly /// </summary> [RequiredArgument] public InArgument<string> OutputFile { get; set; } /// <summary> /// Which input assembly that the attibutes for the generated assembly should be copied from. /// Optional. If not specified, the first input assembly will be used /// </summary> public InArgument<string> AttributeFile { get; set; } /// <summary> /// Kind of assembly to generate, dll or exe /// </summary> public InArgument<TargetKindEnum> TargetKind { get; set; } // If your activity returns a value, derive from CodeActivity<TResult> // and return the value from the Execute method. protected override void Execute(CodeActivityContext context) { string message = InputAssemblies.Get(context).Aggregate("", (current, assembly) => current + (assembly + " ")); TrackMessage(context, "Merging " + message + " into " + OutputFile.Get(context)); ILMerge m = new ILMerge(); m.SetInputAssemblies(InputAssemblies.Get(context).ToArray()); m.TargetKind = TargetKind.Get(context) == TargetKindEnum.Dll ? ILMerge.Kind.Dll : ILMerge.Kind.Exe; m.OutputFile = OutputFile.Get(context); m.AttributeFile = !String.IsNullOrEmpty(AttributeFile.Get(context)) ? AttributeFile.Get(context) : InputAssemblies.Get(context).First(); m.SetTargetPlatform(RuntimeEnvironment.GetSystemVersion().Substring(0,2), RuntimeEnvironment.GetRuntimeDirectory()); m.Merge(); TrackMessage(context, "Generated " + m.OutputFile); } } [Browsable(true)] public enum TargetKindEnum { Dll, Exe } NB: The activity inherits from a BaseCodeActivity class which is an internal helper class which contains some methods and properties useful for moste custom activities. In this case, it uses the TrackeMessage method for writing to the build log. You either need to remove the TrackMessage method calls, or implement this yourself (which is not very hard… ) The custom activity has the following input arguments: InputAssemblies A list with the (full) paths to the assemblies to merge OutputFile The name of the resulting merged assembly AttributeFile Which assembly to use as the template for the attribute of the merged assembly. This argument is optional and if left blank, the first assembly in the input list is used TargetKind Decides what type of assembly to create, can be either a dll or an exe Of course, there are more switches to the ILMerge.exe, and these can be exposed as input arguments as well if you need it. To show how the custom activity can be used, I have attached a build process template (see link at the top of this post) that merges the output of the projects being built (CommonLibrary.dll and CommonLibrary2.dll) into a merged assembly (NewLibrary.dll). The build process template has the following custom process parameters:   The Assemblies To Merge argument is passed into a FindMatchingFiles activity to located all assemblies that are located in the BinariesDirectory folder after the compilation has been performed by Team Build. Here is the complete sequence of activities that performs the merge operation. It is located at the end of the Try, Compile, Test and Associate… sequence: It splits the AssembliesToMerge parameter and appends the full path (using the BinariesDirectory variable) and then enumerates the matching files using the FindMatchingFiles activity. When running the build, you can see that it merges two assemblies into a new one:     And the merged assembly (and associated pdb file) is copied to the drop location together with the rest of the assemblies:

    Read the article

  • A Guided Tour of Complexity

    - by JoshReuben
    I just re-read Complexity – A Guided Tour by Melanie Mitchell , protégé of Douglas Hofstadter ( author of “Gödel, Escher, Bach”) http://www.amazon.com/Complexity-Guided-Tour-Melanie-Mitchell/dp/0199798109/ref=sr_1_1?ie=UTF8&qid=1339744329&sr=8-1 here are some notes and links:   Evolved from Cybernetics, General Systems Theory, Synergetics some interesting transdisciplinary fields to investigate: Chaos Theory - http://en.wikipedia.org/wiki/Chaos_theory – small differences in initial conditions (such as those due to rounding errors in numerical computation) yield widely diverging outcomes for chaotic systems, rendering long-term prediction impossible. System Dynamics / Cybernetics - http://en.wikipedia.org/wiki/System_Dynamics – study of how feedback changes system behavior Network Theory - http://en.wikipedia.org/wiki/Network_theory – leverage Graph Theory to analyze symmetric  / asymmetric relations between discrete objects Algebraic Topology - http://en.wikipedia.org/wiki/Algebraic_topology – leverage abstract algebra to analyze topological spaces There are limits to deterministic systems & to computation. Chaos Theory definitely applies to training an ANN (artificial neural network) – different weights will emerge depending upon the random selection of the training set. In recursive Non-Linear systems http://en.wikipedia.org/wiki/Nonlinear_system – output is not directly inferable from input. E.g. a Logistic map: Xt+1 = R Xt(1-Xt) Different types of bifurcations, attractor states and oscillations may occur – e.g. a Lorenz Attractor http://en.wikipedia.org/wiki/Lorenz_system Feigenbaum Constants http://en.wikipedia.org/wiki/Feigenbaum_constants express ratios in a bifurcation diagram for a non-linear map – the convergent limit of R (the rate of period-doubling bifurcations) is 4.6692016 Maxwell’s Demon - http://en.wikipedia.org/wiki/Maxwell%27s_demon - the Second Law of Thermodynamics has only a statistical certainty – the universe (and thus information) tends towards entropy. While any computation can theoretically be done without expending energy, with finite memory, the act of erasing memory is permanent and increases entropy. Life & thought is a counter-example to the universe’s tendency towards entropy. Leo Szilard and later Claude Shannon came up with the Information Theory of Entropy - http://en.wikipedia.org/wiki/Entropy_(information_theory) whereby Shannon entropy quantifies the expected value of a message’s information in bits in order to determine channel capacity and leverage Coding Theory (compression analysis). Ludwig Boltzmann came up with Statistical Mechanics - http://en.wikipedia.org/wiki/Statistical_mechanics – whereby our Newtonian perception of continuous reality is a probabilistic and statistical aggregate of many discrete quantum microstates. This is relevant for Quantum Information Theory http://en.wikipedia.org/wiki/Quantum_information and the Physics of Information - http://en.wikipedia.org/wiki/Physical_information. Hilbert’s Problems http://en.wikipedia.org/wiki/Hilbert's_problems pondered whether mathematics is complete, consistent, and decidable (the Decision Problem – http://en.wikipedia.org/wiki/Entscheidungsproblem – is there always an algorithm that can determine whether a statement is true).  Godel’s Incompleteness Theorems http://en.wikipedia.org/wiki/G%C3%B6del's_incompleteness_theorems  proved that mathematics cannot be both complete and consistent (e.g. “This statement is not provable”). Turing through the use of Turing Machines (http://en.wikipedia.org/wiki/Turing_machine symbol processors that can prove mathematical statements) and Universal Turing Machines (http://en.wikipedia.org/wiki/Universal_Turing_machine Turing Machines that can emulate other any Turing Machine via accepting programs as well as data as input symbols) that computation is limited by demonstrating the Halting Problem http://en.wikipedia.org/wiki/Halting_problem (is is not possible to know when a program will complete – you cannot build an infinite loop detector). You may be used to thinking of 1 / 2 / 3 dimensional systems, but Fractal http://en.wikipedia.org/wiki/Fractal systems are defined by self-similarity & have non-integer Hausdorff Dimensions !!!  http://en.wikipedia.org/wiki/List_of_fractals_by_Hausdorff_dimension – the fractal dimension quantifies the number of copies of a self similar object at each level of detail – eg Koch Snowflake - http://en.wikipedia.org/wiki/Koch_snowflake Definitions of complexity: size, Shannon entropy, Algorithmic Information Content (http://en.wikipedia.org/wiki/Algorithmic_information_theory - size of shortest program that can generate a description of an object) Logical depth (amount of info processed), thermodynamic depth (resources required). Complexity is statistical and fractal. John Von Neumann’s other machine was the Self-Reproducing Automaton http://en.wikipedia.org/wiki/Self-replicating_machine  . Cellular Automata http://en.wikipedia.org/wiki/Cellular_automaton are alternative form of Universal Turing machine to traditional Von Neumann machines where grid cells are locally synchronized with their neighbors according to a rule. Conway’s Game of Life http://en.wikipedia.org/wiki/Conway's_Game_of_Life demonstrates various emergent constructs such as “Glider Guns” and “Spaceships”. Cellular Automatons are not practical because logical ops require a large number of cells – wasteful & inefficient. There are no compilers or general program languages available for Cellular Automatons (as far as I am aware). Random Boolean Networks http://en.wikipedia.org/wiki/Boolean_network are extensions of cellular automata where nodes are connected at random (not to spatial neighbors) and each node has its own rule –> they demonstrate the emergence of complex  & self organized behavior. Stephen Wolfram’s (creator of Mathematica, so give him the benefit of the doubt) New Kind of Science http://en.wikipedia.org/wiki/A_New_Kind_of_Science proposes the universe may be a discrete Finite State Automata http://en.wikipedia.org/wiki/Finite-state_machine whereby reality emerges from simple rules. I am 2/3 through this book. It is feasible that the universe is quantum discrete at the plank scale and that it computes itself – Digital Physics: http://en.wikipedia.org/wiki/Digital_physics – a simulated reality? Anyway, all behavior is supposedly derived from simple algorithmic rules & falls into 4 patterns: uniform , nested / cyclical, random (Rule 30 http://en.wikipedia.org/wiki/Rule_30) & mixed (Rule 110 - http://en.wikipedia.org/wiki/Rule_110 localized structures – it is this that is interesting). interaction between colliding propagating signal inputs is then information processing. Wolfram proposes the Principle of Computational Equivalence - http://mathworld.wolfram.com/PrincipleofComputationalEquivalence.html - all processes that are not obviously simple can be viewed as computations of equivalent sophistication. Meaning in information may emerge from analogy & conceptual slippages – see the CopyCat program: http://cognitrn.psych.indiana.edu/rgoldsto/courses/concepts/copycat.pdf Scale Free Networks http://en.wikipedia.org/wiki/Scale-free_network have a distribution governed by a Power Law (http://en.wikipedia.org/wiki/Power_law - much more common than Normal Distribution). They are characterized by hubs (resilience to random deletion of nodes), heterogeneity of degree values, self similarity, & small world structure. They grow via preferential attachment http://en.wikipedia.org/wiki/Preferential_attachment – tipping points triggered by positive feedback loops. 2 theories of cascading system failures in complex systems are Self-Organized Criticality http://en.wikipedia.org/wiki/Self-organized_criticality and Highly Optimized Tolerance http://en.wikipedia.org/wiki/Highly_optimized_tolerance. Computational Mechanics http://en.wikipedia.org/wiki/Computational_mechanics – use of computational methods to study phenomena governed by the principles of mechanics. This book is a great intuition pump, but does not cover the more mathematical subject of Computational Complexity Theory – http://en.wikipedia.org/wiki/Computational_complexity_theory I am currently reading this book on this subject: http://www.amazon.com/Computational-Complexity-Christos-H-Papadimitriou/dp/0201530821/ref=pd_sim_b_1   stay tuned for that review!

    Read the article

  • SQL Server Split() Function

    - by HighAltitudeCoder
    Title goes here   Ever wanted a dbo.Split() function, but not had the time to debug it completely?  Let me guess - you are probably working on a stored procedure with 50 or more parameters; two or three of them are parameters of differing types, while the other 47 or so all of the same type (id1, id2, id3, id4, id5...).  Worse, you've found several other similar stored procedures with the ONLY DIFFERENCE being the number of like parameters taped to the end of the parameter list. If this is the situation you find yourself in now, you may be wondering, "why am I working with three different copies of what is basically the same stored procedure, and why am I having to maintain changes in three different places?  Can't I have one stored procedure that accomplishes the job of all three? My answer to you: YES!  Here is the Split() function I've created.    /******************************************************************************                                       Split.sql   ******************************************************************************/ /******************************************************************************   Split a delimited string into sub-components and return them as a table.   Parameter 1: Input string which is to be split into parts. Parameter 2: Delimiter which determines the split points in input string. Works with space or spaces as delimiter. Split() is apostrophe-safe.   SYNTAX: SELECT * FROM Split('Dvorak,Debussy,Chopin,Holst', ',') SELECT * FROM Split('Denver|Seattle|San Diego|New York', '|') SELECT * FROM Split('Denver is the super-awesomest city of them all.', ' ')   ******************************************************************************/ USE AdventureWorks GO   IF EXISTS       (SELECT *       FROM sysobjects       WHERE xtype = 'TF'       AND name = 'Split'       ) BEGIN       DROP FUNCTION Split END GO   CREATE FUNCTION Split (       @InputString                  VARCHAR(8000),       @Delimiter                    VARCHAR(50) )   RETURNS @Items TABLE (       Item                          VARCHAR(8000) )   AS BEGIN       IF @Delimiter = ' '       BEGIN             SET @Delimiter = ','             SET @InputString = REPLACE(@InputString, ' ', @Delimiter)       END         IF (@Delimiter IS NULL OR @Delimiter = '')             SET @Delimiter = ','   --INSERT INTO @Items VALUES (@Delimiter) -- Diagnostic --INSERT INTO @Items VALUES (@InputString) -- Diagnostic         DECLARE @Item                 VARCHAR(8000)       DECLARE @ItemList       VARCHAR(8000)       DECLARE @DelimIndex     INT         SET @ItemList = @InputString       SET @DelimIndex = CHARINDEX(@Delimiter, @ItemList, 0)       WHILE (@DelimIndex != 0)       BEGIN             SET @Item = SUBSTRING(@ItemList, 0, @DelimIndex)             INSERT INTO @Items VALUES (@Item)               -- Set @ItemList = @ItemList minus one less item             SET @ItemList = SUBSTRING(@ItemList, @DelimIndex+1, LEN(@ItemList)-@DelimIndex)             SET @DelimIndex = CHARINDEX(@Delimiter, @ItemList, 0)       END -- End WHILE         IF @Item IS NOT NULL -- At least one delimiter was encountered in @InputString       BEGIN             SET @Item = @ItemList             INSERT INTO @Items VALUES (@Item)       END         -- No delimiters were encountered in @InputString, so just return @InputString       ELSE INSERT INTO @Items VALUES (@InputString)         RETURN   END -- End Function GO   ---- Set Permissions --GRANT SELECT ON Split TO UserRole1 --GRANT SELECT ON Split TO UserRole2 --GO   The syntax is basically as follows: SELECT <fields> FROM Table 1 JOIN Table 2 ON ... JOIN Table 3 ON ... WHERE LOGICAL CONDITION A AND LOGICAL CONDITION B AND LOGICAL CONDITION C AND TABLE2.Id IN (SELECT * FROM Split(@IdList, ',')) @IdList is a parameter passed into the stored procedure, and the comma (',') is the delimiter you have chosen to split the parameter list on. You can also use it like this: SELECT <fields> FROM Table 1 JOIN Table 2 ON ... JOIN Table 3 ON ... WHERE LOGICAL CONDITION A AND LOGICAL CONDITION B AND LOGICAL CONDITION C HAVING COUNT(SELECT * FROM Split(@IdList, ',') Similarly, it can be used in other aggregate functions at run-time: SELECT MIN(SELECT * FROM Split(@IdList, ','), <fields> FROM Table 1 JOIN Table 2 ON ... JOIN Table 3 ON ... WHERE LOGICAL CONDITION A AND LOGICAL CONDITION B AND LOGICAL CONDITION C GROUP BY <fields> Now that I've (hopefully effectively) explained the benefits to using this function and implementing it in one or more of your database objects, let me warn you of a caveat that you are likely to encounter.  You may have a team member who waits until the right moment to ask you a pointed question: "Doesn't this function just do the same thing as using the IN function?  Why didn't you just use that instead?  In other words, why bother with this function?" What's happening is, one or more team members has failed to understand the reason for implementing this kind of function in the first place.  (Note: this is THE MOST IMPORTANT ASPECT OF THIS POST). Allow me to outline a few pros to implementing this function, so you may effectively parry this question.  Touche. 1) Code consolidation.  You don't have to maintain what is basically the same code and logic, but with varying numbers of the same parameter in several SQL objects.  I'm not going to go into the cons related to using this function, because the afore mentioned team member is probably more than adept at pointing these out.  Remember, the real positive contribution is ou are decreasing the liklihood that your team fails to update all (x) duplicate copies of what are basically the same stored procedure, and so on...  This is the classic downside to duplicate code.  It is a virus, and you should kill it. You might be better off rejecting your team member's question, and responding with your own: "Would you rather maintain the same logic in multiple different stored procedures, and hope that the team doesn't forget to always update all of them at the same time?".  In his head, he might be thinking "yes, I would like to maintain several different copies of the same stored procedure", although you probably will not get such a direct response.  2) Added flexibility - you can use the Split function elsewhere, and for splitting your data in different ways.  Plus, you can use any kind of delimiter you wish.  How can you know today the ways in which you might want to examine your data tomorrow?  Segue to my next point. 3) Because the function takes a delimiter parameter, you can split the data in any number of ways.  This greatly increases the utility of such a function and enables your team to work with the data in a variety of different ways in the future.  You can split on a single char, symbol, word, or group of words.  You can split on spaces.  (The list goes on... test it out). Finally, you can dynamically define the behavior of a stored procedure (or other SQL object) at run time, through the use of this function.  Rather than have several objects that accomplish almost the same thing, why not have only one instead?

    Read the article

  • Rockmelt, the technology adoption model, and Facebook's spare internet

    - by Roger Hart
    Regardless of how good it is, you'd have to have a heart of stone not to make snide remarks about Rockmelt. After all, on the surface it looks a lot like some people spent two years building a browser instead of just bashing out a Chrome extension over a wet weekend. It probably does some more stuff. I don't know for sure because artificial scarcity is cool, apparently, so the "invitation" is still in the post*. I may in fact never know for sure, because I'm not wild about Facebook sign-in as a prerequisite for anything. From the video, and some initial reviews, my early reaction was: I have a browser, I have a Twitter client; what on earth is this for? The answer, of course, is "not me". Rockmelt is, in a way, quite audacious. Oh, sure, on launch day it's Bay Area bar-chat for the kids with no lenses in their retro specs and trousers that give you deep-vein thrombosis, but it's not really about them. Likewise,  Facebook just launched Google Wave, or something. And all the tech snobbery and scorn packed into describing it that way is irrelevant next to what they're doing with their platform. Here's something I drew in MS Paint** because I don't want to get sued: (see: The technology adoption lifecycle) A while ago in the Guardian, John Lanchester dusted off the idiom that "technology is stuff that doesn't work yet". The rest of the article would be quite interesting if it wasn't largely about MySpace, and he's sort of got a point. If you bolt on the sentiment that risk-averse businessmen like things that work, you've got the essence of Crossing the Chasm. Products for the mainstream market don't look much like technology. Think for  a second about early (1980s ish) hi-fi systems, with all the knobs and fiddly bits, their ostentatious technophile aesthetic. Then consider their sleeker and less (or at least less conspicuously) functional successors in the 1990s. The theory goes that innovators and early adopters like technology, it's a hobby in itself. The rest of the humans seem to like magic boxes with very few buttons that make stuff happen and never trouble them about why. Personally, I consider Apple's maddening insistence that iTunes is an acceptable way to move files around to be more or less morally unacceptable. Most people couldn't care less. Hence Rockmelt, and hence Facebook's continued growth. Rockmelt looks pointless to me, because I aggregate my social gubbins with Digsby, or use TweetDeck. But my use case is different and so are my enthusiasms. If I want to share photos, I'll use Flickr - but Facebook has photo sharing. If I want a short broadcast message, I'll use Twitter - Facebook has status updates. If I want to sell something with relatively little hassle, there's eBay - or Facebook marketplace. YouTube - check, FB Video. Email - messaging. Calendaring apps, yeah there are loads, or FB Events. What if I want to host a simple web page? Sure, they've got pages. Also Notes for blogging, and more games than I can count. This stuff is right there, where millions and millions of users are already, and for what they need it just works. It's not about me, because I'm not in the big juicy area under the curve. It's what 1990s portal sites could never have dreamed of achieving. Facebook is AOL on speed, crack, and some designer drugs it had specially imported from the future. It's a n00b-friendly gateway to the internet that just happens to serve up all the things you want to do online, right where you are. Oh, and everybody else is there too. The price of having all this and the social graph too is that you have all of this, and the social graph too. But plenty of folks have more incisive things to say than me about the whole privacy shebang, and it's not really what I'm talking about. Facebook is maintaining a vast, and fairly fully-featured training-wheels internet. And it makes up a large proportion of the online experience for a lot of people***. It's the entire web (2.0?) experience for the early and late majority. And sure, no individual bit of it is quite as slick or as fully-realised as something like Flickr (which wows me a bit every time I use it. Those guys are good at the web), but it doesn't have to be. It has to be unobtrusively good enough for the regular humans. It has to not feel like technology. This is what Rockmelt sort of is. You're online, you want something nebulously social, and you don't want to faff about with, say, Twitter clients. Wow! There it is on a really distracting sidebar, right in your browser. No effort! Yeah - fish nor fowl, much? It might work, I guess. There may be a demographic who want their social web experience more simply than tech tinkering, and who aren't just getting it from Facebook (or, for that matter, mobile devices). But I'd be surprised. Rockmelt feels like an attempt to grab a slice of Facebook-style "Look! It's right here, where you already are!", but it's still asking the mature market to install a new browser. Presumably this is where that Facebook sign-in predicate comes in handy, though it'll take some potent awareness marketing to make it fly. Meanwhile, Facebook quietly has the entire rest of the internet as a product management resource, and can continue to give most of the people most of what they want. Something that has not gone un-noticed in its potential to look a little sinister. But heck, they might even make Google Wave popular.     *This was true last week when I drafted this post. I got an invite subsequently, hence the screenshot. **MS Paint is no fun any more. It's actually good in Windows 7. Farewell ironically-shonky diagrams. *** It's also behind a single sign-in, lending a veneer of confidence, and partially solving the problem of usernames being crummy unique identifiers. I'll be blogging about that at some point.

    Read the article

  • Thread placement policies on NUMA systems - update

    - by Dave
    In a prior blog entry I noted that Solaris used a "maximum dispersal" placement policy to assign nascent threads to their initial processors. The general idea is that threads should be placed as far away from each other as possible in the resource topology in order to reduce resource contention between concurrently running threads. This policy assumes that resource contention -- pipelines, memory channel contention, destructive interference in the shared caches, etc -- will likely outweigh (a) any potential communication benefits we might achieve by packing our threads more densely onto a subset of the NUMA nodes, and (b) benefits of NUMA affinity between memory allocated by one thread and accessed by other threads. We want our threads spread widely over the system and not packed together. Conceptually, when placing a new thread, the kernel picks the least loaded node NUMA node (the node with lowest aggregate load average), and then the least loaded core on that node, etc. Furthermore, the kernel places threads onto resources -- sockets, cores, pipelines, etc -- without regard to the thread's process membership. That is, initial placement is process-agnostic. Keep reading, though. This description is incorrect. On Solaris 10 on a SPARC T5440 with 4 x T2+ NUMA nodes, if the system is otherwise unloaded and we launch a process that creates 20 compute-bound concurrent threads, then typically we'll see a perfect balance with 5 threads on each node. We see similar behavior on an 8-node x86 x4800 system, where each node has 8 cores and each core is 2-way hyperthreaded. So far so good; this behavior seems in agreement with the policy I described in the 1st paragraph. I recently tried the same experiment on a 4-node T4-4 running Solaris 11. Both the T5440 and T4-4 are 4-node systems that expose 256 logical thread contexts. To my surprise, all 20 threads were placed onto just one NUMA node while the other 3 nodes remained completely idle. I checked the usual suspects such as processor sets inadvertently left around by colleagues, processors left offline, and power management policies, but the system was configured normally. I then launched multiple concurrent instances of the process, and, interestingly, all the threads from the 1st process landed on one node, all the threads from the 2nd process landed on another node, and so on. This happened even if I interleaved thread creating between the processes, so I was relatively sure the effect didn't related to thread creation time, but rather that placement was a function of process membership. I this point I consulted the Solaris sources and talked with folks in the Solaris group. The new Solaris 11 behavior is intentional. The kernel is no longer using a simple maximum dispersal policy, and thread placement is process membership-aware. Now, even if other nodes are completely unloaded, the kernel will still try to pack new threads onto the home lgroup (socket) of the primordial thread until the load average of that node reaches 50%, after which it will pick the next least loaded node as the process's new favorite node for placement. On the T4-4 we have 64 logical thread contexts (strands) per socket (lgroup), so if we launch 48 concurrent threads we will find 32 placed on one node and 16 on some other node. If we launch 64 threads we'll find 32 and 32. That means we can end up with our threads clustered on a small subset of the nodes in a way that's quite different that what we've seen on Solaris 10. So we have a policy that allows process-aware packing but reverts to spreading threads onto other nodes if a node becomes too saturated. It turns out this policy was enabled in Solaris 10, but certain bugs suppressed the mixed packing/spreading behavior. There are configuration variables in /etc/system that allow us to dial the affinity between nascent threads and their primordial thread up and down: see lgrp_expand_proc_thresh, specifically. In the OpenSolaris source code the key routine is mpo_update_tunables(). This method reads the /etc/system variables and sets up some global variables that will subsequently be used by the dispatcher, which calls lgrp_choose() in lgrp.c to place nascent threads. Lgrp_expand_proc_thresh controls how loaded an lgroup must be before we'll consider homing a process's threads to another lgroup. Tune this value lower to have it spread your process's threads out more. To recap, the 'new' policy is as follows. Threads from the same process are packed onto a subset of the strands of a socket (50% for T-series). Once that socket reaches the 50% threshold the kernel then picks another preferred socket for that process. Threads from unrelated processes are spread across sockets. More precisely, different processes may have different preferred sockets (lgroups). Beware that I've simplified and elided details for the purposes of explication. The truth is in the code. Remarks: It's worth noting that initial thread placement is just that. If there's a gross imbalance between the load on different nodes then the kernel will migrate threads to achieve a better and more even distribution over the set of available nodes. Once a thread runs and gains some affinity for a node, however, it becomes "stickier" under the assumption that the thread has residual cache residency on that node, and that memory allocated by that thread resides on that node given the default "first-touch" page-level NUMA allocation policy. Exactly how the various policies interact and which have precedence under what circumstances could the topic of a future blog entry. The scheduler is work-conserving. The x4800 mentioned above is an interesting system. Each of the 8 sockets houses an Intel 7500-series processor. Each processor has 3 coherent QPI links and the system is arranged as a glueless 8-socket twisted ladder "mobius" topology. Nodes are either 1 or 2 hops distant over the QPI links. As an aside the mapping of logical CPUIDs to physical resources is rather interesting on Solaris/x4800. On SPARC/Solaris the CPUID layout is strictly geographic, with the highest order bits identifying the socket, the next lower bits identifying the core within that socket, following by the pipeline (if present) and finally the logical thread context ("strand") on the core. But on Solaris on the x4800 the CPUID layout is as follows. [6:6] identifies the hyperthread on a core; bits [5:3] identify the socket, or package in Intel terminology; bits [2:0] identify the core within a socket. Such low-level details should be of interest only if you're binding threads -- a bad idea, the kernel typically handles placement best -- or if you're writing NUMA-aware code that's aware of the ambient placement and makes decisions accordingly. Solaris introduced the so-called critical-threads mechanism, which is expressed by putting a thread into the FX scheduling class at priority 60. The critical-threads mechanism applies to placement on cores, not on sockets, however. That is, it's an intra-socket policy, not an inter-socket policy. Solaris 11 introduces the Power Aware Dispatcher (PAD) which packs threads instead of spreading them out in an attempt to be able to keep sockets or cores at lower power levels. Maximum dispersal may be good for performance but is anathema to power management. PAD is off by default, but power management polices constitute yet another confounding factor with respect to scheduling and dispatching. If your threads communicate heavily -- one thread reads cache lines last written by some other thread -- then the new dense packing policy may improve performance by reducing traffic on the coherent interconnect. On the other hand if your threads in your process communicate rarely, then it's possible the new packing policy might result on contention on shared computing resources. Unfortunately there's no simple litmus test that says whether packing or spreading is optimal in a given situation. The answer varies by system load, application, number of threads, and platform hardware characteristics. Currently we don't have the necessary tools and sensoria to decide at runtime, so we're reduced to an empirical approach where we run trials and try to decide on a placement policy. The situation is quite frustrating. Relatedly, it's often hard to determine just the right level of concurrency to optimize throughput. (Understanding constructive vs destructive interference in the shared caches would be a good start. We could augment the lines with a small tag field indicating which strand last installed or accessed a line. Given that, we could augment the CPU with performance counters for misses where a thread evicts a line it installed vs misses where a thread displaces a line installed by some other thread.)

    Read the article

  • SQL SERVER – Core Concepts – Elasticity, Scalability and ACID Properties – Exploring NuoDB an Elastically Scalable Database System

    - by pinaldave
    I have been recently exploring Elasticity and Scalability attributes of databases. You can see that in my earlier blog posts about NuoDB where I wanted to look at Elasticity and Scalability concepts. The concepts are very interesting, and intriguing as well. I have discussed these concepts with my friend Joyti M and together we have come up with this interesting read. The goal of this article is to answer following simple questions What is Elasticity? What is Scalability? How ACID properties vary from NOSQL Concepts? What are the prevailing problems in the current database system architectures? Why is NuoDB  an innovative and welcome change in database paradigm? Elasticity This word’s original form is used in many different ways and honestly it does do a decent job in holding things together over the years as a person grows and contracts. Within the tech world, and specifically related to software systems (database, application servers), it has come to mean a few things - allow stretching of resources without reaching the breaking point (on demand). What are resources in this context? Resources are the usual suspects – RAM/CPU/IO/Bandwidth in the form of a container (a process or bunch of processes combined as modules). When it is about increasing resources the simplest idea which comes to mind is the addition of another container. Another container means adding a brand new physical node. When it is about adding a new node there are two questions which comes to mind. 1) Can we add another node to our software system? 2) If yes, does adding new node cause downtime for the system? Let us assume we have added new node, let us see what the new needs of the system are when a new node is added. Balancing incoming requests to multiple nodes Synchronization of a shared state across multiple nodes Identification of “downstate” and resolution action to bring it to “upstate” Well, adding a new node has its advantages as well. Here are few of the positive points Throughput can increase nearly horizontally across the node throughout the system Response times of application will increase as in-between layer interactions will be improved Now, Let us put the above concepts in the perspective of a Database. When we mention the term “running out of resources” or “application is bound to resources” the resources can be CPU, Memory or Bandwidth. The regular approach to “gain scalability” in the database is to look around for bottlenecks and increase the bottlenecked resource. When we have memory as a bottleneck we look at the data buffers, locks, query plans or indexes. After a point even this is not enough as there needs to be an efficient way of managing such large workload on a “single machine” across memory and CPU bound (right kind of scheduling)  workload. We next move on to either read/write separation of the workload or functionality-based sharing so that we still have control of the individual. But this requires lots of planning and change in client systems in terms of knowing where to go/update/read and for reporting applications to “aggregate the data” in an intelligent way. What we ideally need is an intelligent layer which allows us to do these things without us getting into managing, monitoring and distributing the workload. Scalability In the context of database/applications, scalability means three main things Ability to handle normal loads without pressure E.g. X users at the Y utilization of resources (CPU, Memory, Bandwidth) on the Z kind of hardware (4 processor, 32 GB machine with 15000 RPM SATA drives and 1 GHz Network switch) with T throughput Ability to scale up to expected peak load which is greater than normal load with acceptable response times Ability to provide acceptable response times across the system E.g. Response time in S milliseconds (or agreed upon unit of measure) – 90% of the time The Issue – Need of Scale In normal cases one can plan for the load testing to test out normal, peak, and stress scenarios to ensure specific hardware meets the needs. With help from Hardware and Software partners and best practices, bottlenecks can be identified and requisite resources added to the system. Unfortunately this vertical scale is expensive and difficult to achieve and most of the operational people need the ability to scale horizontally. This helps in getting better throughput as there are physical limits in terms of adding resources (Memory, CPU, Bandwidth and Storage) indefinitely. Today we have different options to achieve scalability: Read & Write Separation The idea here is to do actual writes to one store and configure slaves receiving the latest data with acceptable delays. Slaves can be used for balancing out reads. We can also explore functional separation or sharing as well. We can separate data operations by a specific identifier (e.g. region, year, month) and consolidate it for reporting purposes. For functional separation the major disadvantage is when schema changes or workload pattern changes. As the requirement grows one still needs to deal with scale need in manual ways by providing an abstraction in the middle tier code. Using NOSQL solutions The idea is to flatten out the structures in general to keep all values which are retrieved together at the same store and provide flexible schema. The issue with the stores is that they are compromising on mostly consistency (no ACID guarantees) and one has to use NON-SQL dialect to work with the store. The other major issue is about education with NOSQL solutions. Would one really want to make these compromises on the ability to connect and retrieve in simple SQL manner and learn other skill sets? Or for that matter give up on ACID guarantee and start dealing with consistency issues? Hybrid Deployment – Mac, Linux, Cloud, and Windows One of the challenges today that we see across On-premise vs Cloud infrastructure is a difference in abilities. Take for example SQL Azure – it is wonderful in its concepts of throttling (as it is shared deployment) of resources and ability to scale using federation. However, the same abilities are not available on premise. This is not a mistake, mind you – but a compromise of the sweet spot of workloads, customer requirements and operational SLAs which can be supported by the team. In today’s world it is imperative that databases are available across operating systems – which are a commodity and used by developers of all hues. An Ideal Database Ability List A system which allows a linear scale of the system (increase in throughput with reasonable response time) with the addition of resources A system which does not compromise on the ACID guarantees and require developers to learn new paradigms A system which does not force fit a new way interacting with database by learning Non-SQL dialect A system which does not force fit its mechanisms for providing availability across its various modules. Well NuoDB is the first database which has all of the above abilities and much more. In future articles I will cover my hands-on experience with it. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: NuoDB

    Read the article

< Previous Page | 24 25 26 27 28 29 30 31  | Next Page >