Search Results

Search found 74197 results on 2968 pages for 'part time'.

Page 34/2968 | < Previous Page | 30 31 32 33 34 35 36 37 38 39 40 41 | Next Page >

Antenna Aligner Part 6: Little Robots

- by Chris George

A week ago I took temporary ownership of a HTC Desire S so that I could start testing my app under Android. Support for Android was not in my original plan, but when Nomad added support for it recently, I starting thinking why not! So with some trepidation, I clicked the Build for Android button on the Nomad toolbar... nothing. Hmm... that's not right, I was expecting something to build. After a bit of faffing around I finally realised that I hadn't read the text on the Android setup page properly (yes that's right, RTFM!), and I needed a two-part application identifier, separated by a dot. I did this (not sure what the two part thing is all about, that one my list to investigate!) After making the change, the Android build worked and created the apk file. I uploaded this to the device and nervously ran it... it worked!!! Well, more or less! So, there was not splash screen, but this was no surprise because I only have the iOS icons and splash screen in my project at the moment. What was more concerning was the compass update didn't seem to be working. I suspect this is a result of using an iOS specific option in the Phonegap compass watcher. Another thing to investigate. I've also just noticed that the css gradient background hasn't worked either... These issues aside, it was actually more successful than I was expecting, so happy days! Right, lets get Googling... Next time: Preparing for submission to the App Store! :-)

Read the article
The Enterprise Side of JavaFX: Part Two

- by Janice J. Heiss

A new article, part of a three-part series, now up on the front page of otn/java, by Java Champion Adam Bien, titled “The Enterprise Side of JavaFX,” shows developers how to implement the LightView UI dashboard with JavaFX 2. Bien explains that “the RESTful back end of the LightView application comes with a rudimentary HTML page that is used to start/stop the monitoring service, set the snapshot interval, and activate/deactivate the GlassFish monitoring capabilities.”He explains that “the configuration view implemented in the org.lightview.view.Browser component is needed only to start or stop the monitoring process or set the monitoring interval.”Bien concludes his article with a general summary of the principles applied:“JavaFX encourages encapsulation without forcing you to build models for each visual component. With the availability of bindable properties, the boundary between the view and the model can be reduced to an expressive set of bindable properties. Wrapping JavaFX components with ordinary Java classes further reduces the complexity. Instead of dealing with low-level JavaFX mechanics all the time, you can build simple components and break down the complexity of the presentation logic into understandable pieces. CSS skinning further helps with the separation of the code that is needed for the implementation of the presentation logic and the visual appearance of the application on the screen. You can adjust significant portions of an application's look and feel directly in CSS files without touching the actual source code.”Check out the article here.

Read the article
OpenGL - Rendering from part of an index and vertex array depending on an element count

- by user1423893

I'm currently drawing my shapes as lines by using a VAO and then assigning the dynamic vertices and indices each frame. // Bind VAO glBindVertexArray(m_vao); // Update the vertex buffer with the new data (Copy data into the vertex buffer object) glBufferData(GL_ARRAY_BUFFER, numVertices * sizeof(VertexPosition), m_vertices.data(), GL_DYNAMIC_DRAW); // Update the index buffer with the new data (Copy data into the index buffer object) glBufferData(GL_ELEMENT_ARRAY_BUFFER, numIndices * sizeof(unsigned short), indices.data(), GL_DYNAMIC_DRAW); glDrawElements(GL_LINES, numIndices, GL_UNSIGNED_SHORT, BUFFER_OFFSET(0)); // Unbind VAO glBindVertexArray(0); What I would like to do is draw the lines using only part of the data stored in the index and vertex buffer objects. The vertex buffer has its vertices set from an array of defined maximum size: std::array<VertexPosition, maxVertices> m_vertices; The index buffer has its elements set from an array of defined maximum size: std::array<unsigned short, maxIndices> indices = { 0 }; A running total is kept of the number of vertices and indices needed for each draw call numVertices numIndices Can I not specify that the buffer data contain the entire array and only read from part of it when drawing? For example using the vertex buffer object glBufferData(GL_ARRAY_BUFFER, numVertices * sizeof(VertexPosition), m_vertices.data(), GL_DYNAMIC_DRAW); m_vertices.data() = Entire array is stored numVertices * sizeof(VertexPosition) = Amount of data to read from the entire array Is this not the correct way to approach this? I do not wish to use std::vector if possible.

Read the article
Google Analytics: Why is Avg Time on Site lower than Avg time on Page?

- by Melanie

I have the following Custom Report set up in Google Analytics: Metrics: Avg Time on Page Avg Time on Site Dimensions: Page So a report looks like this: Page Avg Time on Page Avg Time on Site /an-article 00:03:14 00:00:11 /another-article 00:05:11 00:01:07 /something-written 00:03:00 00:00:31 Why is it that for each 'page', the 'site views' are significantly lower?

Read the article
SharePoint 2013 Developer Ramp-Up - Part 2

As stated already yesterday, today I continued with the available course material on Pluralsight. For sure interesting topics in the second part of the series but not the field of operation I'm going to work in later. During the course you get a lot of information about how to create and deploy SharePoint Solutions and hosted SharePoint Apps. Today's resource(s) Apart from some blog articles I watched in the following course today: SharePoint 2013 Developer Ramp-Up - Part 2 - Developing SharePoint Solutions and Apps Not thrilling but still two solid hours to go. Takeaway One of the coolest aspects I figured out today is that SharePoint development can be done easily in JavaScript and C# - just as you like or prefer. It's actually pretty cool to see that you could integrate external JS libraries like datajs, knockout,js and so forth in order to implement your solution. And that you should be very familiar with Microsoft PowerShell. Not only to simplify some repetitive work but also to do be able to get things going in SharePoint. Having a decent background knowledge in Linux, I find this pretty amusing and remember the initial baby steps when PowerShell was introduced some years back (Note: German language). The outcry as well as the hype was too funny. Honestly, I have kind of mixed feelings about today's progress. Surely, there was interesting information about developing extensions directly for and in SharePoint... Hm, I'll leave that one for now and probably it might be helpful someday.

Read the article
Testing and Validation – You Really Do Have The Time

- by BuckWoody

One of the great advantages in my role as a Technical Specialist here at Microsoft is that I get to work with so many great clients. I get to see their environments and how they use them, and the way they work with SQL Server. I’ve been a data professional myself for many years. Over that time I’ve worked with many database platforms, lots of client applications, and written a lot of code in many industries. For a while I was also a consultant, so I got to see how other shops did things as well. But because I now focus on a “set” base of clients (over 500 professionals in over 150 companies) I get to see them over a longer period of time. Many of them help me understand how they use the product in their projects, and I even attend some DBA regular meetings. I see the way the product succeeds, and I see when it fails. Something that has really impacted my way of thinking is the level of importance any given shop is able to place on testing and validation. I’ve always been a big proponent of setting up a test system and following a very disciplined regimen to make sure it will work in production for any new projects, and then taking the lessons learned into production as standards. I know, I know – there’s never enough time to do things right like this. Yet the shops I see that do it have the same level of work that they output as the shops that don’t. They just make the time to do the testing and validation and create a standard that they will follow in production. And what I’ve found (surprise surprise) is that they have fewer production problems. OK, that might seem obvious – but I’ve actually tracked it and those places that do the testing and best practices really do save stress, time and trouble from that effort. We all think that’s a good idea, but we just “don’t have time”. OK – but from what I’m seeing, you can gain time if you spend a little up front. You may find that you’re actually already spending the same amount of time that you would spend in doing the testing, you’re just doing it later, at night, under the gun. Food for thought. Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Do you charge a client for email and chat communication as a freelancer? [closed]

- by skyork

For a project that is billed by hours, should a freelancer charge the client for the amount of time he/she spends on email/chat correspondence? For example, the client sends an email to the the freelancer, outlining the requirements. Should the freelancer charge the client for the time during which he/she reads the email and writes a reply. The same goes for chat conversations for clarifying the requirements. In particular, if the freelancer's English is not very good, so that he/she spends extra time on understanding what the client wants and explaining him/herself (e.g. copying and pasting into Google Translate), should such time be charged to the client too?

Read the article
How do you make a precise countdown timer using clock_gettime? [migrated]

- by Joshun

Could somebody please explain how to make a countdown timer using clock_gettime, under Linux. I know you can use the clock() function to get cpu time, and multiply it by CLOCKS_PER_SEC to get actual time, but I'm told the clock() function is not well suited for this. So far I have attempted this (a billion is to pause for one second) #include <stdio.h> #include <time.h> #define BILLION 1000000000 int main() { struct timespec rawtime; clock_gettime(CLOCK_MONOTONIC_RAW, &rawtime); unsigned long int current = ( rawtime.tv_sec + rawtime.tv_nsec ); unsigned long int end = (( rawtime.tv_sec + rawtime.tv_nsec ) + BILLION ); while ( current < end ) { clock_gettime(CLOCK_MONOTONIC_RAW, &rawtime); current = ( rawtime.tv_sec + rawtime.tv_nsec ); } return 0; } I know this wouldn't be very useful on its own, but once I've found out how to time correctly I can use this in my projects. I know that sleep() can be used for this purpose, but I want to code the timer myself so that I can better integrate it in my projects - such as the possibility of it returning the time left, as opposed to pausing the whole program.

Read the article
How can I improve my real-time behavior in multi-threaded app using pthreads and condition variables

- by WilliamKF

I have a multi-threaded application that is using pthreads. I have a mutex() lock and condition variables(). There are two threads, one thread is producing data for the second thread, a worker, which is trying to process the produced data in a real time fashion such that one chuck is processed as close to the elapsing of a fixed time period as possible. This works pretty well, however, occasionally when the producer thread releases the condition upon which the worker is waiting, a delay of up to almost a whole second is seen before the worker thread gets control and executes again. I know this because right before the producer releases the condition upon which the worker is waiting, it does a chuck of processing for the worker if it is time to process another chuck, then immediately upon receiving the condition in the worker thread, it also does a chuck of processing if it is time to process another chuck. In this later case, I am seeing that I am late processing the chuck many times. I'd like to eliminate this lost efficiency and do what I can to keep the chucks ticking away as close to possible to the desired frequency. Is there anything I can do to reduce the delay between the release condition from the producer and the detection that that condition is released such that the worker resumes processing? For example, would it help for the producer to call something to force itself to be context switched out? Bottom line is the worker has to wait each time it asks the producer to create work for itself so that the producer can muck with the worker's data structures before telling the worker it is ready to run in parallel again. This period of exclusive access by the producer is meant to be short, but during this period, I am also checking for real-time work to be done by the producer on behalf of the worker while the producer has exclusive access. Somehow my hand off back to running in parallel again results in significant delay occasionally that I would like to avoid. Please suggest how this might be best accomplished.

Read the article
WebLogic Server Performance and Tuning: Part I - Tuning JVM

- by Gokhan Gungor

Each WebLogic Server instance runs in its own dedicated Java Virtual Machine (JVM) which is their runtime environment. Every Admin Server in any domain executes within a JVM. The same also applies for Managed Servers. WebLogic Server can be used for a wide variety of applications and services which uses the same runtime environment and resources. Oracle WebLogic ships with 2 different JVM, HotSpot and JRocket but you can choose which JVM you want to use. JVM is designed to optimize itself however it also provides some startup options to make small changes. There are default values for its memory and garbage collection. In real world, you will not want to stick with the default values provided by the JVM rather want to customize these values based on your applications which can produce large gains in performance by making small changes with the JVM parameters. We can tell the garbage collector how to delete garbage and we can also tell JVM how much space to allocate for each generation (of java Objects) or for heap. Remember during the garbage collection no other process is executed within the JVM or runtime, which is called STOP THE WORLD which can affect the overall throughput. Each JVM has its own memory segment called Heap Memory which is the storage for java Objects. These objects can be grouped based on their age like young generation (recently created objects) or old generation (surviving objects that have lived to some extent), etc. A java object is considered garbage when it can no longer be reached from anywhere in the running program. Each generation has its own memory segment within the heap. When this segment gets full, garbage collector deletes all the objects that are marked as garbage to create space. When the old generation space gets full, the JVM performs a major collection to remove the unused objects and reclaim their space. A major garbage collect takes a significant amount of time and can affect system performance. When we create a managed server either on the same machine or on remote machine it gets its initial startup parameters from $DOMAIN_HOME/bin/setDomainEnv.sh/cmd file. By default two parameters are set: Xms: The initial heapsize Xmx: The max heapsize Try to set equal initial and max heapsize. The startup time can be a little longer but for long running applications it will provide a better performance. When we set -Xms512m -Xmx1024m, the physical heap size will be 512m. This means that there are pages of memory (in the state of the 512m) that the JVM does not explicitly control. It will be controlled by OS which could be reserve for the other tasks. In this case, it is an advantage if the JVM claims the entire memory at once and try not to spend time to extend when more memory is needed. Also you can use -XX:MaxPermSize (Maximum size of the permanent generation) option for Sun JVM. You should adjust the size accordingly if your application dynamically load and unload a lot of classes in order to optimize the performance. You can set the JVM options/heap size from the following places: Through the Admin console, in the Server start tab In the startManagedWeblogic script for the managed servers $DOMAIN_HOME/bin/startManagedWebLogic.sh/cmd JAVA_OPTIONS="-Xms1024m -Xmx1024m" ${JAVA_OPTIONS} In the setDomainEnv script for the managed servers and admin server (domain wide) USER_MEM_ARGS="-Xms1024m -Xmx1024m" When there is free memory available in the heap but it is too fragmented and not contiguously located to store the object or when there is actually insufficient memory we can get java.lang.OutOfMemoryError. We should create Thread Dump and analyze if that is possible in case of such error. The second option we can use to produce higher throughput is to garbage collection. We can roughly divide GC algorithms into 2 categories: parallel and concurrent. Parallel GC stops the execution of all the application and performs the full GC, this generally provides better throughput but also high latency using all the CPU resources during GC. Concurrent GC on the other hand, produces low latency but also low throughput since it performs GC while application executes. The JRockit JVM provides some useful command-line parameters that to control of its GC scheme like -XgcPrio command-line parameter which takes the following options; XgcPrio:pausetime (To minimize latency, parallel GC) XgcPrio:throughput (To minimize throughput, concurrent GC ) XgcPrio:deterministic (To guarantee maximum pause time, for real time systems) Sun JVM has similar parameters (like -XX:UseParallelGC or -XX:+UseConcMarkSweepGC) to control its GC scheme. We can add -verbosegc -XX:+PrintGCDetails to monitor indications of a problem with garbage collection. Try configuring JVM’s of all managed servers to execute in -server mode to ensure that it is optimized for a server-side production environment.

Read the article
Using LogParser - part 2

- by fatherjack

PersonAddress.csv SalesOrderDetail.tsv In part 1 of this series we downloaded and installed LogParser and used it to list data from a csv file. That was a good start and in this article we are going to see the different ways we can stream data and choose whether a whole file is selected. We are also going to take a brief look at what file types we can interrogate. If we take the query from part 1 and add a value for the output parameter as -o:datagrid so that the query becomes LOGPARSER "SELECT top 15 * FROM C:\LP\person_address.csv" -o:datagrid and run that we get a different result. A pop-up dialog that lets us view the results in a resizable grid. Notice that because we didn't specify the columns we wanted returned by LogParser (we used SELECT *) is has added two columns to the recordset - filename and rownumber. This behaviour can be very useful as we will see in future parts of this series. You can click Next 10 rows or All rows or close the datagrid once you are finished reviewing the data. You may have noticed that the files that I am working with are different file types - one is a csv (comma separated values) and the other is a tsv (tab separated values). If you want to convert a file from one to another then LogParser makes it incredibly simple. Rather than using 'datagrid' as the value for the output parameter, use 'csv': logparser "SELECT SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty, ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal, rowguid, ModifiedDate into C:\Sales_SalesOrderDetail.csv FROM C:\Sales_SalesOrderDetail.tsv" -i:tsv -o:csv Those familiar with SQL will not have to make a very big leap of faith to making adjustments to the above query to filter in/out records from the source file. Lets get all the records from the same file where the Order Quantity (OrderQty) is more than 25: logparser "SELECT SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty, ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal, rowguid, ModifiedDate into C:\LP\Sales_SalesOrderDetailOver25.csv FROM C:\LP\Sales_SalesOrderDetail.tsv WHERE orderqty > 25" -i:tsv -o:csv Or we could find all those records where the Order Quantity is equal to 25 and output it to an xml file: logparser "SELECT SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty, ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal, rowguid, ModifiedDate into C:\LP\Sales_SalesOrderDetailEq25.xml FROM C:\LP\Sales_SalesOrderDetail.tsv WHERE orderqty = 25" -i:tsv -o:xml All the standard comparison operators are to be found in LogParser; >, <, =, LIKE, BETWEEN, OR, NOT, AND. Input and Output file formats. LogParser has a pretty impressive list of file formats that it can parse and a good selection of output formats that will let you generate output in a format that is useable for whatever process or application you may be using. From any of these To any of these IISW3C: parses IIS log files in the W3C Extended Log File Format. NAT: formats output records as readable tabulated columns. IIS: parses IIS log files in the Microsoft IIS Log File Format. CSV: formats output records as comma-separated values text. BIN: parses IIS log files in the Centralized Binary Log File Format. TSV: formats output records as tab-separated or space-separated values text. IISODBC: returns database records from the tables logged to by IIS when configured to log in the ODBC Log Format. XML: formats output records as XML documents. HTTPERR: parses HTTP error log files generated by Http.sys. W3C: formats output records in the W3C Extended Log File Format. URLSCAN: parses log files generated by the URLScan IIS filter. TPL: formats output records following user-defined templates. CSV: parses comma-separated values text files. IIS: formats output records in the Microsoft IIS Log File Format. TSV: parses tab-separated and space-separated values text files. SQL: uploads output records to a table in a SQL database. XML: parses XML text files. SYSLOG: sends output records to a Syslog server. W3C: parses text files in the W3C Extended Log File Format. DATAGRID: displays output records in a graphical user interface. NCSA: parses web server log files in the NCSA Common, Combined, and Extended Log File Formats. CHART: creates image files containing charts. TEXTLINE: returns lines from generic text files. TEXTWORD: returns words from generic text files. EVT: returns events from the Windows Event Log and from Event Log backup files (.evt files). FS: returns information on files and directories. REG: returns information on registry values. ADS: returns information on Active Directory objects. NETMON: parses network capture files created by NetMon. ETW: parses Enterprise Tracing for Windows trace log files and live sessions. COM: provides an interface to Custom Input Format COM Plugins. So, you can query data from any of the types on the left and really easily get it into a format where it is ready for analysis by other tools. To a DBA or network Administrator with an enquiring mind this is a treasure trove. In part 3 we will look at working with multiple sources and specifically outputting to SQL format. See you there!

Read the article
Oracle BI Server Modeling, Part 1- Designing a Query Factory

- by bob.ertl(at)oracle.com

Welcome to Oracle BI Development's BI Foundation blog, focused on helping you get the most value from your Oracle Business Intelligence Enterprise Edition (BI EE) platform deployments. In my first series of posts, I plan to show developers the concepts and best practices for modeling in the Common Enterprise Information Model (CEIM), the semantic layer of Oracle BI EE. In this segment, I will lay the groundwork for the modeling concepts. First, I will cover the big picture of how the BI Server fits into the system, and how the CEIM controls the query processing. Oracle BI EE Query Cycle The purpose of the Oracle BI Server is to bridge the gap between the presentation services and the data sources. There are typically a variety of data sources in a variety of technologies: relational, normalized transaction systems; relational star-schema data warehouses and marts; multidimensional analytic cubes and financial applications; flat files, Excel files, XML files, and so on. Business datasets can reside in a single type of source, or, most of the time, are spread across various types of sources. Presentation services users are generally business people who need to be able to query that set of sources without any knowledge of technologies, schemas, or how sources are organized in their company. They think of business analysis in terms of measures with specific calculations, hierarchical dimensions for breaking those measures down, and detailed reports of the business transactions themselves. Most of them create queries without knowing it, by picking a dashboard page and some filters. Others create their own analysis by selecting metrics and dimensional attributes, and possibly creating additional calculations. The BI Server bridges that gap from simple business terms to technical physical queries by exposing just the business focused measures and dimensional attributes that business people can use in their analyses and dashboards. After they make their selections and start the analysis, the BI Server plans the best way to query the data sources, writes the optimized sequence of physical queries to those sources, post-processes the results, and presents them to the client as a single result set suitable for tables, pivots and charts. The CEIM is a model that controls the processing of the BI Server. It provides the subject areas that presentation services exposes for business users to select simplified metrics and dimensional attributes for their analysis. It models the mappings to the physical data access, the calculations and logical transformations, and the data access security rules. The CEIM consists of metadata stored in the repository, authored by developers using the Administration Tool client. Presentation services and other query clients create their queries in BI EE's SQL-92 language, called Logical SQL or LSQL. The API simply uses ODBC or JDBC to pass the query to the BI Server. Presentation services writes the LSQL query in terms of the simplified objects presented to the users. The BI Server creates a query plan, and rewrites the LSQL into fully-detailed SQL or other languages suitable for querying the physical sources. For example, the LSQL on the left below was rewritten into the physical SQL for an Oracle 11g database on the right. Logical SQL Physical SQL SELECT "D0 Time"."T02 Per Name Month" saw_0, "D4 Product"."P01 Product" saw_1, "F2 Units"."2-01 Billed Qty (Sum All)" saw_2 FROM "Sample Sales" ORDER BY saw_0, saw_1 WITH SAWITH0 AS ( select T986.Per_Name_Month as c1, T879.Prod_Dsc as c2, sum(T835.Units) as c3, T879.Prod_Key as c4 from Product T879 /* A05 Product */ , Time_Mth T986 /* A08 Time Mth */ , FactsRev T835 /* A11 Revenue (Billed Time Join) */ where ( T835.Prod_Key = T879.Prod_Key and T835.Bill_Mth = T986.Row_Wid) group by T879.Prod_Dsc, T879.Prod_Key, T986.Per_Name_Month ) select SAWITH0.c1 as c1, SAWITH0.c2 as c2, SAWITH0.c3 as c3 from SAWITH0 order by c1, c2 Probably everybody reading this blog can write SQL or MDX. However, the trick in designing the CEIM is that you are modeling a query-generation factory. Rather than hand-crafting individual queries, you model behavior and relationships, thus configuring the BI Server machinery to manufacture millions of different queries in response to random user requests. This mass production requires a different mindset and approach than when you are designing individual SQL statements in tools such as Oracle SQL Developer, Oracle Hyperion Interactive Reporting (formerly Brio), or Oracle BI Publisher. The Structure of the Common Enterprise Information Model (CEIM) The CEIM has a unique structure specifically for modeling the relationships and behaviors that fill the gap from logical user requests to physical data source queries and back to the result. The model divides the functionality into three specialized layers, called Presentation, Business Model and Mapping, and Physical, as shown below. Presentation services clients can generally only see the presentation layer, and the objects in the presentation layer are normally the only ones used in the LSQL request. When a request comes into the BI Server from presentation services or another client, the relationships and objects in the model allow the BI Server to select the appropriate data sources, create a query plan, and generate the physical queries. That's the left to right flow in the diagram below. When the results come back from the data source queries, the right to left relationships in the model show how to transform the results and perform any final calculations and functions that could not be pushed down to the databases. Business Model Think of the business model as the heart of the CEIM you are designing. This is where you define the analytic behavior seen by the users, and the superset library of metric and dimension objects available to the user community as a whole. It also provides the baseline business-friendly names and user-readable dictionary. For these reasons, it is often called the "logical" model--it is a virtual database schema that persists no data, but can be queried as if it is a database. The business model always has a dimensional shape (more on this in future posts), and its simple shape and terminology hides the complexity of the source data models. Besides hiding complexity and normalizing terminology, this layer adds most of the analytic value, as well. This is where you define the rich, dimensional behavior of the metrics and complex business calculations, as well as the conformed dimensions and hierarchies. It contributes to the ease of use for business users, since the dimensional metric definitions apply in any context of filters and drill-downs, and the conformed dimensions enable dashboard-wide filters and guided analysis links that bring context along from one page to the next. The conformed dimensions also provide a key to hiding the complexity of many sources, including federation of different databases, behind the simple business model. Note that the expression language in this layer is LSQL, so that any expression can be rewritten into any data source's query language at run time. This is important for federation, where a given logical object can map to several different physical objects in different databases. It is also important to portability of the CEIM to different database brands, which is a key requirement for Oracle's BI Applications products. Your requirements process with your user community will mostly affect the business model. This is where you will define most of the things they specifically ask for, such as metric definitions. For this reason, many of the best-practice methodologies of our consulting partners start with the high-level definition of this layer. Physical Model The physical model connects the business model that meets your users' requirements to the reality of the data sources you have available. In the query factory analogy, think of the physical layer as the bill of materials for generating physical queries. Every schema, table, column, join, cube, hierarchy, etc., that will appear in any physical query manufactured at run time must be modeled here at design time. Each physical data source will have its own physical model, or "database" object in the CEIM. The shape of each physical model matches the shape of its physical source. In other words, if the source is normalized relational, the physical model will mimic that normalized shape. If it is a hypercube, the physical model will have a hypercube shape. If it is a flat file, it will have a denormalized tabular shape. To aid in query optimization, the physical layer also tracks the specifics of the database brand and release. This allows the BI Server to make the most of each physical source's distinct capabilities, writing queries in its syntax, and using its specific functions. This allows the BI Server to push processing work as deep as possible into the physical source, which minimizes data movement and takes full advantage of the database's own optimizer. For most data sources, native APIs are used to further optimize performance and functionality. The value of having a distinct separation between the logical (business) and physical models is encapsulation of the physical characteristics. This encapsulation is another enabler of packaged BI applications and federation. It is also key to hiding the complex shapes and relationships in the physical sources from the end users. Consider a routine drill-down in the business model: physically, it can require a drill-through where the first query is MDX to a multidimensional cube, followed by the drill-down query in SQL to a normalized relational database. The only difference from the user's point of view is that the 2nd query added a more detailed dimension level column - everything else was the same. Mappings Within the Business Model and Mapping Layer, the mappings provide the binding from each logical column and join in the dimensional business model, to each of the objects that can provide its data in the physical layer. When there is more than one option for a physical source, rules in the mappings are applied to the query context to determine which of the data sources should be hit, and how to combine their results if more than one is used. These rules specify aggregate navigation, vertical partitioning (fragmentation), and horizontal partitioning, any of which can be federated across multiple, heterogeneous sources. These mappings are usually the most sophisticated part of the CEIM. Presentation You might think of the presentation layer as a set of very simple relational-like views into the business model. Over ODBC/JDBC, they present a relational catalog consisting of databases, tables and columns. For business users, presentation services interprets these as subject areas, folders and columns, respectively. (Note that in 10g, subject areas were called presentation catalogs in the CEIM. In this blog, I will stick to 11g terminology.) Generally speaking, presentation services and other clients can query only these objects (there are exceptions for certain clients such as BI Publisher and Essbase Studio). The purpose of the presentation layer is to specialize the business model for different categories of users. Based on a user's role, they will be restricted to specific subject areas, tables and columns for security. The breakdown of the model into multiple subject areas organizes the content for users, and subjects superfluous to a particular business role can be hidden from that set of users. Customized names and descriptions can be used to override the business model names for a specific audience. Variables in the object names can be used for localization. For these reasons, you are better off thinking of the tables in the presentation layer as folders than as strict relational tables. The real semantics of tables and how they function is in the business model, and any grouping of columns can be included in any table in the presentation layer. In 11g, an LSQL query can also span multiple presentation subject areas, as long as they map to the same business model. Other Model Objects There are some objects that apply to multiple layers. These include security-related objects, such as application roles, users, data filters, and query limits (governors). There are also variables you can use in parameters and expressions, and initialization blocks for loading their initial values on a static or user session basis. Finally, there are Multi-User Development (MUD) projects for developers to check out units of work, and objects for the marketing feature used by our packaged customer relationship management (CRM) software. The Query Factory At this point, you should have a grasp on the query factory concept. When developing the CEIM model, you are configuring the BI Server to automatically manufacture millions of queries in response to random user requests. You do this by defining the analytic behavior in the business model, mapping that to the physical data sources, and exposing it through the presentation layer's role-based subject areas. While configuring mass production requires a different mindset than when you hand-craft individual SQL or MDX statements, it builds on the modeling and query concepts you already understand. The following posts in this series will walk through the CEIM modeling concepts and best practices in detail. We will initially review dimensional concepts so you can understand the business model, and then present a pattern-based approach to learning the mappings from a variety of physical schema shapes and deployments to the dimensional model. Along the way, we will also present the dimensional calculation template, and learn how to configure the many additivity patterns.

Read the article
Microsoft Sql Server driver for Nodejs - Part 2

- by chanderdhall

Nodejs, Sql server and Json response with Rest This post is part 2 of Microsoft Sql Server driver for Node js.In this post we will look at the JSON responses from the Microsoft Sql Server driver for Node js. Pre-requisites: If you have read the Part 1 of the series, you should be good. We will be using a framework for Rest within Nodejs - Restify, but that would need no prior learning. Restify: Restify is a simple node module for building RESTful services. It is slimmer than Express. Express is a complete module that has all what you need to create a full-blown browser app. However, Restify does not have additional overhead of templating, rendering etc that would be needed if your app has views. So, as the name suggests it's an awesome framework for building RESTful services and is very light-weight. Set up - You can continue with the same directory or project structure we had in the previous post, or can start a new one. Install restify using npm and you are good to go. npm install restify Go to Server.js and include Restify in your solution. Then create the server object using restify.CreateServer() - SLICK - ha? var restify = require('restify'); var server = restify.createServer(); server.listen(8080, function () { console.log('%s listening at %s', server.name, server.url); }); Then make sure you provide a port for the Server to listen at. The call back function is optional but helps you for debugging purposes. Once you are done, save the file and then go to the command prompt and hit 'node server.js' and you should see the following: To test the server, go to your browser and type the address 'http://localhost:8080/' and oops you will see an error. Why is that? - Well because we haven't defined any routes. Let's go ahead and create a route. To begin with I'd like to return whatever is typed in the url after my name and the following code should do it. server.get('/ChanderDhall/:status', function respond(req, res, next) { res.end("hello " + req.params.name + "") }); You can also avoid writing call backs inline. Something like this. function respond(req, res, next) { res.end("Chander Dhall " + req.params.name + ""); } server.get('/hello/:name', respond); Now if you go ahead and type http://localhost:8080/ChanderDhall/LovesNode you will get the response 'Chander Dhall loves node'. NOTE: Make sure your url has the right case as it's case-sensitive. You could have also typed it in as 'server.get('/chanderdhall/:name', respond);' Stored procedure: We've talked a lot about Restify now, but keep in mind the post is about being able to use Sql server with Node and return JSON. To see this in action, let's go ahead and create another route to a list of Employees from a stored procedure. server.get('/Employees', Employees); The following code will return a JSON response. function Employees(req, res, next) { res.header("Content-Type: application/json"); //Need to specify the Content-Type which is //JSON in our case. sql.open(conn_str, function (err, conn) { if (err) { //Logs an error console.log("Error opening the database connection!"); return; } console.log("before query!"); conn.queryRaw("exec sp_GetEmployees", function (err, results) { if (err) { //Connection is open but an error occurs whileWhat else can be done? May be create a formatter or may be even come up with a hypermedia type but that may upset some pragmatists. Well, that's going to be a totally different discussion and is really not part of this series. Summary: We've discussed how to execute a stored procedure using Microsoft Sql Server driver for Node. Also, we have discussed how to format and send out a clean JSON to the app calling this API.

Read the article
Myths about Coding Craftsmanship part 2

- by tom

Myth 3: The source of all bad code is inept developers and stupid people When you review code is this what you assume? Shame on you. You are probably making assumptions in your code if you are assuming so much already. Bad code can be the result of any number of causes including but not limited to using dated techniques (like boxing when generics are available), not following standards (“look how he does the spacing between arguments!” or “did he really just name that variable ‘bln_Hello_Cats’?”), being redundant, using properties, methods, or objects in a novel way (like switching on button.Text between “Hello World” and “Hello World “ //clever use of space character… sigh), not following the SOLID principals, hacking around assumptions made in earlier iterations / hacking in features that should be worked into the overall design. The first two issues, while annoying are pretty easy to spot and can be fixed so easily. If your coding team is made up of experienced professionals who are passionate about staying current then these shouldn’t be happening. If you work with a variety of skills, backgrounds, and experience then there will be some of this stuff going on. If you have an opportunity to mentor such a developer who is receptive to constructive criticism don’t be a jerk; help them and the codebase will improve. A little patience can improve the codebase, your work environment, and even your perspective. The novelty and redundancy I have encountered has often been the use of creativity when language knowledge was perceived as unavailable or too time consuming. When developers learn on the job you get a lot of this. Rather than going to MSDN developers will use what they know. Depending on the constraints of their assignment hacking together what they know may seem quite practical. This was not stupid though I often wonder how much time is actually “saved” by hacking. These issues are often harder to untangle if we ever do. They can also grow out of control as we write hack after hack to make it work and get back to some development that is satisfying. Hacking upon an existing hack is what I call “feeding the monster”. Code monsters are anti-patterns and hacks gone wild. The reason code monsters continue to get bigger is that they keep growing in scope, touching more and more of the application. This is not the result of dumb developers. It is probably the result of avoiding design, not taking the time to understand the problems or anticipate or communicate the vision of the product. If our developers don’t understand the purpose of a feature or product how do we expect potential customers to do so? Forethought and organization are often what is missing from bad code. Developers who do not use the SOLID principals should be encouraged to learn these principals and be given guidance on how to apply them. The time “saved” by giving hackers room to hack will be made up for and then some. Not as technical debt but as shoddy work that if not replaced will be struggled with again and again. Bad code is not the result of dumb developers (usually) it is the result of trying to do too much without the proper resources and neglecting the right thing that needs doing with the first thoughtless thing that comes into our heads. Object oriented code is all about relationships between objects. Coders who believe their coworkers are all fools tend to write objects that are difficult to work with, not eager to explain themselves, and perform erratically and irrationally. If you constantly find you are surrounded by idiots you may want to ask yourself if you are being unreasonable, if you are being closed minded, of if you have chosen the right profession. Opening your mind up to the idea that you probably work with rational, well-intentioned people will probably make you a better coder and it might even make you less grumpy. If you are surrounded by jerks who do not engage in the exchange of ideas who do not care about their customers or the durability of the code you are building together then I suggest you find a new place to work. Myth 4: Customers don’t care about “beautiful” code Craftsmanship is customer focused because it means that the job was done right, the product will withstand the abuse, modifications, and scrutiny of our customers. Users can appreciate a predictable timeline for a release, a product delivered on time and on budget, a feature set that does not interfere with the task(s) it is supporting, quick turnarounds on exception messages, self healing issues, and less issues. These are all hindered by skimping on craftsmanship. When we write data access and when we write reusable code. What do you think? Does bad code come primarily from low IQ individuals? Do customers care about beautiful code?

Read the article
Rebuilding CoasterBuzz, Part IV: Dependency injection, it's what's for breakfast

- by Jeff

(Repost from my personal blog.) This is another post in a series about rebuilding one of my Web sites, which has been around for 12 years. I hope to relaunch soon. More: Part I: Evolution, and death to WCF Part II: Hot data objects Part III: The architecture using the "Web stack of love" If anything generally good for the craft has come out of the rise of ASP.NET MVC, it's that people are more likely to use dependency injection, and loosely couple the pieces parts of their applications. A lot of the emphasis on coding this way has been to facilitate unit testing, and that's awesome. Unit testing makes me feel a lot less like a hack, and a lot more confident in what I'm doing. Dependency injection is pretty straight forward. It says, "Given an instance of this class, I need instances of other classes, defined not by their concrete implementations, but their interfaces." Probably the first place a developer exercises this in when having a class talk to some kind of data repository. For a very simple example, pretend the FooService has to get some Foo. It looks like this: public class FooService { public FooService(IFooRepository fooRepo) { _fooRepo = fooRepo; } private readonly IFooRepository _fooRepo; public Foo GetMeFoo() { return _fooRepo.FooFromDatabase(); } } When we need the FooService, we ask the dependency container to get it for us. It says, "You'll need an IFooRepository in that, so let me see what that's mapped to, and put it in there for you." Why is this good for you? It's good because your FooService doesn't know or care about how you get some foo. You can stub out what the methods and properties on a fake IFooRepository might return, and test just the FooService. I don't want to get too far into unit testing, but it's the most commonly cited reason to use DI containers in MVC. What I wanted to mention is how there's another benefit in a project like mine, where I have to glue together a bunch of stuff. For example, when I have someone sign up for a new account on CoasterBuzz, I'm actually using POP Forums' new account mailer, which composes a bunch of text that includes a link to verify your account. The thing is, I want to use custom text and some other logic that's specific to CoasterBuzz. To accomplish this, I make a new class that inherits from the forum's NewAccountMailer, and override some stuff. Easy enough. Then I use Ninject, the DI container I'm using, to unbind the forum's implementation, and substitute my own. Ninject uses something called a NinjectModule to bind interfaces to concrete implementations. The forum has its own module, and then the CoasterBuzz module is loaded second. The CB module has two lines of code to swap out the mailer implementation: Unbind<PopForums.Email.INewAccountMailer>(); Bind<PopForums.Email.INewAccountMailer>().To<CbNewAccountMailer>(); Piece of cake! Now, when code asks the DI container for an INewAccountMailer, it gets my custom implementation instead. This is a lot easier to deal with than some of the alternatives. I could do some copy-paste, but then I'm not using well-tested code from the forum. I could write stuff from scratch, but then I'm throwing away a bunch of logic I've already written (in this case, stuff around e-mail, e-mail settings, mail delivery failures). There are other places where the DI container comes in handy. For example, CoasterBuzz does a number of custom things with user profiles, and special content for paid members. It uses the forum as the core piece to managing users, so I can ask the container to get me instances of classes that do user lookups, for example, and have zero care about how the forum handles database calls, configuration, etc. What a great world to live in, compared to ten years ago. Sure, the primary interest in DI is around the "separation of concerns" and facilitating unit testing, but as your library grows and you use more open source, it starts to be the glue that pulls everything together.

Read the article
Rebuilding CoasterBuzz, Part II: Hot data objects

- by Jeff

This is the second post, originally from my personal blog, in a series about rebuilding one of my Web sites, which has been around for 12 years. More: Part I: Evolution, and death to WCF After the rush to get moving on stuff, I temporarily lost interest. I went almost two weeks without touching the project, in part because the next thing on my backlog was doing up a bunch of administrative pages. So boring. Unfortunately, because most of the site's content is user-generated, you need some facilities for editing data. CoasterBuzz has a database full of amusement parks and roller coasters. The entities enjoy the relationships that you would expect, though they're further defined by "instances" of a coaster, to define one that has moved between parks as one, with different names and operational dates. And of course, there are pictures and news items, too. It's not horribly complex, except when you have to account for a name change and display just the newest name. In all previous versions, data access was straight SQL. As so much of the old code was rooted in 2003, with some changes in 2008, there wasn't much in the way of ORM frameworks going on then. Let me rephrase that, I mostly wasn't interested in ORM's. Since that time, I used a little LINQ to SQL in some projects, and a whole bunch of nHibernate while at Microsoft. Through all of that experience, I have to admit that these frameworks are often a bigger pain in the ass than not. They're great for basic crud operations, but when you start having all kinds of exotic relationships, they get difficult, and generate all kinds of weird SQL under the covers. The black box can quickly turn into a black hole. Sometimes you end up having to build all kinds of new expertise to do things "right" with a framework. Still, despite my reservations, I used the newer version of Entity Framework, with the "code first" modeling, in a science project and I really liked it. Since it's just a right-click away with NuGet, I figured I'd give it a shot here. My initial effort was spent defining the context class, which requires a bit of work because I deviate quite a bit from the conventions that EF uses, starting with table names. Then throw some partial querying of certain tables (where you'll find image data), and you're splitting tables across several objects (navigation properties). I won't go into the details, because these are all things that are well documented around the Internet, but there was a minor learning curve there. The basics of reading data using EF are fantastic. For example, a roller coaster object has a park associated with it, as well as a number of instances (if it was ever relocated), and there also might be a big banner image for it. This is stupid easy to use because it takes one line of code in your repository class, and by the time you pass it to the view, you have a rich object graph that has everything you need to display stuff. Likewise, editing simple data is also, well, simple. For this goodness, thank the ASP.NET MVC framework. The UpdateModel() method on the controllers is very elegant. Remember the old days of assigning all kinds of properties to objects in your Webforms code-behind? What a time consuming mess that used to be. Even if you're not using an ORM tool, having hydrated objects come off the wire is such a time saver. Not everything is easy, though. When you have to persist a complex graph of objects, particularly if they were composed in the user interface with all kinds of AJAX elements and list boxes, it's not just a simple matter of submitting the form. There were a few instances where I ended up going back to "old-fashioned" SQL just in the interest of time. It's not that I couldn't do what I needed with EF, it's just that the efficiency, both my own and that of the generated SQL, wasn't good. Since EF context objects expose a database connection object, you can use that to do the old school ADO.NET stuff you've done for a decade. Using various extension methods from POP Forums' data project, it was a breeze. You just have to stick to your decision, in this case. When you start messing with SQL directly, you can't go back in the same code to messing with entities because EF doesn't know what you're changing. Not really a big deal. There are a number of take-aways from using EF. The first is that you write a lot less code, which has always been a desired outcome of ORM's. The other lesson, and I particularly learned this the hard way working on the MSDN forums back in the day, is that trying to retrofit an ORM framework into an existing schema isn't fun at all. The CoasterBuzz database isn't bad, but there are design decisions I'd make differently if I were starting from scratch. Now that I have some of this stuff done, I feel like I can start to move on to the more interesting things on the backlog. There's a lot to do, but at least it's fun stuff, and not more forms that will be used infrequently.

Read the article
Creating a dynamic proxy generator – Part 1 – Creating the Assembly builder, Module builder and cach

- by SeanMcAlinden

I’ve recently started a project with a few mates to learn the ins and outs of Dependency Injection, AOP and a number of other pretty crucial patterns of development as we’ve all been using these patterns for a while but have relied totally on third part solutions to do the magic. We thought it would be interesting to really get into the details by rolling our own IoC container and hopefully learn a lot on the way, and you never know, we might even create an excellent framework. The open source project is called Rapid IoC and is hosted at http://rapidioc.codeplex.com/ One of the most interesting tasks for me is creating the dynamic proxy generator for enabling Aspect Orientated Programming (AOP). In this series of articles, I’m going to track each step I take for creating the dynamic proxy generator and I’ll try my best to explain what everything means - mainly as I’ll be using Reflection.Emit to emit a fair amount of intermediate language code (IL) to create the proxy types at runtime which can be a little taxing to read. It’s worth noting that building the proxy is without a doubt going to be slightly painful so I imagine there will be plenty of areas I’ll need to change along the way. Anyway lets get started… Part 1 - Creating the Assembly builder, Module builder and caching mechanism Part 1 is going to be a really nice simple start, I’m just going to start by creating the assembly, module and type caches. The reason we need to create caches for the assembly, module and types is simply to save the overhead of recreating proxy types that have already been generated, this will be one of the important steps to ensure that the framework is fast… kind of important as we’re calling the IoC container ‘Rapid’ – will be a little bit embarrassing if we manage to create the slowest framework. The Assembly builder The assembly builder is what is used to create an assembly at runtime, we’re going to have two overloads, one will be for the actual use of the proxy generator, the other will be mainly for testing purposes as it will also save the assembly so we can use Reflector to examine the code that has been created. Here’s the code: DynamicAssemblyBuilder using System; using System.Reflection; using System.Reflection.Emit; namespace Rapid.DynamicProxy.Assembly { /// <summary> /// Class for creating an assembly builder. /// </summary> internal static class DynamicAssemblyBuilder { #region Create /// <summary> /// Creates an assembly builder. /// </summary> /// <param name="assemblyName">Name of the assembly.</param> public static AssemblyBuilder Create(string assemblyName) { AssemblyName name = new AssemblyName(assemblyName); AssemblyBuilder assembly = AppDomain.CurrentDomain.DefineDynamicAssembly( name, AssemblyBuilderAccess.Run); DynamicAssemblyCache.Add(assembly); return assembly; } /// <summary> /// Creates an assembly builder and saves the assembly to the passed in location. /// </summary> /// <param name="assemblyName">Name of the assembly.</param> /// <param name="filePath">The file path.</param> public static AssemblyBuilder Create(string assemblyName, string filePath) { AssemblyName name = new AssemblyName(assemblyName); AssemblyBuilder assembly = AppDomain.CurrentDomain.DefineDynamicAssembly( name, AssemblyBuilderAccess.RunAndSave, filePath); DynamicAssemblyCache.Add(assembly); return assembly; } #endregion } } So hopefully the above class is fairly explanatory, an AssemblyName is created using the passed in string for the actual name of the assembly. An AssemblyBuilder is then constructed with the current AppDomain and depending on the overload used, it is either just run in the current context or it is set up ready for saving. It is then added to the cache. DynamicAssemblyCache using System.Reflection.Emit; using Rapid.DynamicProxy.Exceptions; using Rapid.DynamicProxy.Resources.Exceptions; namespace Rapid.DynamicProxy.Assembly { /// <summary> /// Cache for storing the dynamic assembly builder. /// </summary> internal static class DynamicAssemblyCache { #region Declarations private static object syncRoot = new object(); internal static AssemblyBuilder Cache = null; #endregion #region Adds a dynamic assembly to the cache. /// <summary> /// Adds a dynamic assembly builder to the cache. /// </summary> /// <param name="assemblyBuilder">The assembly builder.</param> public static void Add(AssemblyBuilder assemblyBuilder) { lock (syncRoot) { Cache = assemblyBuilder; } } #endregion #region Gets the cached assembly /// <summary> /// Gets the cached assembly builder. /// </summary> /// <returns></returns> public static AssemblyBuilder Get { get { lock (syncRoot) { if (Cache != null) { return Cache; } } throw new RapidDynamicProxyAssertionException(AssertionResources.NoAssemblyInCache); } } #endregion } } The cache is simply a static property that will store the AssemblyBuilder (I know it’s a little weird that I’ve made it public, this is for testing purposes, I know that’s a bad excuse but hey…) There are two methods for using the cache – Add and Get, these just provide thread safe access to the cache. The Module Builder The module builder is required as the create proxy classes will need to live inside a module within the assembly. Here’s the code: DynamicModuleBuilder using System.Reflection.Emit; using Rapid.DynamicProxy.Assembly; namespace Rapid.DynamicProxy.Module { /// <summary> /// Class for creating a module builder. /// </summary> internal static class DynamicModuleBuilder { /// <summary> /// Creates a module builder using the cached assembly. /// </summary> public static ModuleBuilder Create() { string assemblyName = DynamicAssemblyCache.Get.GetName().Name; ModuleBuilder moduleBuilder = DynamicAssemblyCache.Get.DefineDynamicModule (assemblyName, string.Format("{0}.dll", assemblyName)); DynamicModuleCache.Add(moduleBuilder); return moduleBuilder; } } } As you can see, the module builder is created on the assembly that lives in the DynamicAssemblyCache, the module is given the assembly name and also a string representing the filename if the assembly is to be saved. It is then added to the DynamicModuleCache. DynamicModuleCache using System.Reflection.Emit; using Rapid.DynamicProxy.Exceptions; using Rapid.DynamicProxy.Resources.Exceptions; namespace Rapid.DynamicProxy.Module { /// <summary> /// Class for storing the module builder. /// </summary> internal static class DynamicModuleCache { #region Declarations private static object syncRoot = new object(); internal static ModuleBuilder Cache = null; #endregion #region Add /// <summary> /// Adds a dynamic module builder to the cache. /// </summary> /// <param name="moduleBuilder">The module builder.</param> public static void Add(ModuleBuilder moduleBuilder) { lock (syncRoot) { Cache = moduleBuilder; } } #endregion #region Get /// <summary> /// Gets the cached module builder. /// </summary> /// <returns></returns> public static ModuleBuilder Get { get { lock (syncRoot) { if (Cache != null) { return Cache; } } throw new RapidDynamicProxyAssertionException(AssertionResources.NoModuleInCache); } } #endregion } } The DynamicModuleCache is very similar to the assembly cache, it is simply a statically stored module with thread safe Add and Get methods. The DynamicTypeCache To end off this post, I’m going to create the cache for storing the generated proxy classes. I’ve spent a fair amount of time thinking about the type of collection I should use to store the types and have finally decided that for the time being I’m going to use a generic dictionary. This may change when I can actually performance test the proxy generator but the time being I think it makes good sense in theory, mainly as it pretty much maintains it’s performance with varying numbers of items – almost constant (0)1. Plus I won’t ever need to loop through the items which is not the dictionaries strong point. Here’s the code as it currently stands: DynamicTypeCache using System; using System.Collections.Generic; using System.Security.Cryptography; using System.Text; namespace Rapid.DynamicProxy.Types { /// <summary> /// Cache for storing proxy types. /// </summary> internal static class DynamicTypeCache { #region Declarations static object syncRoot = new object(); public static Dictionary<string, Type> Cache = new Dictionary<string, Type>(); #endregion /// <summary> /// Adds a proxy to the type cache. /// </summary> /// <param name="type">The type.</param> /// <param name="proxy">The proxy.</param> public static void AddProxyForType(Type type, Type proxy) { lock (syncRoot) { Cache.Add(GetHashCode(type.AssemblyQualifiedName), proxy); } } /// <summary> /// Tries the type of the get proxy for. /// </summary> /// <param name="type">The type.</param> /// <returns></returns> public static Type TryGetProxyForType(Type type) { lock (syncRoot) { Type proxyType; Cache.TryGetValue(GetHashCode(type.AssemblyQualifiedName), out proxyType); return proxyType; } } #region Private Methods private static string GetHashCode(string fullName) { SHA1CryptoServiceProvider provider = new SHA1CryptoServiceProvider(); Byte[] buffer = Encoding.UTF8.GetBytes(fullName); Byte[] hash = provider.ComputeHash(buffer, 0, buffer.Length); return Convert.ToBase64String(hash); } #endregion } } As you can see, there are two public methods, one for adding to the cache and one for getting from the cache. Hopefully they should be clear enough, the Get is a TryGet as I do not want the dictionary to throw an exception if a proxy doesn’t exist within the cache. Other than that I’ve decided to create a key using the SHA1CryptoServiceProvider, this may change but my initial though is the SHA1 algorithm is pretty fast to put together using the provider and it is also very unlikely to have any hashing collisions. (there are some maths behind how unlikely this is – here’s the wiki if you’re interested http://en.wikipedia.org/wiki/SHA_hash_functions) Anyway, that’s the end of part 1 – although I haven’t started any of the fun stuff (by fun I mean hairpulling, teeth grating Relfection.Emit style fun), I’ve got the basis of the DynamicProxy in place so all we have to worry about now is creating the types, interceptor classes, method invocation information classes and finally a really nice fluent interface that will abstract all of the hard-core craziness away and leave us with a lightning fast, easy to use AOP framework. Hope you find the series interesting. All of the source code can be viewed and/or downloaded at our codeplex site - http://rapidioc.codeplex.com/ Kind Regards, Sean.

Read the article
Using the StopWatch class to calculate the execution time of a block of code

- by vik20000in

Many of the times while doing the performance tuning of some, class, webpage, component, control etc. we first measure the current time taken in the execution of that code. This helps in understanding the location in code which is actually causing the performance issue and also help in measuring the amount of improvement by making the changes. This measurement is very important as it helps us understand the problem in code, Helps us to write better code next time (as we have already learnt what kind of improvement can be made with different code) . Normally developers create 2 objects of the DateTime class. The exact time is collected before and after the code where the performance needs to be measured. Next the difference between the two objects is used to know about the time spent in the code that is measured. Below is an example of the sample code. DateTime dt1, dt2; dt1 = DateTime.Now; for (int i = 0; i < 1000000; i++) { string str = "string"; } dt2 = DateTime.Now; TimeSpan ts = dt2.Subtract(dt1); Console.WriteLine("Time Spent : " + ts.TotalMilliseconds.ToString()); The above code works great. But the dot net framework also provides for another way to capture the time spent on the code without doing much effort (creating 2 datetime object, timespan object etc..). We can use the inbuilt StopWatch class to get the exact time spent. Below is an example of the same work with the help of the StopWatch class. Stopwatch sw = Stopwatch.StartNew(); for (int i = 0; i < 1000000; i++) { string str = "string"; } sw.Stop(); Console.WriteLine("Time Spent : " +sw.Elapsed.TotalMilliseconds.ToString()); [Note the StopWatch class resides in the System.Diagnostics namespace] If you use the StopWatch class the time taken for measuring the performance is much better, with very little effort. Vikram

Read the article
Parallelism in .NET – Part 11, Divide and Conquer via Parallel.Invoke

- by Reed

Many algorithms are easily written to work via recursion. For example, most data-oriented tasks where a tree of data must be processed are much more easily handled by starting at the root, and recursively “walking” the tree. Some algorithms work this way on flat data structures, such as arrays, as well. This is a form of divide and conquer: an algorithm design which is based around breaking up a set of work recursively, “dividing” the total work in each recursive step, and “conquering” the work when the remaining work is small enough to be solved easily. Recursive algorithms, especially ones based on a form of divide and conquer, are often a very good candidate for parallelization. This is apparent from a common sense standpoint. Since we’re dividing up the total work in the algorithm, we have an obvious, built-in partitioning scheme. Once partitioned, the data can be worked upon independently, so there is good, clean isolation of data. Implementing this type of algorithm is fairly simple. The Parallel class in .NET 4 includes a method suited for this type of operation: Parallel.Invoke. This method works by taking any number of delegates defined as an Action, and operating them all in parallel. The method returns when every delegate has completed: Parallel.Invoke( () => { Console.WriteLine("Action 1 executing in thread {0}", Thread.CurrentThread.ManagedThreadId); }, () => { Console.WriteLine("Action 2 executing in thread {0}", Thread.CurrentThread.ManagedThreadId); }, () => { Console.WriteLine("Action 3 executing in thread {0}", Thread.CurrentThread.ManagedThreadId); } ); .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Running this simple example demonstrates the ease of using this method. For example, on my system, I get three separate thread IDs when running the above code. By allowing any number of delegates to be executed directly, concurrently, the Parallel.Invoke method provides us an easy way to parallelize any algorithm based on divide and conquer. We can divide our work in each step, and execute each task in parallel, recursively. For example, suppose we wanted to implement our own quicksort routine. The quicksort algorithm can be designed based on divide and conquer. In each iteration, we pick a pivot point, and use that to partition the total array. We swap the elements around the pivot, then recursively sort the lists on each side of the pivot. For example, let’s look at this simple, sequential implementation of quicksort: public static void QuickSort<T>(T[] array) where T : IComparable<T> { QuickSortInternal(array, 0, array.Length - 1); } private static void QuickSortInternal<T>(T[] array, int left, int right) where T : IComparable<T> { if (left >= right) { return; } SwapElements(array, left, (left + right) / 2); int last = left; for (int current = left + 1; current <= right; ++current) { if (array[current].CompareTo(array[left]) < 0) { ++last; SwapElements(array, last, current); } } SwapElements(array, left, last); QuickSortInternal(array, left, last - 1); QuickSortInternal(array, last + 1, right); } static void SwapElements<T>(T[] array, int i, int j) { T temp = array[i]; array[i] = array[j]; array[j] = temp; } Here, we implement the quicksort algorithm in a very common, divide and conquer approach. Running this against the built-in Array.Sort routine shows that we get the exact same answers (although the framework’s sort routine is slightly faster). On my system, for example, I can use framework’s sort to sort ten million random doubles in about 7.3s, and this implementation takes about 9.3s on average. Looking at this routine, though, there is a clear opportunity to parallelize. At the end of QuickSortInternal, we recursively call into QuickSortInternal with each partition of the array after the pivot is chosen. This can be rewritten to use Parallel.Invoke by simply changing it to: // Code above is unchanged... SwapElements(array, left, last); Parallel.Invoke( () => QuickSortInternal(array, left, last - 1), () => QuickSortInternal(array, last + 1, right) ); } This routine will now run in parallel. When executing, we now see the CPU usage across all cores spike while it executes. However, there is a significant problem here – by parallelizing this routine, we took it from an execution time of 9.3s to an execution time of approximately 14 seconds! We’re using more resources as seen in the CPU usage, but the overall result is a dramatic slowdown in overall processing time. This occurs because parallelization adds overhead. Each time we split this array, we spawn two new tasks to parallelize this algorithm! This is far, far too many tasks for our cores to operate upon at a single time. In effect, we’re “over-parallelizing” this routine. This is a common problem when working with divide and conquer algorithms, and leads to an important observation: When parallelizing a recursive routine, take special care not to add more tasks than necessary to fully utilize your system. This can be done with a few different approaches, in this case. Typically, the way to handle this is to stop parallelizing the routine at a certain point, and revert back to the serial approach. Since the first few recursions will all still be parallelized, our “deeper” recursive tasks will be running in parallel, and can take full advantage of the machine. This also dramatically reduces the overhead added by parallelizing, since we’re only adding overhead for the first few recursive calls. There are two basic approaches we can take here. The first approach would be to look at the total work size, and if it’s smaller than a specific threshold, revert to our serial implementation. In this case, we could just check right-left, and if it’s under a threshold, call the methods directly instead of using Parallel.Invoke. The second approach is to track how “deep” in the “tree” we are currently at, and if we are below some number of levels, stop parallelizing. This approach is a more general-purpose approach, since it works on routines which parse trees as well as routines working off of a single array, but may not work as well if a poor partitioning strategy is chosen or the tree is not balanced evenly. This can be written very easily. If we pass a maxDepth parameter into our internal routine, we can restrict the amount of times we parallelize by changing the recursive call to: // Code above is unchanged... SwapElements(array, left, last); if (maxDepth < 1) { QuickSortInternal(array, left, last - 1, maxDepth); QuickSortInternal(array, last + 1, right, maxDepth); } else { --maxDepth; Parallel.Invoke( () => QuickSortInternal(array, left, last - 1, maxDepth), () => QuickSortInternal(array, last + 1, right, maxDepth)); } We no longer allow this to parallelize indefinitely – only to a specific depth, at which time we revert to a serial implementation. By starting the routine with a maxDepth equal to Environment.ProcessorCount, we can restrict the total amount of parallel operations significantly, but still provide adequate work for each processing core. With this final change, my timings are much better. On average, I get the following timings: Framework via Array.Sort: 7.3 seconds Serial Quicksort Implementation: 9.3 seconds Naive Parallel Implementation: 14 seconds Parallel Implementation Restricting Depth: 4.7 seconds Finally, we are now faster than the framework’s Array.Sort implementation.

Read the article
Database model for keeping track of likes/shares/comments on blog posts over time

- by gage

My goal is to keep track of the popular posts on different blog sites based on social network activity at any given time. The goal is not to simply get the most popular now, but instead find posts that are popular compared to other posts on the same blog. For example, I follow a tech blog, a sports blog, and a gossip blog. The tech blog gets waaay more readership than the other two blogs, so in raw numbers every post on the tech blog will always out number views on the other two. So lets say the average tech blog post gets 500 facebook likes and the other two get an average of 50 likes per post. Then when there is a sports blog post that has 200 fb likes and a gossip blog post with 300 while the tech blog posts today have 500 likes I want to highlight the sports and gossip blog posts (more likes than average vs tech blog with more # of likes but just average for the blog) The approach I am thinking of taking is to make an entry in a database for each blog post. Every x minutes (say every 15 minutes) I will check how many likes/shares/comments an entry has received on all the social networks (facebook, twitter, google+, linkeIn). So over time there will be a history of likes for each blog post, i.e post 1234 after 15 min: 10 fb likes, 4 tweets, 6 g+ after 30 min: 15 fb likes, 15 tweets, 10 g+ ... ... after 48 hours: 200 fb likes, 25 tweets, 15 g+ By keeping a history like this for each blog post I can know the average number of likes/shares/tweets at any give time interval. So for example the average number of fb likes for all blog posts 48hrs after posting is 50, and a particular post has 200 I can mark that as a popular post and feature/highlight it. A consideration in the design is to be able to easily query the values (likes/shares) for a specific time-frame, i.e. fb likes after 30min or tweets after 24 hrs in-order to compute averages with which to compare against (or should averages be stored in it's own table?) If this approach is flawed or could use improvement please let me know, but it is not my main question. My main question is what should a database scheme for storing this info look like? Assuming that the above approach is taken I am trying to figure out what a database schema for storing the likes over time would look like. I am brand new to databases, in doing some basic reading I see that it is advisable to make a 3NF database. I have come up with the following possible schema. Schema 1 DB Popular Posts Table: Post post_id ( primary key(pk) ) url title Table: Social Activity activity_id (pk) url (fk) type (i.e. facebook,twitter,g+) value timestamp This was my initial instinct (base on my very limited db knowledge). As far as I under stand this schema would be 3NF? I searched for designs of similar database model, and found this question on stackoverflow, http://stackoverflow.com/questions/11216080/data-structure-for-storing-height-and-weight-etc-over-time-for-multiple-users . The scenario in that question is similar (recording weight/height of users overtime). Taking the accepted answer for that question and applying it to my model results in something like: Schema 2 (same as above, but break down the social activity into 2 tables) DB Popular Posts Table: Post post_id (pk) url title Table: Social Measurement measurement_id (pk) post_id (fk) timestamp Table: Social stat stat_id (pk) measurement_id (fk) type (i.e. facebook,twitter,g+) value The advantage I see in schema 2 is that I will likely want to access all the values for a given time, i.e. when making a measurement at 30min after a post is published I will simultaneous check number of fb likes, fb shares, fb comments, tweets, g+, linkedIn. So with this schema it may be easier get get all stats for a measurement_id corresponding to a certain time, i.e. all social stats for post 1234 at time x. Another thought I had is since it doesn't make sense to compare number of fb likes with number of tweets or g+ shares, maybe it makes sense to separate each social measurement into it's own table? Schema 3 DB Popular Posts Table: Post post_id (pk) url title Table: fb_likes fb_like_id (pk) post_id (fk) timestamp value Table: fb_shares fb_shares_id (pk) post_id (fk) timestamp value Table: tweets tweets__id (pk) post_id (fk) timestamp value Table: google_plus google_plus_id (pk) post_id (fk) timestamp value As you can see I am generally lost/unsure of what approach to take. I'm sure this typical type of database problem (storing measurements overtime, i.e temperature statistic) that must have a common solution. Is there a design pattern/model for this, does it have a name? I tried searching for "database periodic data collection" or "database measurements over time" but didn't find anything specific. What would be an appropriate model to solve the needs of this problem?

Read the article
Autoscaling in a modern world…. Part 1

- by Steve Loethen

It has been a while since I have had time to sit down and blog. I need to make sure I take the time. It helps me to focus on technology and not let the administrivia keep me from doing the things I love. I have been focusing on the cloud for the last couple of years. Specifically the PaaS platform from Microsoft called Azure. Time to dig in.. I wanted to explore Autoscaling. Autoscaling is not native part of Azure. The platform has the needed connection points. You can write code that looks at the health and performance of your application components and react to needed scaling changes. But that means you have to write all the code. Luckily, an add on to the Enterprise Library provides a lot of code that gets you a long way to being able to autoscale without having to start from scratch. The tool set is primarily composed of a Autoscaler object that you need to host. This object, when hosted and configured, looks at the performance criteria you specify and adjusts your application based on your needs. Sounds perfect. I started with the a set of HOL’s that gave me a good basis to understand the mechanics. I worked through labs 1 and 2 just to get the feel, but let’s start our saga at the end of lab3. Lab3 end results in a web application, hosted in Azure and a console app running on premise. The web app has a few buttons on it. One set adds messages to a queue, another removes them. A second set of buttons drives processor utilization to 100%. If you want to guess, a safe bet is that the Autoscaler is configured to react to a queue that has filled up or high cpu usage. We will continue our saga in the next post…

Read the article
Fraud Detection with the SQL Server Suite Part 1

- by Dejan Sarka

While working on different fraud detection projects, I developed my own approach to the solution for this problem. In my PASS Summit 2013 session I am introducing this approach. I also wrote a whitepaper on the same topic, which was generously reviewed by my friend Matija Lah. In order to spread this knowledge faster, I am starting a series of blog posts which will at the end make the whole whitepaper. Abstract With the massive usage of credit cards and web applications for banking and payment processing, the number of fraudulent transactions is growing rapidly and on a global scale. Several fraud detection algorithms are available within a variety of different products. In this paper, we focus on using the Microsoft SQL Server suite for this purpose. In addition, we will explain our original approach to solving the problem by introducing a continuous learning procedure. Our preferred type of service is mentoring; it allows us to perform the work and consulting together with transferring the knowledge onto the customer, thus making it possible for a customer to continue to learn independently. This paper is based on practical experience with different projects covering online banking and credit card usage. Introduction A fraud is a criminal or deceptive activity with the intention of achieving financial or some other gain. Fraud can appear in multiple business areas. You can find a detailed overview of the business domains where fraud can take place in Sahin Y., & Duman E. (2011), Detecting Credit Card Fraud by Decision Trees and Support Vector Machines, Proceedings of the International MultiConference of Engineers and Computer Scientists 2011 Vol 1. Hong Kong: IMECS. Dealing with frauds includes fraud prevention and fraud detection. Fraud prevention is a proactive mechanism, which tries to disable frauds by using previous knowledge. Fraud detection is a reactive mechanism with the goal of detecting suspicious behavior when a fraudster surpasses the fraud prevention mechanism. A fraud detection mechanism checks every transaction and assigns a weight in terms of probability between 0 and 1 that represents a score for evaluating whether a transaction is fraudulent or not. A fraud detection mechanism cannot detect frauds with a probability of 100%; therefore, manual transaction checking must also be available. With fraud detection, this manual part can focus on the most suspicious transactions. This way, an unchanged number of supervisors can detect significantly more frauds than could be achieved with traditional methods of selecting which transactions to check, for example with random sampling. There are two principal data mining techniques available both in general data mining as well as in specific fraud detection techniques: supervised or directed and unsupervised or undirected. Supervised techniques or data mining models use previous knowledge. Typically, existing transactions are marked with a flag denoting whether a particular transaction is fraudulent or not. Customers at some point in time do report frauds, and the transactional system should be capable of accepting such a flag. Supervised data mining algorithms try to explain the value of this flag by using different input variables. When the patterns and rules that lead to frauds are learned through the model training process, they can be used for prediction of the fraud flag on new incoming transactions. Unsupervised techniques analyze data without prior knowledge, without the fraud flag; they try to find transactions which do not resemble other transactions, i.e. outliers. In both cases, there should be more frauds in the data set selected for checking by using the data mining knowledge compared to selecting the data set with simpler methods; this is known as the lift of a model. Typically, we compare the lift with random sampling. The supervised methods typically give a much better lift than the unsupervised ones. However, we must use the unsupervised ones when we do not have any previous knowledge. Furthermore, unsupervised methods are useful for controlling whether the supervised models are still efficient. Accuracy of the predictions drops over time. Patterns of credit card usage, for example, change over time. In addition, fraudsters continuously learn as well. Therefore, it is important to check the efficiency of the predictive models with the undirected ones. When the difference between the lift of the supervised models and the lift of the unsupervised models drops, it is time to refine the supervised models. However, the unsupervised models can become obsolete as well. It is also important to measure the overall efficiency of both, supervised and unsupervised models, over time. We can compare the number of predicted frauds with the total number of frauds that include predicted and reported occurrences. For measuring behavior across time, specific analytical databases called data warehouses (DW) and on-line analytical processing (OLAP) systems can be employed. By controlling the supervised models with unsupervised ones and by using an OLAP system or DW reports to control both, a continuous learning infrastructure can be established. There are many difficulties in developing a fraud detection system. As has already been mentioned, fraudsters continuously learn, and the patterns change. The exchange of experiences and ideas can be very limited due to privacy concerns. In addition, both data sets and results might be censored, as the companies generally do not want to publically expose actual fraudulent behaviors. Therefore it can be quite difficult if not impossible to cross-evaluate the models using data from different companies and different business areas. This fact stresses the importance of continuous learning even more. Finally, the number of frauds in the total number of transactions is small, typically much less than 1% of transactions is fraudulent. Some predictive data mining algorithms do not give good results when the target state is represented with a very low frequency. Data preparation techniques like oversampling and undersampling can help overcome the shortcomings of many algorithms. SQL Server suite includes all of the software required to create, deploy any maintain a fraud detection infrastructure. The Database Engine is the relational database management system (RDBMS), which supports all activity needed for data preparation and for data warehouses. SQL Server Analysis Services (SSAS) supports OLAP and data mining (in version 2012, you need to install SSAS in multidimensional and data mining mode; this was the only mode in previous versions of SSAS, while SSAS 2012 also supports the tabular mode, which does not include data mining). Additional products from the suite can be useful as well. SQL Server Integration Services (SSIS) is a tool for developing extract transform–load (ETL) applications. SSIS is typically used for loading a DW, and in addition, it can use SSAS data mining models for building intelligent data flows. SQL Server Reporting Services (SSRS) is useful for presenting the results in a variety of reports. Data Quality Services (DQS) mitigate the occasional data cleansing process by maintaining a knowledge base. Master Data Services is an application that helps companies maintaining a central, authoritative source of their master data, i.e. the most important data to any organization. For an overview of the SQL Server business intelligence (BI) part of the suite that includes Database Engine, SSAS and SSRS, please refer to Veerman E., Lachev T., & Sarka D. (2009). MCTS Self-Paced Training Kit (Exam 70-448): Microsoft® SQL Server® 2008 Business Intelligence Development and Maintenance. MS Press. For an overview of the enterprise information management (EIM) part that includes SSIS, DQS and MDS, please refer to Sarka D., Lah M., & Jerkic G. (2012). Training Kit (Exam 70-463): Implementing a Data Warehouse with Microsoft® SQL Server® 2012. O'Reilly. For details about SSAS data mining, please refer to MacLennan J., Tang Z., & Crivat B. (2009). Data Mining with Microsoft SQL Server 2008. Wiley. SQL Server Data Mining Add-ins for Office, a free download for Office versions 2007, 2010 and 2013, bring the power of data mining to Excel, enabling advanced analytics in Excel. Together with PowerPivot for Excel, which is also freely downloadable and can be used in Excel 2010, is already included in Excel 2013. It brings OLAP functionalities directly into Excel, making it possible for an advanced analyst to build a complete learning infrastructure using a familiar tool. This way, many more people, including employees in subsidiaries, can contribute to the learning process by examining local transactions and quickly identifying new patterns.

Read the article
Solving Big Problems with Oracle R Enterprise, Part I

- by dbayard

Abstract: This blog post will show how we used Oracle R Enterprise to tackle a customer’s big calculation problem across a big data set. Overview: Databases are great for managing large amounts of data in a central place with rigorous enterprise-level controls. R is great for doing advanced computations. Sometimes you need to do advanced computations on large amounts of data, subject to rigorous enterprise-level concerns. This blog post shows how Oracle R Enterprise enables R plus the Oracle Database enabled us to do some pretty sophisticated calculations across 1 million accounts (each with many detailed records) in minutes. The problem: A financial services customer of mine has a need to calculate the historical internal rate of return (IRR) for its customers’ portfolios. This information is needed for customer statements and the online web application. In the past, they had solved this with a home-grown application that pulled trade and account data out of their data warehouse and ran the calculations. But this home-grown application was not able to do this fast enough, plus it was a challenge for them to write and maintain the code that did the IRR calculation. IRR – a problem that R is good at solving: Internal Rate of Return is an interesting calculation in that in most real-world scenarios it is impractical to calculate exactly. Rather, IRR is a calculation where approximation techniques need to be used. In this blog post, we will discuss calculating the “money weighted rate of return” but in the actual customer proof of concept we used R to calculate both money weighted rate of returns and time weighted rate of returns. You can learn more about the money weighted rate of returns here: http://www.wikinvest.com/wiki/Money-weighted_return First Steps- Calculating IRR in R We will start with calculating the IRR in standalone/desktop R. In our second post, we will show how to take this desktop R function, deploy it to an Oracle Database, and make it work at real-world scale. The first step we did was to get some sample data. For a historical IRR calculation, you have a balances and cash flows. In our case, the customer provided us with several accounts worth of sample data in Microsoft Excel. The above figure shows part of the spreadsheet of sample data. The data provides balances and cash flows for a sample account (BMV=beginning market value. FLOW=cash flow in/out of account. EMV=ending market value). Once we had the sample spreadsheet, the next step we did was to read the Excel data into R. This is something that R does well. R offers multiple ways to work with spreadsheet data. For instance, one could save the spreadsheet as a .csv file. In our case, the customer provided a spreadsheet file containing multiple sheets where each sheet provided data for a different sample account. To handle this easily, we took advantage of the RODBC package which allowed us to read the Excel data sheet-by-sheet without having to create individual .csv files. We wrote ourselves a little helper function called getsheet() around the RODBC package. Then we loaded all of the sample accounts into a data.frame called SimpleMWRRData. Writing the IRR function At this point, it was time to write the money weighted rate of return (MWRR) function itself. The definition of MWRR is easily found on the internet or if you are old school you can look in an investment performance text book. In the customer proof, we based our calculations off the ones defined in the The Handbook of Investment Performance: A User’s Guide by David Spaulding since this is the reference book used by the customer. (One of the nice things we found during the course of this proof-of-concept is that by using R to write our IRR functions we could easily incorporate the specific variations and business rules of the customer into the calculation.) The key thing with calculating IRR is the need to solve a complex equation with a numerical approximation technique. For IRR, you need to find the value of the rate of return (r) that sets the Net Present Value of all the flows in and out of the account to zero. With R, we solve this by defining our NPV function: where bmv is the beginning market value, cf is a vector of cash flows, t is a vector of time (relative to the beginning), emv is the ending market value, and tend is the ending time. Since solving for r is a one-dimensional optimization problem, we decided to take advantage of R’s optimize method (http://stat.ethz.ch/R-manual/R-patched/library/stats/html/optimize.html). The optimize method can be used to find a minimum or maximum; to find the value of r where our npv function is closest to zero, we wrapped our npv function inside the abs function and asked optimize to find the minimum. Here is an example of using optimize: where low and high are scalars that indicate the range to search for an answer. To test this out, we need to set values for bmv, cf, t, emv, tend, low, and high. We will set low and high to some reasonable defaults. For example, this account had a negative 2.2% money weighted rate of return. Enhancing and Packaging the IRR function With numerical approximation methods like optimize, sometimes you will not be able to find an answer with your initial set of inputs. To account for this, our approach was to first try to find an answer for r within a narrow range, then if we did not find an answer, try calling optimize() again with a broader range. See the R help page on optimize() for more details about the search range and its algorithm. At this point, we can now write a simplified version of our MWRR function. (Our real-world version is more sophisticated in that it calculates rate of returns for 5 different time periods [since inception, last quarter, year-to-date, last year, year before last year] in a single invocation. In our actual customer proof, we also defined time-weighted rate of return calculations. The beauty of R is that it was very easy to add these enhancements and additional calculations to our IRR package.)To simplify code deployment, we then created a new package of our IRR functions and sample data. For this blog post, we only need to include our SimpleMWRR function and our SimpleMWRRData sample data. We created the shell of the package by calling: To turn this package skeleton into something usable, at a minimum you need to edit the SimpleMWRR.Rd and SimpleMWRRData.Rd files in the \man subdirectory. In those files, you need to at least provide a value for the “title” section. Once that is done, you can change directory to the IRR directory and type at the command-line: The myIRR package for this blog post (which has both SimpleMWRR source and SimpleMWRRData sample data) is downloadable from here: myIRR package Testing the myIRR package Here is an example of testing our IRR function once it was converted to an installable package: Calculating IRR for All the Accounts So far, we have shown how to calculate IRR for a single account. The real-world issue is how do you calculate IRR for all of the accounts?This is the kind of situation where we can leverage the “Split-Apply-Combine” approach (see http://www.cscs.umich.edu/~crshalizi/weblog/815.html). Given that our sample data can fit in memory, one easy approach is to use R’s “by” function. (Other approaches to Split-Apply-Combine such as plyr can also be used. See http://4dpiecharts.com/2011/12/16/a-quick-primer-on-split-apply-combine-problems/). Here is an example showing the use of “by” to calculate the money weighted rate of return for each account in our sample data set. Recap and Next Steps At this point, you’ve seen the power of R being used to calculate IRR. There were several good things: R could easily work with the spreadsheets of sample data we were given R’s optimize() function provided a nice way to solve for IRR- it was both fast and allowed us to avoid having to code our own iterative approximation algorithm R was a convenient language to express the customer-specific variations, business-rules, and exceptions that often occur in real-world calculations- these could be easily added to our IRR functions The Split-Apply-Combine technique can be used to perform calculations of IRR for multiple accounts at once. However, there are several challenges yet to be conquered at this point in our story: The actual data that needs to be used lives in a database, not in a spreadsheet The actual data is much, much bigger- too big to fit into the normal R memory space and too big to want to move across the network The overall process needs to run fast- much faster than a single processor The actual data needs to be kept secured- another reason to not want to move it from the database and across the network And the process of calculating the IRR needs to be integrated together with other database ETL activities, so that IRR’s can be calculated as part of the data warehouse refresh processes In our next blog post in this series, we will show you how Oracle R Enterprise solved these challenges.

Read the article
Adventures in Lab Management Configuration: CMMI Edition Part 1 of 3

- by Enrique Lima

I remember at one point someone telling me how close Migrate was to Migraine. This was a process that included an environment from TFS 2008 to TFS 2010, needed to be migrated too as far as the process template goes. Here we are talking about CMMI v4.2 to CMMI v5.0. Now, the process to migrate the TFS Infrastructure is one thing, migrating the Process Template is a different deal, not hard … just involved. Followed a combination of steps that came from a blog post as the main guidance and then MSDN (as suggested on the guidance post) to complement some tasks and steps. Again, the focus I have here is CMMI. The high level steps taken to enable the TFS 2008 CMMI v4.2 migrated to TFS 2010 Process Template are: 1) Backup the Collection, Configuration and Warehouse Databases. 2) Downloaded the Process Template using Visual Studio 2010. 3) Exported, modified and imported Bug Type Definition 4) Exported, modified and imported Scenario or Requirement Type Definition. 5) Created and imported bug field mappings. Now, we can attempt to connect using Test Manager, and you should be able to get this going. After that was done, it was time to enroll VMs that already existed in the environment. This was a bit more challenging, but in the end it was a matter of just analyzing the changes that had been made to had a temporary work around from the time we migrated to the time we converted the Work Items and such and added fields to enable communication between the project and the Test and Lab Manager component. There are 2 more parts to this post, the second will describe the detailed steps taken to complete the Process Template update and the third will talk about the gotchas and fixes for the Lab Management portion.

Read the article
LightView: JavaFX 2 real-time visualizer for GlassFish

- by arungupta

Adam Bien launched LightFish, a light-weight monitoring and visualization application for GlassFish. It comes with a introduction and a screencast to get you started. The tool provides monitoring information about threads and memory (such as heap size, thread count, peak thread count), transactions (commits and rollbacks), HTTP sessions, JDBC sessions, and even "paranormal activity". In a recently released first part of a tri-part article series at OTN, Adam explains how REST services can be exposed as bindable set of properties for JavaFX. The article titled "Enterprise side of JavaFX" shows how a practical combination of REST and JavaFX together. It explains how read-only and dynamic properties can be created. The fine-grained binding model allows clear separation of the view, presentation, and business logic. Read the first part here.

Read the article

< Previous Page | 30 31 32 33 34 35 36 37 38 39 40 41 | Next Page >