Search Results

Search found 4703 results on 189 pages for 'emit knowledge'.

Page 77/189 | < Previous Page | 73 74 75 76 77 78 79 80 81 82 83 84 | Next Page >

Trying to make a plugin system in C++

- by Pirate for Profit

I'm making a task-based program that needs to have plugins. Tasks need to have properties which can be easily edited, I think this can be done with Qt's Meta-Object Compiler reflection capabilities (I could be wrong, but I should be able to stick this in a QtPropertyBrowser?) So here's the base: class Task : public QObject { Q_OBJECT public: explicit Task(QObject *parent = 0) : QObject(parent){} virtual void run() = 0; signals: void taskFinished(bool success = true); } Then a plugin might have this task: class PrinterTask : public Task { Q_OBJECT public: explicit PrinterTask(QObject *parent = 0) : Task(parent) {} void run() { Printer::getInstance()->Print(this->getData()); // fictional emit taskFinished(true); } inline const QString &getData() const; inline void setData(QString data); Q_PROPERTY(QString data READ getData WRITE setData) // for reflection } In a nutshell, here's what I want to do: // load plugin // find all the Tasks interface implementations in it // have user able to choose a Task and edit its specific Q_PROPERTY's // run the TASK It's important that one .dll has multiple tasks, because I want them to be associated by their module. For instance, "FileTasks.dll" could have tasks for deleting files, making files, etc. The only problem with Qt's plugin setup is I want to store X amount of Tasks in one .dll module. As far as I can tell, you can only load one interface per plugin (I could be wrong?). If so, the only possible way to do accomplish what I want is to create a FactoryInterface with string based keys which return the objects (as in Qt's Plug-And-Paint example), which is a terrible boilerplate that I would like to avoid. Anyone know a cleaner C++ plugin architecture than Qt's to do what I want? Also, am I safely assuming Qt's reflection capabilities will do what I want (i.e. able to edit an unknown dynamically loaded tasks' properties with the QtPropertyBrowser before dispatching)?

Read the article
PyQt - QLabel inheriting

- by Ockonal

Hello, i wanna inherit QLabel to add there click event processing. I'm trying this code: class NewLabel(QtGui.QLabel): def __init__(self, parent): QtGui.QLabel.__init__(self, parent) def clickEvent(self, event): print 'Label clicked!' But after clicking I have no line 'Label clicked!' EDIT: Okay, now I'm using not 'clickEvent' but 'mousePressEvent'. And I still have a question. How can i know what exactly label was clicked? For example, i have 2 edit box and 2 labels. Labels content are pixmaps. So there aren't any text in labels, so i can't discern difference between labels. How can i do that? EDIT2: I made this code: class NewLabel(QtGui.QLabel): def __init__(self, firstLabel): QtGui.QLabel.__init__(self, firstLabel) def mousePressEvent(self, event): print 'Clicked' #myLabel = self.sender() # None =) self.emit(QtCore.SIGNAL('clicked()'), "Label pressed") In another class: self.FirstLang = NewLabel(Form) QtCore.QObject.connect(self.FirstLang, QtCore.SIGNAL('clicked()'), self.labelPressed) Slot in the same class: def labelPressed(self): print 'in labelPressed' print self.sender() But there isn't sender object in self. What i did wrong?

Read the article
problem with two key ranges in couchdb

- by Duasto

I'm having problem getting the right results in my coordinate system. To explain my system, I have this simple database that have x_axis, y_axis and name columns. I don't need to get all the data, I just need to display some part of it. For example, I have a coordinate system that have 10:10(meaning from x_axis -10 to 10 and from y_axis -10 to 10) and I want to display only 49 coordinates. In sql query I can do it something like this: "select * from coordinate where x_axis = -3 and x_axis <= 3 and y_axis = -3 y_axis <= 3" I tried this function but no success: "by_range": { "map": "function(doc) { emit([doc.x_axis, doc.y_axis], doc) }" } by_range?startkey=[-3,-3]&endkey=[3,3] I got a wrong results of: -3x-3 -3x-2 -3x-1 -3x0 -3x1 -3x2 -3x3 <-- should not display this part -- -3x4 -3x5 -3x6 -3x7 -3x8 -3x9 -3x10 <-- end of should not display this part -- ..... up to 3x3 to give you a better understanding of my project here is the screenshot of that I want to be made: Oops they don't allowed new poster to post an image img96(dot)imageshack(dot)us/img96/5382/coordinates(dot)jpg <<< just change the "(dot)" to "."

Read the article
What's the correct terminology for something that isn't quite classification nor regression?

- by TC

Let's say that I have a problem that is basicly classification. That is, given some input and a number of possible output classes, find the correct class for the given input. Neural networks and decision trees are some of the algorithms that may be used to solve such problems. These algorithms typically only emit a single result however: the resulting classification. Now what if I weren't only interested in one classification, but in the posterior probabilities that the input belongs to each of the classes. I.E., instead of the answer "This input belongs in class A", I want the answer "This input belongs to class A with 80%, class B with 15% and class C with 5%". My question is not on how to obtain these posterior probabilities, but rather on the correct terminology to describe the process of finding them. You could call it regression, since we are now trying to estimate a number of real valued numbers, but I am not quite sure if that's right. I feel it's not exactly classification either, it's something in between the two. Is there a word that describes the process of finding the class conditional posterior probabilities that some input belongs in each of the possible output classes? P.S. I'm not exactly sure if this question is enough of a programming question, but since it's about machine learning and machine learning generally involves a decent amount of programming, let's give it a shot.

Read the article
NHibernate: What are child sessions and why and when should I use them?

- by stefando

In the comments for the ayende's blog about the auditing in NHibernate there is a mention about the need to use a child session:session.GetSession(EntityMode.Poco). As far as I understand it, it has something to do with the order of the SQL operation which session.Flush will emit. (For example: If I wanted to perform some delete operation in the pre-insert event but the session was already done with deleting operations, I would need some way to inject them in.) However I did not find documentation about this feature and behavior. Questions: Is my understanding of child sessions correct? How and in which scenarios should I use them? Are they documented somewhere? Could they be used for session "scoping"? (For example: I open the master session which will hold some data and then I create 2 child-sessions from the master one. I'd expect that the two child-scopes will be separated but the will share objects from the master session cache. Is this the case?) Are they first class citizens in NHibernate or are they just hack to support some edge-case scenarios? Thanks in advance for any info.

Read the article
servicestack Razor view with request and response DTO

- by user7398

I'm having a go with the razor functionality in service stack. I have a razor cshtml view working for one of my response DTO's. I need to access some values from the request DTO in the razor view that have been filled in from some fields from the REST route, so i can construct a url to put into the response html page and also label some form labels. Is there anyway of doing this? I don't want to duplicate the property from the request DTO into the response DTO just for this html view. Because i'm trying to emulate an existing REST service of another product, i do not want to emit extra data just for the html view. eg http://localhost/rest/{Name}/details/{Id} eg @inherits ViewPage<DetailsResponse> @{ ViewBag.Title = "todo title"; Layout = "HtmlReport"; } this needs to come from the request dto NOT @Model <a href="/rest/@Model.Name">link to user</a> <a href="/rest/@Model.Name/details/@Model.Id">link to user details</a>

Read the article
How to get number of bytes read from QTextStream

- by user261882

The following code I am using to find the number of read bytes from QFile. With some files it gives the correct file size, but with some files it gives me a value that is approximatively fileCSV.size()/2. I am sending two files that have same number of characters in it, but have different file sizes link text. Should i use some other objects for reading the QFile? QFile fileCSV("someFile.txt"); if ( !fileCSV.open(QIODevice::ReadOnly | QIODevice::Text)) emit errorOccurredReadingCSV(this); QTextStream textStreamCSV( &fileCSV ); // use a text stream int fileCSVSize = fileCSV.size()); qint64 reconstructedCSVFileSize = 0; while ( !textStreamCSV.atEnd() ) { QString line = textStreamCSV.readLine(); // line of text excluding '\n' if (!line.isEmpty()) { reconstructedCSVFileSize += line.size(); //this doesn't work always reconstructedCSVFileSize += 2; } else reconstructedCSVFileSize += 2; } I know that reading the size of QString is wrong, give me some other solutions if you can. Thank you.

Read the article
Is there a faster way to access a property member of a class using reflection?

- by Ross Goddard

I am currently using the following code to access the property of an object using reflection: Dim propInfo As Reflection.PropertyInfo = myType.GetProperty(propName) Dim objValue As Object = propInfo.GetValue(myObject, Nothing) I am having some issues with the speed since this type of code is being called many times and is causing some slowdown. I have been looking into using Refelction.Emit or dynamic methods, but I am not sure exactly how to make use of them. Background Information: I am creating a list of a subset of the properties of the object, associating then with some meta information (such as if they can be loaded from the database or xml, if they are editable, can the user see them). This is for later consumption so we can write code such as : foreach prop as BaseWrapper in graphNode.NodeProperties prop.LoadFromDataRow(dr) next The application makes heavy use of having access to this list. The problem is that on the initial load of a project, a larger number of objects are being created that make use of this, so for each object created it is looping through this code a number of times. I initially tried adding each property to the list manually, but this ran into problems with not everything being initialized at the correct time and some other issues. If there is no other good way, then I may have to rethink some of the design and see what else can be done to improve the performance.

Read the article
Sorting CouchDB Views By Value

- by Lee Theobald

Hi all, I'm testing out CouchDB to see how it could handle logging some search results. What I'd like to do is produce a view where I can produce the top queries from the results. At the moment I have something like this: Example document portion { "query": "+dangerous +dogs", "hits": "123" } Map function (Not exactly what I need/want but it's good enough for testing) function(doc) { if (doc.query) { var split = doc.query.split(" "); for (var i in split) { emit(split[i], 1); } } } Reduce Function function (key, values, rereduce) { return sum(values); } Now this will get me results in a format where a query term is the key and the count for that term on the right, which is great. But I'd like it ordered by the value, not the key. From the sounds of it, this is not yet possible with CouchDB. So does anyone have any ideas of how I can get a view where I have an ordered version of the query terms & their related counts? I'm very new to CouchDB and I just can't think of how I'd write the functions needed.

Read the article
Strategies for testing reactive, asynchronous code

- by Arne

I am developing a data-flow oriented domain-specific language. To simplify, let's just look at Operations. Operations have a number of named parameters and can be asked to compute their result using their current state. To decide when an Operation should produce a result, it gets a Decision that is sensitive to which parameter got a value from who. When this Decision decides that it is fulfilled, it emits a Signal using an Observer. An Accessor listens for this Signal and in turn calls the Result method of the Operation in order to multiplex it to the parameters of other Operations. So far, so good, nicely decoupled design, composable and reusable and, depending on the specific Observer used, as asynchronous as you want it to be. Now here's my problem: I would love to start coding actual Tests against this design. But with an asynchronous Observer... how should I know that the whole signal-and-parameters-plumbing worked? Do I need to use time outs while waiting for a Signal in order to say that it was emitted successfully or not? How can I be, formally, sure that the Signal will not be emitted if I just wait a little longer (halting problem? ;-)) And, how can I be sure that the Signal was emitted because it was me who set a parameter, and not another Operation? It might well be that my test comes to early and sees a Signal that was emitted way before my setting a parameter caused a Decision to emit it. Currently, I guess the trivial cases are easy to test, but as soon as I want to test complex many-to-many - situations between operations I must resort to hoping that the design Just Works (tm)...

Read the article
Value types of variable size

- by YellPika

I'm trying to code a small math library in C#. I wanted to create a generic vector structure where the user could define the element type (int, long, float, double, etc.) and dimensions. My first attempt was something like this... public struct Vector<T> { public readonly int Dimensions; public readonly T[] Elements; // etc... } Unfortunately, Elements, being an array, is also a reference type. Thus, doing this, Vector<int> a = ...; Vector<int> b = a; a[0] = 1; b[0] = 2; would result in both a[0] and b[0] equaling 2. My second attempt was to define an interface IVector<T>, and then use Reflection.Emit to automatically generate the appropriate type at runtime. The resulting classes would look roughly like this: public struct Int32Vector3 : IVector<T> { public int Element0; public int Element1; public int Element2; public int Dimensions { get { return 3; } } // etc... } This seemed fine until I found out that interfaces seem to act like references to the underlying object. If I passed an IVector to a function, and changes to the elements in the function would be reflected in the original vector. What I think is my problem here is that I need to be able to create classes that have a user specified number of fields. I can't use arrays, and I can't use inheritance. Does anyone have a solution? EDIT: This library is going to be used in performance critical situations, so reference types are not an option.

Read the article
CouchDB- basic grouping question

- by dnolen

I have a user document which has a group field. This field is an array of group ids. I would like to write a view that returns (groupid as key) - (array of user docs as val). This mapping operation seems like a good beginning. function(doc) { var type = doc.type; var groups = doc.groups; if(type == "user" && groups.length > 0) { for(var i = 0; i < groups.length; i++) { emit(groups[i], doc); } } } But there's obviously something very wrong with my attempt at a reduce: function(key, values, rereduce) { var set = []; var seen = []; for(var i = 0; i < values.length; i++) { var _id = values[i]._id; if(seen.indexOf(_id) == -1) { seen.push(_id); set.push(values[i]); } } return set; } I'm running CouchDB 0.10dev. Any help appreciated.

Read the article
QMetaMethods for regular methods missing?

- by oleks

Hi, I'm new in QT, and I'm just testing out the MOC. For a given class: class Counter : public QObject { Q_OBJECT int m_value; public: Counter() {m_value = 0;} ~Counter() {} int value() {return m_value;} public slots: void setValue(int value); signals: void valueChanged(int newValue); }; I want to get a list of all methods in a class, but seem to only be getting a list of signals and slots, although the documentation says it should be all methods? Here's my code: #include <QCoreApplication> #include <QObject> #include <QMetaMethod> #include <iostream> using std::cout; using std::endl; int main(int argc, char *argv[]) { QCoreApplication app(argc, argv); const QMetaObject cntmo = Counter::staticMetaObject; for(int i = 0; i != cntmo.methodCount(); ++i) { QMetaMethod qmm(cntmo.method(i)); cout << qmm.signature() << endl; } return app.exec(); } Please beware this is my best c/p, perhaps I forgot to include some headers. My output: destroyed(QObject*) destroyed() deleteLater() _q_reregisterTimers(void*) valueChanged(int) setValue(int) Does anyone know why this is happening? Does qt not recognise int value() {return m_value;} as a valid method? If so, is there a macro I've forgotten or something like that? P.S. I'm using 4.6.2 UPDATE I forgot the implementation of the setValue method, not that it makes too much a difference to my actual question. void Counter::setValue(int value) { if(value != m_value) { m_value = value; emit valueChanged(value); } }

Read the article
How to open a chat window in sender and receiver side [on hold]

- by DEEPS

When i am trying to send a message from sender the chat window is always opening in senders side instead of receiver side.so please give a correct code to display chat box in both side. (HTML 5, JAVASCRIPT,JQUERY). This is client side code: //Send private message function sendPvtMsg(data) { var pvtmsg = data; socket.emit('message',JSON.stringify({msg: 'pvtMsg', data: { from: userName, to: toChat, pvtmsg: data }}),roomId); } socket.on('message',function(data) { var command = JSON.parse(data); var itemName = command.msg; var rec_data = command.data.message; var sender = command.data.name; //Receive message from server if (itemName == "message") { document.getElementById("chat").value += sender + " : " + rec_data + "\n"; } //Receive private message else if (itemName == "pvtMsg") { var to = command.data.to; var from = command.data.from; //To display message to sender and receiver if (userName == to || userName == from) { var pvtmsg = command.data.pvtmsg; document.getElementById("chat").value += from + "( to " + to + ")" + " : " + pvtmsg + "\n"; } } function createChatBox(chatboxtitle,minimizeChatBox) { if ($("#chatbox_"+chatboxtitle).length > 0) { if ($("#chatbox_"+chatboxtitle).css('display') == 'none') { $("#chatbox_"+chatboxtitle).css('display','block'); restructureChatBoxes(); } $("#chatbox_"+chatboxtitle+" .chatboxtextarea").focus(); return; }

Read the article
Embedding ADF UI Components into OAF regions

- by Juan Camilo Ruiz

Having finished the 2 Webcast on ADF integration with Oracle E-Business Suite, Sara Woodhull, Principal Product Manager on the Oracle E-Business Suite Applications Technology team and I are going to continue adding entries to the series on this topic, trying to cover as many use cases as possible. In this entry, Sara created an overview on how Oracle ADF pages can be embedded into an Oracle Application Framework region. This is a very interesting approach that will enable those of you who are exploring ADF as a technology stack to enhanced some of the Oracle E-Business Suite flows and leverage your skill on Oracle Applications Framework (OAF). In upcoming entries we will start unveiling the internals needed to achieve session sharing between the regions. Stay tuned for more entries and enjoy this new post. Document Scope This document only covers information that is specific to embedding an Oracle ADF page in an Oracle Application Framework–based page. It assumes knowledge of Oracle ADF and Oracle Application Framework development. It also assumes knowledge of the material in My Oracle Support Note 974949.1, “Oracle E-Business Suite SDK for Java” and My Oracle Support Note 1296491.1, "FAQ for Integration of Oracle E-Business Suite and Oracle Application Development Framework (ADF) Applications". Prerequisite Patch Download Patch 12726556:R12.FND.B from My Oracle Support and install it. The implementation described below requires Patch 12726556:R12.FND.B to provide the accessors for the ADF page. This patch is required in addition to the Oracle E-Business Suite SDK for Java patch described in My Oracle Support Note 974949.1. Development Environments You need two different JDeveloper environments: Oracle ADF and OA Framework. Oracle ADF Development Environment You build your Oracle ADF page using JDeveloper 11g. You should use JDeveloper 11g R1 (the latest is 11.1.1.6.0) if you need to use other products in the Oracle Fusion Middleware Stack, such as Oracle WebCenter, Oracle SOA Suite, or BI. You should use JDeveloper 11g R2 (the latest is 11.1.2.3.0) if you do not need other Oracle Fusion Middleware products. JDeveloper 11g R2 is an Oracle ADF-specific release that supports the latest Java EE standards and has various core improvements. Oracle Application Framework Development Environment Build your OA Framework page using a development environment corresponding to your Oracle E-Business Suite version. You must use Release 12.1.2 or later because the rich content container was introduced in Release 12.1.2. See “OA Framework - How to find the correct version of JDeveloper to use with eBusiness Suite 11i or Release 12.x” (My Oracle Support Doc ID 416708.1). Building your Oracle ADF Page Typically you build your ADF page using the session management feature of the Oracle E-Business Suite SDK for Java as described in My Oracle Support Note 974949.1. Also see My Oracle Support Note 1296491.1, "FAQ for Integration of Oracle E-Business Suite and Oracle Application Development Framework (ADF) Applications". Building an ADF Page with the Hierarchy Viewer If you are using the ADF hierarchy viewer, you should set up the structure and settings of the ADF page as follows or the hierarchy viewer may not fill the entire area it is supposed to fill (especially a problem in Firefox). Create a stretchable component as the parent component for the hierarchy viewer, such as af:panelStretchLayout (underneath the af:form component in the structure). Use af:panelStretchLayout for Oracle ADF 11.1.1.6 and earlier. For later versions of Oracle ADF, use af:panelGridLayout. Create your hierarchy viewer component inside the stretchable component. Create Function in Oracle E-Business Suite Instance In your Oracle E-Business Suite instance, create a function for your ADF page with the following parameters. You can use either the Functions window in the System Administrator responsibility or the Functions page in the Functional Administrator responsibility. Function Function Name Type=External ADF Function (ADFX) HTML Call=GWY.jsp?targetPage=faces/<your ADF page> ">You must also add your function to an Oracle E-Business Suite menu or permission set and set up function security or role-based access control (RBAC) so that the user has authorization to access the function. If you do not want the function to appear on the navigation menu, add the function without a menu prompt. See the Oracle E-Business Suite System Administrator's Guide Documentation Set for more information. Testing the Function from the Oracle E-Business Suite Home Page It’s a good idea to test launching your ADF page from the Oracle E-Business Suite Home Page. Add your function to the navigation menu for your responsibility with a prompt and try launching it. If your ADF page expects parameters from the surrounding page, those might not be available, however. Setting up the Oracle Application Framework Rich Container Once you have built your Oracle ADF 11g page, you need to embed it in your Oracle Application Framework page. Create Rich Content Container in your OA Framework JDeveloper environment In the OA Extension Structure pane for your OAF page, select the region where you want to add the rich content, and add a richContainer item to the region. Set the following properties on the richContainer item: id Content Type=Others (for Release 12.1.3. This property value may change in a future release.) Destination Function=[function code] Width (in pixels or percent, such as 100%) Height (in pixels) Parameters=[any parameters your Oracle ADF page is expecting to receive from the Oracle Application Framework page] Parameters In the Parameters property, specify parameters that will be passed to the embedded content as a list of comma-separated, name-value pairs. Dynamic parameters may be specified as paramName={@viewAttr}. Dynamic Rich Content Container Properties If you want your rich content container to display a different Oracle ADF page depending on other information, you would set up a different function for each different Oracle ADF page. You would then set the Destination Function and Parameters properties programmatically, instead of setting them in the Property Inspector. In the processRequest() method of your Oracle Application Framework page controller, where OAFRichContentPage is the ID of your richContainer item and the parameters are whatever parameters your ADF page expects, your code might look similar to this code fragment: OARichContainerBean richBean = (OARichContainerBean) webBean.findChildRecursive("OAFRichContentPage"); if(richBean != null){ if(isFirstCondition){ richBean.setFunctionName("ADF_EXAMPLE_EMBEDDED"); richBean.setParameters("ParamLoginPersonId="+loginPersonId +"&ParamPersonId="+personId+"&ParamUserId="+userId +"&ParamRespId="+respId+"&ParamRespApplId="+respApplId +"&ParamFromOA=Y"+"&ParamSecurityGroupId="+securityGroupId); } else if(isSecondCondition){ richBean.setFunctionName("ADF_EXAMPLE_OTHER_FUNCTION"); richBean.setParameters("ParamLoginPersonId=" +loginPersonId+"&ParamPersonId="+personId +"&ParamUserId="+userId+"&ParamRespId="+respId +"&ParamRespApplId="+respApplId +"&ParamFromOA=Y" +"&ParamSecurityGroupId="+securityGroupId); } }

Read the article
Parallelism in .NET – Part 5, Partitioning of Work

- by Reed

When parallelizing any routine, we start by decomposing the problem. Once the problem is understood, we need to break our work into separate tasks, so each task can be run on a different processing element. This process is called partitioning. Partitioning our tasks is a challenging feat. There are opposing forces at work here: too many partitions adds overhead, too few partitions leaves processors idle. Trying to work the perfect balance between the two extremes is the goal for which we should aim. Luckily, the Task Parallel Library automatically handles much of this process. However, there are situations where the default partitioning may not be appropriate, and knowledge of our routines may allow us to guide the framework to making better decisions. First off, I’d like to say that this is a more advanced topic. It is perfectly acceptable to use the parallel constructs in the framework without considering the partitioning taking place. The default behavior in the Task Parallel Library is very well-behaved, even for unusual work loads, and should rarely be adjusted. I have found few situations where the default partitioning behavior in the TPL is not as good or better than my own hand-written partitioning routines, and recommend using the defaults unless there is a strong, measured, and profiled reason to avoid using them. However, understanding partitioning, and how the TPL partitions your data, helps in understanding the proper usage of the TPL. I indirectly mentioned partitioning while discussing aggregation. Typically, our systems will have a limited number of Processing Elements (PE), which is the terminology used for hardware capable of processing a stream of instructions. For example, in a standard Intel i7 system, there are four processor cores, each of which has two potential hardware threads due to Hyperthreading. This gives us a total of 8 PEs – theoretically, we can have up to eight operations occurring concurrently within our system. In order to fully exploit this power, we need to partition our work into Tasks. A task is a simple set of instructions that can be run on a PE. Ideally, we want to have at least one task per PE in the system, since fewer tasks means that some of our processing power will be sitting idle. A naive implementation would be to just take our data, and partition it with one element in our collection being treated as one task. When we loop through our collection in parallel, using this approach, we’d just process one item at a time, then reuse that thread to process the next, etc. There’s a flaw in this approach, however. It will tend to be slower than necessary, often slower than processing the data serially. The problem is that there is overhead associated with each task. When we take a simple foreach loop body and implement it using the TPL, we add overhead. First, we change the body from a simple statement to a delegate, which must be invoked. In order to invoke the delegate on a separate thread, the delegate gets added to the ThreadPool’s current work queue, and the ThreadPool must pull this off the queue, assign it to a free thread, then execute it. If our collection had one million elements, the overhead of trying to spawn one million tasks would destroy our performance. The answer, here, is to partition our collection into groups, and have each group of elements treated as a single task. By adding a partitioning step, we can break our total work into small enough tasks to keep our processors busy, but large enough tasks to avoid overburdening the ThreadPool. There are two clear, opposing goals here: Always try to keep each processor working, but also try to keep the individual partitions as large as possible. When using Parallel.For, the partitioning is always handled automatically. At first, partitioning here seems simple. A naive implementation would merely split the total element count up by the number of PEs in the system, and assign a chunk of data to each processor. Many hand-written partitioning schemes work in this exactly manner. This perfectly balanced, static partitioning scheme works very well if the amount of work is constant for each element. However, this is rarely the case. Often, the length of time required to process an element grows as we progress through the collection, especially if we’re doing numerical computations. In this case, the first PEs will finish early, and sit idle waiting on the last chunks to finish. Sometimes, work can decrease as we progress, since previous computations may be used to speed up later computations. In this situation, the first chunks will be working far longer than the last chunks. In order to balance the workload, many implementations create many small chunks, and reuse threads. This adds overhead, but does provide better load balancing, which in turn improves performance. The Task Parallel Library handles this more elaborately. Chunks are determined at runtime, and start small. They grow slowly over time, getting larger and larger. This tends to lead to a near optimum load balancing, even in odd cases such as increasing or decreasing workloads. Parallel.ForEach is a bit more complicated, however. When working with a generic IEnumerable<T>, the number of items required for processing is not known in advance, and must be discovered at runtime. In addition, since we don’t have direct access to each element, the scheduler must enumerate the collection to process it. Since IEnumerable<T> is not thread safe, it must lock on elements as it enumerates, create temporary collections for each chunk to process, and schedule this out. By default, it uses a partitioning method similar to the one described above. We can see this directly by looking at the Visual Partitioning sample shipped by the Task Parallel Library team, and available as part of the Samples for Parallel Programming. When we run the sample, with four cores and the default, Load Balancing partitioning scheme, we see this: The colored bands represent each processing core. You can see that, when we started (at the top), we begin with very small bands of color. As the routine progresses through the Parallel.ForEach, the chunks get larger and larger (seen by larger and larger stripes). Most of the time, this is fantastic behavior, and most likely will out perform any custom written partitioning. However, if your routine is not scaling well, it may be due to a failure in the default partitioning to handle your specific case. With prior knowledge about your work, it may be possible to partition data more meaningfully than the default Partitioner. There is the option to use an overload of Parallel.ForEach which takes a Partitioner<T> instance. The Partitioner<T> class is an abstract class which allows for both static and dynamic partitioning. By overriding Partitioner<T>.SupportsDynamicPartitions, you can specify whether a dynamic approach is available. If not, your custom Partitioner<T> subclass would override GetPartitions(int), which returns a list of IEnumerator<T> instances. These are then used by the Parallel class to split work up amongst processors. When dynamic partitioning is available, GetDynamicPartitions() is used, which returns an IEnumerable<T> for each partition. If you do decide to implement your own Partitioner<T>, keep in mind the goals and tradeoffs of different partitioning strategies, and design appropriately. The Samples for Parallel Programming project includes a ChunkPartitioner class in the ParallelExtensionsExtras project. This provides example code for implementing your own, custom allocation strategies, including a static allocator of a given chunk size. Although implementing your own Partitioner<T> is possible, as I mentioned above, this is rarely required or useful in practice. The default behavior of the TPL is very good, often better than any hand written partitioning strategy.

Read the article
Issue 15: The Benefits of Oracle Exastack

- by rituchhibber

SOLUTIONS FOCUS The Benefits of Oracle Exastack Paul ThompsonDirector, Alliances and Solutions Partner ProgramsOracle EMEA Alliances & Channels RESOURCES -- Oracle PartnerNetwork (OPN) Oracle Exastack Program Oracle Exastack Ready Oracle Exastack Optimized Oracle Exastack Labs and Enablement Resources Oracle Exastack Labs Video Tour SUBSCRIBE FEEDBACK PREVIOUS ISSUES Exastack is a revolutionary programme supporting Oracle independent software vendor partners across the entire Oracle technology stack. Oracle's core strategy is to engineer software and hardware together, and our ISV strategy is the same. At Oracle we design engineered systems that are pre-integrated to reduce the cost and complexity of IT infrastructures while increasing productivity and performance. Oracle innovates and optimises performance at every layer of the stack to simplify business operations, drive down costs and accelerate business innovation. Our engineered systems are optimised to achieve enterprise performance levels that are unmatched in the industry. Faster time to production is achieved by implementing pre-engineered and pre-assembled hardware and software bundles. Our strategy of delivering a single-vendor stack simplifies and reduces costs associated with purchasing, deploying, and supporting IT environments for our customers and partners. In parallel to this core engineered systems strategy, the Oracle Exastack Program enables our Oracle ISV partners to leverage a scalable, integrated infrastructure that delivers their applications tuned, tested and optimised for high-performance. Specifically, the Oracle Exastack Program helps ISVs run their solutions on the Oracle Exadata Database Machine, Oracle Exalogic Elastic Cloud, and Oracle SPARC SuperCluster T4-4 - integrated systems products in which the software and hardware are engineered to work together. These products provide OPN members with a lower cost and high performance infrastructure for database and application workloads across on-premise and cloud based environments. Ready and Optimized Oracle Partners can now leverage our new Oracle Exastack Program to become Oracle Exastack Ready and Oracle Exastack Optimized. Partners can achieve Oracle Exastack Ready status through their support for Oracle Solaris, Oracle Linux, Oracle VM, Oracle Database, Oracle WebLogic Server, Oracle Exadata Database Machine, Oracle Exalogic Elastic Cloud, and Oracle SPARC SuperCluster T4-4. By doing this, partners can demonstrate to their customers that their applications are available on the latest major releases of these products. The Oracle Exastack Ready programme helps customers readily differentiate Oracle partners from lesser software developers, and identify applications that support Oracle engineered systems. Achieving Oracle Exastack Optimized status demonstrates that an OPN member has proven itself against goals for performance and scalability on Oracle integrated systems. This status enables end customers to readily identify Oracle partners that have tested and tuned their solutions for optimum performance on an Oracle Exadata Database Machine, Oracle Exalogic Elastic Cloud, and Oracle SPARC SuperCluster T4-4. These ISVs can display the Oracle Exadata Optimized, Oracle Exalogic Optimized or Oracle SPARC SuperCluster Optimized logos on websites and on all their collateral to show that they have tested and tuned their application for optimum performance. Deliver higher value to customers Oracle's investment in engineered systems enables ISV partners to deliver higher value to customer business processes. New innovations are enabled through extreme performance unachievable through traditional best-of-breed multi-vendor server/software approaches. Core product requirements can be launched faster, enabling ISVs to focus research and development investment on core competencies in order to bring value to market as quickly as possible. Through Exastack, partners no longer have to worry about the underlying product stack, which allows greater focus on the development of intellectual property above the stack. Partners are not burdened by platform issues and can concentrate simply on furthering their applications. The advantage to end customers is that partners can focus all efforts on business functionality, rather than bullet-proofing underlying technologies, and so will inevitably deliver application updates faster. Exastack provides ISVs with a number of flexible deployment options, such as on-premise or Cloud, while maintaining one single code base for applications regardless of customer deployment preference. Customers buying their solutions from Exastack ISVs can therefore be confident in deploying on their own networks, on private clouds or into a public cloud. The underlying platform will support all conceivable deployments, enabling a focus on the ISV's application itself that wouldn't be possible with other vendor partners. It stands to reason that Exastack accelerates time to value as well as lowering implementation costs all round. There is a big competitive advantage in partners being able to offer customers an optimised, pre-configured solution rather than an assortment of components and a suggested fit. Once a customer has decided to buy an Oracle Exastack Ready or Optimized partner solution, it will be up and running without any need for the customer to conduct testing of its own. Operational costs and complexity are also reduced, thanks to streamlined customer support through standardised configurations and pro-active monitoring. 'Engineered to Work Together' is a significant statement of Oracle strategy. It guarantees smoother deployment of a single vendor solution, clear ownership with no finger-pointing and the peace of mind of the Oracle Support Centre underpinning the entire product stack. Next steps Every OPN member with packaged applications must seriously consider taking steps to become Exastack Ready, or Exastack Optimized at the first opportunity. That first step down the track is to talk to an expert on the OPN Portal, at the Oracle Partner Business Center or to discuss the next steps with the closest Oracle account manager. Oracle Exastack lab environments and other technical enablement resources are available for OPN members wishing to further their knowledge of Oracle Exastack and qualify their applications for Oracle Exastack Optimized. New Boot Camps and Guided Learning Paths (GLPs), tailored specifically for ISVs, are available for Oracle Exadata Database Machine, Oracle Exalogic Elastic Cloud, Oracle Linux, Oracle Solaris, Oracle Database, and Oracle WebLogic Server. More information about these GLPs and Boot Camps (including delivery dates and locations) are posted on the OPN Competency Center and corresponding OPN Knowledge Zones. Learn more about Oracle Exastack labs and ISV specific enablement resources. "Oracle Specialized partners are of course front-and-centre, with potential customers clearly directed to those partners and to Exadata Ready partners as a matter of priority." --More OpenWorld 2011 highlights for Oracle partners and customers Oracle Application Testing Suite 9.3 application testing solution for Web, SOA and Oracle Applications Oracle Application Express Release 4.1 improving the development of database-centric Web 2.0 applications and reports Oracle Unified Directory 11g helping customers manage the critical identity information that drives their business applications Oracle SOA Suite for healthcare integration Oracle Enterprise Pack for Eclipse 11g demonstrating continued commitment to the developer and open source communities Oracle Coherence 3.7.1, the latest release of the industry's leading distributed in-memory data grid Oracle Process Accelerators helping to simplify and accelerate time-to-value for customers' business process management initiatives Oracle's JD Edwards EnterpriseOne on the iPad meeting the increasingly mobile demands of today's workforces Oracle CRM On Demand Release 19 Innovation Pack introducing industry-leading hosted call centre and enterprise-marketing capabilities designed to drive further revenue and productivity while reducing costs and improving the customer experience Oracle's Primavera Portfolio Management 9 for businesses delivering on project portfolio goals with increased versatility, transparency and accuracy Oracle's PeopleSoft Human Capital Management (HCM) 9.1 On Demand Standard Edition helping customers manage their long-term investment in enterprise-wide business applications New versions of Oracle FLEXCUBE Universal Banking and Oracle FLEXCUBE Investor Servicing for Financial Institutions, as well as Oracle Financial Services Enterprise Case Management, Oracle Financial Services Pricing Management, Oracle Financial Management Analytics and Oracle Tax Analytics Oracle Utilities Network Management System 1.11 offering new modelling and analysis features to improve distribution-grid management for electric utilities Oracle Communications Network Charging and Control 4.4 helping communications service providers (CSPs) offer their customers more flexible charging options Plus many, many more technology announcements, enhancements, momentum news and community updates -- Oracle OpenWorld 2012 A date has already been set for Oracle OpenWorld 2012. Held once again in San Francisco, exhibitors, partners, customers and Oracle people will gather from 30 September until 4 November to meet, network and learn together with the rest of the global Oracle community. Register now for Oracle OpenWorld 2012 and save $$$! We'll reward your early planning for Oracle OpenWorld 2012 with reduced rates. Super Saver deals are now available! -- Back to the welcome page

Read the article
Bye Bye Year of the Dragon, Hello BPM

- by Ajay Khanna

Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} As 2012 fades and we usher in a New Year, let’s look back at some of the hottest BPM trends and those we’ll be seeing more of in the coming months. BPM is as much about people as it is about technology. As people adopt new ways of engagement, new channels of communications and new devices to interact , the changes are reflected in BPM practices. As Social and Mobile have become an integral part of our personal and professional lives, we’ll see tighter integration of social and mobile with BPM, and more use cases emerging for smarter process management in 2013. And with products and services becoming less differentiated, organizations will strive to differentiate on Customer Experience. Concepts like Pace Layered Architecture and Dynamic Case Management will provide more flexibility and agility to IT groups and knowledge workers. Take a look at some of these capabilities we showcased (see video) at Oracle OpenWorld 2012. Some of these trends that will continue to gain momentum in 2013: Social networks and social media have provided a new way for businesses to engage with customers. A prospect is likely to reach out to their social network before making any purchase. Companies are increasingly engaging with customers in social networks to influence their purchasing decisions, as well as listening to customers via tools like sentiment analysis to see what customers think about a particular product or process. These insights are valuable as companies look to improve their processes. Inside organizations, workers are using social tools to engage with each other to design new products and processes. Social collaboration tools are being used to resolve issues where an employee needs consultation to reach a decision. Oracle BPM Suite includes social interaction as an integral part of its process design and work management to empower today’s business users. Ubiquitous smart mobile devices are trending as a tool of choice for many workers. Many companies are adopting the policy of “Bring Your Own Device,” and the device of choice is a tablet. Devices like smart phones and tablets not only provide mobility to workers and customers, but they also provide additional important information – the context. By integrating the mobile context (location, photos, and preferences) into your processes, organizations can make much more informed decisions, as well as offer more personalized service to customers. Using Oracle ADF Mobile, you can easily create user interfaces for mobile devices and also capture location data for process execution. Customer experience was at the forefront of trending topics in 2012. Organizations are trying to understand their customers better and offer them more personalized and differentiated services. Customer experience is paramount when companies design sales and support processes. Companies are looking to BPM to consistently and efficiently orchestrate customer facing processes across disparate systems, departments and channels of communication. Oracle BPM Suite provides just the right capabilities for organizations to design and deliver an excellent customer experience. Pace Layered Architecture strategy is gaining traction as a way to maximize agility and minimize disruption in organizations. It provides a framework to manage the evolution of your information system when different pieces of it are changing at different rates and need to be updated independent of one another. Oracle Fusion Middleware and Oracle BPM Suite are designed with this in mind. The database layer, integration layer, application layer, and process layer should not be required to change at the same time. Most of the business changes to policy or process can be done at the process layer without disrupting the whole infrastructure. By understanding the type of change needed at a particular level, organizations can become much more agile and efficient. Adaptive Case Management proposes more flexibility to manage processes or cases that do not follow a structured process flow. In such situations, the knowledge worker managing the case needs to evaluate what step should occur next because the sequence of steps can’t be predetermined. Another characteristic is that it requires much more collaboration than straight-through process. As simple processes become automated, and customers adopt more and more self-service, cases that reach the case workers are much more complex and need more investigation. Oracle BPM suite includes comprehensive adaptive case management capability to manage such unstructured and complex processes. Smart BPM or making your BPM intelligent has been the holy grail for BPM practitioners who imagined that one day BPM would become one with Business Intelligence, Business Activity Monitoring and Complex Event Processing, making it much more responsive and helpful in organizational decision making. In 2013, organizations will begin to deploy these intelligent BPM solutions. Oracle offers an integrated solution that brings together the powerful functionality of BI, BAM, event processing, and Real Time Decisions to help organizations create smart process based solutions. In order to help customers reach their BPM goals faster and remove risks associated with BPM initiatives, Oracle has introduced Oracle Process Accelerators, pre-built best practices applications built on Oracle BPM Suite that are fully production grade and ready to deploy. These are exiting times for BPM practitioners and there is so much to look forward to in 2013. We wish you a very happy and prosperous New Year 2013. Happy BPMing!

Read the article
ODI 12c - Parallel Table Load

- by David Allan

In this post we will look at the ODI 12c capability of parallel table load from the aspect of the mapping developer and the knowledge module developer - two quite different viewpoints. This is about parallel table loading which isn't to be confused with loading multiple targets per se. It supports the ability for ODI mappings to be executed concurrently especially if there is an overlap of the datastores that they access, so any temporary resources created may be uniquely constructed by ODI. Temporary objects can be anything basically - common examples are staging tables, indexes, views, directories - anything in the ETL to help the data integration flow do its job. In ODI 11g users found a few workarounds (such as changing the technology prefixes - see here) to build unique temporary names but it was more of a challenge in error cases. ODI 12c mappings by default operate exactly as they did in ODI 11g with respect to these temporary names (this is also true for upgraded interfaces and scenarios) but can be configured to support the uniqueness capabilities. We will look at this feature from two aspects; that of a mapping developer and that of a developer (of procedures or KMs). 1. Firstly as a Mapping Developer..... 1.1 Control when uniqueness is enabled A new property is available to set unique name generation on/off. When unique names have been enabled for a mapping, all temporary names used by the collection and integration objects will be generated using unique names. This property is presented as a check-box in the Property Inspector for a deployment specification. 1.2 Handle cleanup after successful execution Provided that all temporary objects that are created have a corresponding drop statement then all of the temporary objects should be removed during a successful execution. This should be the case with the KMs developed by Oracle. 1.3 Handle cleanup after unsuccessful execution If an execution failed in ODI 11g then temporary tables would have been left around and cleaned up in the subsequent run. In ODI 12c, KM tasks can now have a cleanup-type task which is executed even after a failure in the main tasks. These cleanup tasks will be executed even on failure if the property 'Remove Temporary Objects on Error' is set. If the agent was to crash and not be able to execute this task, then there is an ODI tool (OdiRemoveTemporaryObjects here) you can invoke to cleanup the tables - it supports date ranges and the like. That's all there is to it from the aspect of the mapping developer it's much, much simpler and straightforward. You can now execute the same mapping concurrently or execute many mappings using the same resource concurrently without worrying about conflict. 2. Secondly as a Procedure or KM Developer..... In the ODI Operator the executed code shows the actual name that is generated - you can also see the runtime code prior to execution (introduced in 11.1.1.7), for example below in the code type I selected 'Pre-executed Code' this lets you see the code about to be processed and you can also see the executed code (which is the default view). References to the collection (C$) and integration (I$) names will be automatically made unique by using the odiRef APIs - these objects will have unique names whenever concurrency has been enabled for a particular mapping deployment specification. It's also possible to use name uniqueness functions in procedures and your own KMs. 2.1 New uniqueness tags You can also make your own temporary objects have unique names by explicitly including either %UNIQUE_STEP_TAG or %UNIQUE_SESSION_TAG in the name passed to calls to the odiRef APIs. Such names would always include the unique tag regardless of the concurrency setting. To illustrate, let's look at the getObjectName() method. At <% expansion time, this API will append %UNIQUE_STEP_TAG to the object name for collection and integration tables. The name parameter passed to this API may contain %UNIQUE_STEP_TAG or %UNIQUE_SESSION_TAG. This API always generates to the <? version of getObjectName() At execution time this API will replace the unique tag macros with a string that is unique to the current execution scope. The returned name will conform to the name-length restriction for the target technology, and its pattern for the unique tag. Any necessary truncation will be performed against the initial name for the object and any other fixed text that may have been specified. Examples are:- <?=odiRef.getObjectName("L", "%COL_PRFEMP%UNIQUE_STEP_TAG", "D")?> SCOTT.C$_EABH7QI1BR1EQI3M76PG9SIMBQQ <?=odiRef.getObjectName("L", "EMP%UNIQUE_STEP_TAG_AE", "D")?> SCOTT.EMPAO96Q2JEKO0FTHQP77TMSAIOSR_ Methods which have this kind of support include getFrom, getTableName, getTable, getObjectShortName and getTemporaryIndex. There are APIs for retrieving this tag info also, the getInfo API has been extended with the following properties (the UNIQUE* properties can also be used in ODI procedures); UNIQUE_STEP_TAG - Returns the unique value for the current step scope, e.g. 5rvmd8hOIy7OU2o1FhsF61 Note that this will be a different value for each loop-iteration when the step is in a loop. UNIQUE_SESSION_TAG - Returns the unique value for the current session scope, e.g. 6N38vXLrgjwUwT5MseHHY9 IS_CONCURRENT - Returns info about the current mapping, will return 0 or 1 (only in % phase) GUID_SRC_SET - Returns the UUID for the current source set/execution unit (only in % phase) The getPop API has been extended with the IS_CONCURRENT property which returns info about an mapping, will return 0 or 1. 2.2 Additional APIs Some new APIs are provided including getFormattedName which will allow KM developers to construct a name from fixed-text or ODI symbols that can be optionally truncate to a max length and use a specific encoding for the unique tag. It has syntax getFormattedName(String pName[, String pTechnologyCode]) This API is available at both the % and the ? phase. The format string can contain the ODI prefixes that are available for getObjectName(), e.g. %INT_PRF, %COL_PRF, %ERR_PRF, %IDX_PRF alongwith %UNIQUE_STEP_TAG or %UNIQUE_SESSION_TAG. The latter tags will be expanded into a unique string according to the specified technology. Calls to this API within the same execution context are guaranteed to return the same unique name provided that the same parameters are passed to the call. e.g. <%=odiRef.getFormattedName("%COL_PRFMY_TABLE%UNIQUE_STEP_TAG_AE", "ORACLE")%> <?=odiRef.getFormattedName("%COL_PRFMY_TABLE%UNIQUE_STEP_TAG_AE", "ORACLE")?> C$_MY_TAB7wDiBe80vBog1auacS1xB_AE <?=odiRef.getFormattedName("%COL_PRFMY_TABLE%UNIQUE_STEP_TAG.log", "FILE")?> C2_MY_TAB7wDiBe80vBog1auacS1xB.log 2.3 Name length generation As part of name generation, the length of the generated name will be compared with the maximum length for the target technology and truncation may need to be applied. When a unique tag is included in the generated string it is important that uniqueness is not compromised by truncation of the unique tag. When a unique tag is NOT part of the generated name, the name will be truncated by removing characters from the end - this is the existing 11g algorithm. When a unique tag is included, the algorithm will first truncate the <postfix> and if necessary the <prefix>. It is recommended that users will ensure there is sufficient uniqueness in the <prefix> section to ensure uniqueness of the final resultant name. SUMMARY To summarize, ODI 12c make it much simpler to utilize mappings in concurrent cases and provides APIs for helping developing any procedures or custom knowledge modules in such a way they can be used in highly concurrent, parallel scenarios.

Read the article
Developer’s Life – Attitude and Communication – They Can Cause Problems – Notes from the Field #027

- by Pinal Dave

[Note from Pinal]: This is a 27th episode of Notes from the Field series. The biggest challenge for anyone is to understand human nature. We human have so many things on our mind at any moment of time. There are cases when what we say is not what we mean and there are cases where what we mean we do not say. We do say and things as per our mood and our agenda in mind. Sometimes there are incidents when our attitude creates confusion in the communication and we end up creating a situation which is absolutely not warranted. In this episode of the Notes from the Field series database expert Mike Walsh explains a very crucial issue we face in our career, which is not technical but more to relate to human nature. Read on this may be the best blog post you might read in recent times. In this week’s note from the field, I’m taking a slight departure from technical knowledge and concepts explained. We’ll be back to it next week, I’m sure. Pinal wanted us to explain some of the issues we bump into and how we see some of our customers arrive at problem situations and how we have helped get them back on the right track. Often it is a technical problem we are officially solving – but in a lot of cases as a consultant, we are really helping fix some communication difficulties. This is a technical blog post and not an “advice column” in a newspaper – but the longer I am a consultant, the more years I add to my experience in technology the more I learn that the vast majority of the problems we encounter have “soft skills” included in the chain of causes for the issue we are helping overcome. This is not going to be exhaustive but I hope that sharing four pieces of advice inspired by real issues starts a process of searching for places where we can be the cause of these challenges and look at fixing them in ourselves. Or perhaps we can begin looking at resolving them in teams that we manage. I’ll share three statements that I’ve either heard, read or said and talk about some of the communication or attitude challenges highlighted by the statement. 1 – “But that’s the SAN Administrator’s responsibility…” I heard that early on in my consulting career when talking with a customer who had serious corruption and no good recent backups – potentially no good backups at all. The statement doesn’t have to be this one exactly, but the attitude here is an attitude of “my job stops here, and I don’t care about the intent or principle of why I’m here.” It’s also a situation of having the attitude that as long as there is someone else to blame, I’m fine… You see in this case, the DBA had a suspicion that the backups were not being handled right. They were the DBA and they knew that they had responsibility to ensure SQL backups were good to go – it’s a basic requirement of a production DBA. In my “As A DBA Where Do I start?!” presentation, I argue that is job #1 of a DBA. But in this case, the thought was that there was someone else to blame. Rather than create extra work and take on responsibility it was decided to just let it be another team’s responsibility. This failed the company, the company’s customers and no one won. As technologists – we should strive to go the extra mile. If there is a lack of clarity around roles and responsibilities and we know it – we should push to get it resolved. Especially as the DBAs who should act as the advocates of the data contained in the databases we are responsible for. 2 – “We’ve always done it this way, it’s never caused a problem before!” Complacency. I have to say that many failures I’ve been paid good money to help recover from would have not happened had it been for an attitude of complacency. If any thoughts like this have entered your mind about your situation you may be suffering from it. If, while reading this, you get this sinking feeling in your stomach about that one thing you know should be fixed but haven’t done it.. Why don’t you stop and go fix it then come back.. “We should have better backups, but we’re on a SAN so we should be fine really.” “Technically speaking that could happen, but what are the chances?” “We’ll just clean that up as a fast follow” ..and so on. In the age of tightening IT budgets, increased expectations of up time, availability and performance there is no room for complacency. Our customers and business units expect – no demand – the best. Complacency says “we will give you second best or hopefully good enough and we accept the risk and know this may hurt us later. Sometimes an organization will opt for “good enough” and I agree with the concept that at times the perfect can be the enemy of the good. But when we make those decisions in a vacuum and are not reporting them up and discussing them as an organization that is different. That is us unilaterally choosing to do something less than the best and purposefully playing a game of chance. 3 – “This device must accept interference from other devices but not create any” I’ve paraphrased this one – but it’s something the Federal Communications Commission – a federal agency in the United States that regulates electronic communication – requires of all manufacturers of any device that could cause or receive interference electronically. I blogged in depth about this here (http://www.straightpathsql.com/archives/2011/07/relationship-advice-from-the-fcc/) so I won’t go into much detail other than to say this… If we all operated more on the premise that we should do our best to not be the cause of conflict, and to be less easily offended and less upset when we perceive offense life would be easier in many areas! This doesn’t always cause the issues we are called in to help out. Not directly. But where we see it is in unhealthy relationships between the various technology teams at a client. We’ll see teams hoarding knowledge, not sharing well with others and almost working against other teams instead of working with them. If you trace these problems back far enough it often stems from someone or some group of people violating this principle from the FCC. To Sum It Up Technology problems are easy to solve. At Linchpin People we help many customers get past the toughest technological challenge – and at the end of the day it is really just a repeatable process of pattern based troubleshooting, logical thinking and starting at the beginning and carefully stepping through to the end. It’s easy at the end of the day. The tough part of what we do as consultants is the people skills. Being able to help get teams working together, being able to help teams take responsibility, to improve team to team communication? That is the difficult part, and we get to use the soft skills on every engagement. Work on professional development (http://professionaldevelopment.sqlpass.org/) and see continuing improvement here, not just with technology. I can teach just about anyone how to be an excellent DBA and performance tuner, but some of these soft skills are much more difficult to teach. If you want to get started with performance analytics and triage of virtualized SQL Servers with the help of experts, read more over at Fix Your SQL Server. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Notes from the Field, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
How To - Guide to Importing Data from a MySQL Database to Excel using MySQL for Excel

- by Javier Treviño

Fetching data from a database to then get it into an Excel spreadsheet to do analysis, reporting, transforming, sharing, etc. is a very common task among users. There are several ways to extract data from a MySQL database to then import it to Excel; for example you can use the MySQL Connector/ODBC to configure an ODBC connection to a MySQL database, then in Excel use the Data Connection Wizard to select the database and table from which you want to extract data from, then specify what worksheet you want to put the data into. Another way is to somehow dump a comma delimited text file with the data from a MySQL table (using the MySQL Command Line Client, MySQL Workbench, etc.) to then in Excel open the file using the Text Import Wizard to attempt to correctly split the data in columns. These methods are fine, but involve some degree of technical knowledge to make the magic happen and involve repeating several steps each time data needs to be imported from a MySQL table to an Excel spreadsheet. So, can this be done in an easier and faster way? With MySQL for Excel you can. MySQL for Excel features an Import MySQL Data action where you can import data from a MySQL Table, View or Stored Procedure literally with a few clicks within Excel. Following is a quick guide describing how to import data using MySQL for Excel. This guide assumes you already have a working MySQL Server instance, Microsoft Office Excel 2007 or 2010 and MySQL for Excel installed. 1. Opening MySQL for Excel Being an Excel Add-In, MySQL for Excel is opened from within Excel, so to use it open Excel, go to the Data tab located in the Ribbon and click MySQL for Excel at the far right of the Ribbon. 2. Creating a MySQL Connection (may be optional) If you have MySQL Workbench installed you will automatically see the same connections that you can see in MySQL Workbench, so you can use any of those and there may be no need to create a new connection. If you want to create a new connection (which normally you will do only once), in the Welcome Panel click New Connection, which opens the Setup New Connection dialog. Here you only need to give your new connection a distinctive Connection Name, specify the Hostname (or IP address) where the MySQL Server instance is running on (if different than localhost), the Port to connect to and the Username for the login. If you wish to test if your setup is good to go, click Test Connection and an information dialog will pop-up stating if the connection is successful or errors were found. 3.Opening a connection to a MySQL Server To open a pre-configured connection to a MySQL Server you just need to double-click it, so the Connection Password dialog is displayed where you enter the password for the login. 4. Selecting a MySQL Schema After opening a connection to a MySQL Server, the Schema Selection Panel is shown, where you can select the Schema that contains the Tables, Views and Stored Procedures you want to work with. To do so, you just need to either double-click the desired Schema or select it and click Next >. 5. Importing data… All previous steps were really the basic minimum needed to drill-down to the DB Object Selection Panel where you can see the Database Objects (grouped by type: Tables, Views and Procedures in that order) that you want to perform actions against; in the case of this guide, the action of importing data from them. a. From a MySQL Table To import from a Table you just need to select it from the list of Database Objects’ Tables group, after selecting it you will note actions below the list become available; then click Import MySQL Data. The Import Data dialog is displayed; you can see some basic information here like the name of the Excel worksheet the data will be imported to (in the window title), the Table Name, the total Row Count and a 10 row preview of the data meant for the user to see the columns that the table contains and to provide a way to select which columns to import. The Import Data dialog is designed with defaults in place so all data is imported (all rows and all columns) by just clicking Import; this is important to minimize the number of clicks needed to get the job done. After the import is performed you will have the data in the Excel worksheet formatted automatically. If you need to override the defaults in the Import Data dialog to change the columns selected for import or to change the number of imported rows you can easily do so before clicking Import. In the screenshot below the defaults are overridden to import only the first 3 columns and rows 10 – 60 (Limit to 50 Rows and Start with Row 10). If the number of rows to be imported exceeds the maximum number of rows Excel can hold in its worksheet, a warning will be displayed in the dialog, meaning the imported number of rows will be limited by that maximum number (65,535 rows if the worksheet is in Compatibility Mode). In the screenshot below you can see the Table contains 80,559 rows, but only 65,534 rows will be imported since the first row is used for the column names if the Include Column Names as Headers checkbox is checked. b. From a MySQL View Similar to the way of importing from a Table, to import from a View you just need to select it from the list of Database Objects’ Views group, then click Import MySQL Data. The Import Data dialog is displayed; identically to the way everything looks when importing from a table, the dialog displays the View Name, the total Row Count and the data preview grid. Since Views are really a filtered way to display data from Tables, it is actually as if we are extracting data from a Table; so the Import Data dialog is actually identical for those 2 Database Objects. After the import is performed, the data in the Excel spreadsheet looks like the following screenshot. Note that you can override the defaults in the Import Data dialog in the same way described above for importing data from Tables. Also the Compatibility Mode warning will be displayed if data exceeds the maximum number of rows explained before. c. From a MySQL Procedure Too import from a Procedure you just need to select it from the list of Database Objects’ Procedures group (note you can see Procedures here but not Functions since these return a single value, so by design they are filtered out). After the selection is made, click Import MySQL Data. The Import Data dialog is displayed, but this time you can see it looks different to the one used for Tables and Views. Given the nature of Store Procedures, they require first that values are supplied for its Parameters and also Procedures can return multiple Result Sets; so the Import Data dialog shows the Procedure Name and the Procedure Parameters in a grid where their values are input. After you supply the Parameter Values click Call. After calling the Procedure, the Result Sets returned by it are displayed at the bottom of the dialog; output parameters and the return value of the Procedure are appended as the last Result Set of the group. You can see each Result Set is displayed as a tab so you can see a preview of the returned data. You can specify if you want to import the Selected Result Set (default), All Result Sets – Arranged Horizontally or All Result Sets – Arranged Vertically using the Import drop-down list; then click Import. After the import is performed, the data in the Excel spreadsheet looks like the following screenshot. Note in this example all Result Sets were imported and arranged vertically. As you can see using MySQL for Excel importing data from a MySQL database becomes an easy task that requires very little technical knowledge, so it can be done by any type of user. Hope you enjoyed this guide! Remember that your feedback is very important for us, so drop us a message: MySQL on Windows (this) Blog - https://blogs.oracle.com/MySqlOnWindows/ Forum - http://forums.mysql.com/list.php?172 Facebook - http://www.facebook.com/mysql Cheers!

Read the article
How John Got 15x Improvement Without Really Trying

- by rchrd

The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here. How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

Read the article
The Data Scientist

- by BuckWoody

A new term - well, perhaps not that new - has come up and I’m actually very excited about it. The term is Data Scientist, and since it’s new, it’s fairly undefined. I’ll explain what I think it means, and why I’m excited about it. In general, I’ve found the term deals at its most basic with analyzing data. Of course, we all do that, and the term itself in that definition is redundant. There is no science that I know of that does not work with analyzing lots of data. But the term seems to refer to more than the common practices of looking at data visually, putting it in a spreadsheet or report, or even using simple coding to examine data sets. The term Data Scientist (as far as I can make out this early in it’s use) is someone who has a strong understanding of data sources, relevance (statistical and otherwise) and processing methods as well as front-end displays of large sets of complicated data. Some - but not all - Business Intelligence professionals have these skills. In other cases, senior developers, database architects or others fill these needs, but in my experience, many lack the strong mathematical skills needed to make these choices properly. I’ve divided the knowledge base for someone that would wear this title into three large segments. It remains to be seen if a given Data Scientist would be responsible for knowing all these areas or would specialize. There are pretty high requirements on the math side, specifically in graduate-degree level statistics, but in my experience a company will only have a few of these folks, so they are expected to know quite a bit in each of these areas. Persistence The first area is finding, cleaning and storing the data. In some cases, no cleaning is done prior to storage - it’s just identified and the cleansing is done in a later step. This area is where the professional would be able to tell if a particular data set should be stored in a Relational Database Management System (RDBMS), across a set of key/value pair storage (NoSQL) or in a file system like HDFS (part of the Hadoop landscape) or other methods. Or do you examine the stream of data without storing it in another system at all? This is an important decision - it’s a foundation choice that deals not only with a lot of expense of purchasing systems or even using Cloud Computing (PaaS, SaaS or IaaS) to source it, but also the skillsets and other resources needed to care and feed the system for a long time. The Data Scientist sets something into motion that will probably outlast his or her career at a company or organization. Often these choices are made by senior developers, database administrators or architects in a company. But sometimes each of these has a certain bias towards making a decision one way or another. The Data Scientist would examine these choices in light of the data itself, starting perhaps even before the business requirements are created. The business may not even be aware of all the strategic and tactical data sources that they have access to. Processing Once the decision is made to store the data, the next set of decisions are based around how to process the data. An RDBMS scales well to a certain level, and provides a high degree of ACID compliance as well as offering a well-known set-based language to work with this data. In other cases, scale should be spread among multiple nodes (as in the case of Hadoop landscapes or NoSQL offerings) or even across a Cloud provider like Windows Azure Table Storage. In fact, in many cases - most of the ones I’m dealing with lately - the data should be split among multiple types of processing environments. This is a newer idea. Many data professionals simply pick a methodology (RDBMS with Star Schemas, NoSQL, etc.) and put all data there, regardless of its shape, processing needs and so on. A Data Scientist is familiar not only with the various processing methods, but how they work, so that they can choose the right one for a given need. This is a huge time commitment, hence the need for a dedicated title like this one. Presentation This is where the need for a Data Scientist is most often already being filled, sometimes with more or less success. The latest Business Intelligence systems are quite good at allowing you to create amazing graphics - but it’s the data behind the graphics that are the most important component of truly effective displays. This is where the mathematics requirement of the Data Scientist title is the most unforgiving. In fact, someone without a good foundation in statistics is not a good candidate for creating reports. Even a basic level of statistics can be dangerous. Anyone who works in analyzing data will tell you that there are multiple errors possible when data just seems right - and basic statistics bears out that you’re on the right track - that are only solvable when you understanding why the statistical formula works the way it does. And there are lots of ways of presenting data. Sometimes all you need is a “yes” or “no” answer that can only come after heavy analysis work. In that case, a simple e-mail might be all the reporting you need. In others, complex relationships and multiple components require a deep understanding of the various graphical methods of presenting data. Knowing which kind of chart, color, graphic or shape conveys a particular datum best is essential knowledge for the Data Scientist. Why I’m excited I love this area of study. I like math, stats, and computing technologies, but it goes beyond that. I love what data can do - how it can help an organization. I’ve been fortunate enough in my professional career these past two decades to work with lots of folks who perform this role at companies from aerospace to medical firms, from manufacturing to retail. Interestingly, the size of the company really isn’t germane here. I worked with one very small bio-tech (cryogenics) company that worked deeply with analysis of complex interrelated data. So watch this space. No, I’m not leaving Azure or distributed computing or Microsoft. In fact, I think I’m perfectly situated to investigate this role further. We have a huge set of tools, from RDBMS to Hadoop to allow me to explore. And I’m happy to share what I learn along the way.

Read the article
Five Ways Enterprise 2.0 Can Transform Your Business - Q&A from the Webcast

- by [email protected]

A few weeks ago, Vince Casarez and I presented with KMWorld on the Five Ways Enterprise 2.0 Can Transform Your Business. It was an enjoyable, interactive webcast in which Vince and I discussed the ways Enterprise 2.0 can transform your business and more importantly, highlighted key customer examples of how to do so. If you missed the webcast, you can catch a replay here. We had a lot of audience participation in some of the polls we conducted and in the Q&A session. We weren't able to address all of the questions during the broadcast, so we attempted to answer them here: Q: Which area within your firm focuses on Web 2.0? Meaning, do you find new departments developing just to manage the web 2.0 (Twitter, Facebook, etc.) user experience or are you structuring current departments? A: There are three distinct efforts within Oracle. The first is around delivery of these Web 2.0 services for enterprise deployments. This is the focus of the WebCenter team. The second effort is injecting these Web 2.0 services into use cases that drive the different enterprise applications. This effort is focused on how to manage these external services and bring them into a cohesive flow for marketing programs, customer care, and purchasing. The third effort is how we consume these services internally to enhance Oracle's business delivery. It leverages the technologies and use cases of the first two but also pushes the envelope with regards to future directions of these other two areas. Q: In a business, Web 2.0 is mostly like action logs. How can we leverage the official process practice versus the logs of a recent action? Example: a system configuration modified last night on a call out versus the official practice that everybody would use in the morning.A: The key thing to remember is that most Web 2.0 actions / activity streams today are based on collaboration and communication type actions. At least with public social sites like Facebook and Twitter. What we're delivering as part of the WebCenter Suite are not just these types of activities but also enterprise application activities. These enterprise application activities come from different application modules: purchasing, HR, order entry, sales opportunity, etc. The actions within these systems are normally tied to a business object or process: purchase order/customer, employee or department, customer and supplier, customer and product, respectively. Therefore, the activities or "logs" as you name them are able to be "typed" so that as a viewer, you can filter or decide to see only certain types of information. In your example, you could have a view that only showed you recent "configuration" changes and this could be right next to a view that showed off the items to be watched every morning. Q: It's great to hear about customers using the software but is there any plan for future webinars to show what the products/installs look like? That would be very helpful.A: We don't have a webinar planned to show off the install process. However, we have a viewlet that's posted on Oracle Technology Network. You can see it here:http://www.oracle.com/technetwork/testcontent/wcs-install-098014.htmlAnd we've got excellent documentation that walks you through the steps here:http://download.oracle.com/docs/cd/E14571_01/install.1111/e12001/install.htmAnd there's a whole set of demos and examples of what WebCenter can do at this URL:http://www.oracle.com/technetwork/middleware/webcenter/release11-demos-097468.html Q: How do you anticipate managing metadata across the enterprise to make content findable?A: We need to first make sure we are all talking about the same thing when we use a word like "metadata". Here's why... For a developer, metadata means information that describes key elements of the portal or application and what the portal or application can do. For content systems, metadata means key terms that provide a taxonomy or folksonomy about the information that is being indexed, ordered, and managed. For business intelligence systems, metadata means key terms that provide labels to groups of data that most non-mathematicians need to understand. And for SOA, metadata means labels for parts of the processes that business owners should understand that connect development terminology. There are also additional requirements for metadata to be available to the team building these new solutions as well as requirements to make this metadata available to the running system. These requirements are often separated by "design time" and "run time" respectively. So clearly, a general goal of managing metadata across the enterprise is very challenging. We've invested a huge amount of resources around Oracle Metadata Services (MDS) to be able to provide a more generic system for all of these elements. No other vendor has anything like this technology foundation in their products. This provides a huge benefit to our customers as they will now be able to find content, processes, people, and information from a common set of search interfaces with consistent enterprise wide results. Q: Can you give your definition of terms as to document and content, please?A: Content applies to a broad category of information from Word documents, presentations and reports through attachments to invoices and/or purchase orders. Content is essentially any type of digital asset including images, video, and voice. A document is just one type of content. Q: Do you have special integration tools to realize an interaction between UCM and WebCenter Spaces/Services?A: Yes, we've dedicated a whole team of engineers to exploit the key features of Oracle UCM within WebCenter. While ensuring that WebCenter can connect to other non-Oracle systems, we've made sure that with the combined set of Oracle technology, no other solution can match the combined power and integration. This is part of the Oracle Fusion Middleware strategy which is to provide best in class capabilities for Content and Portals. When combined together, the synergy between the two products enables users to quickly add capabilities when they are needed. For example, simple document sharing is part of the combined product offering, but if legal discovery or archiving is required, Oracle UCM product includes these capabilities that can be quickly added. There's no need to move content around or add another system to support this, it's just a feature that gets turned on within Oracle UCM. Q: All customers have some interaction with their applications and have many older versions, how do you see some of these new Enterprise 2.0 capabilities adding value to existing enterprise application deployments?A: Just as Service Oriented Architectures allowed for connecting the processes of different applications systems to work together, there's a need for a similar approach with regards to these enterprise 2.0 capabilities. Oracle WebCenter is built on a core architecture that allows for SOA of these Enterprise 2.0 services so that one set of scalable services can be used and integrated directly into any type of application. In this way, users can get immediate value out of the Enterprise 2.0 capabilities without having to wait for the next major release or upgrade. These centrally managed WebCenter services expose a set of standard interfaces that make it extremely easy to add them into existing applications no matter what technology the application has been implemented. Q: We've heard about Oracle Next Generation applications called "Fusion Applications", can you tell me how all this works together?A: Oracle WebCenter powers the core collaboration and social computing services found within Fusion Applications. It is the core user experience technology for how all the application screens have been implemented. And the core concept of task flows allows for all the Fusion Applications modules to be adaptable and composable by business users and IT without needing to be a professional developer. Oracle WebCenter is at the heart of the new Fusion Applications. In addition, the same patterns and technologies are now being added to the existing applications including JD Edwards, Siebel, Peoplesoft, and eBusiness Suite. The core technology enables all these customers to have a much smoother upgrade path to Fusion Applications. They get immediate benefits of injecting new user interactions into their existing applications without having to completely move to Fusion Applications. And then when the time comes, their users will already be well versed in how the new capabilities work. Q: Does any of this work with non Oracle software? Other databases? Other application servers? etc.A: We have made sure that Oracle WebCenter delivers the broadest set of development choices so that no matter what technology you developers are using, WebCenter capabilities can be quickly and easily added to the site or application. In addition, we have certified Oracle WebCenter to run against non-Oracle databases like DB2 and SQLServer. We have stated plans for certification against MySQL as well. Later in CY 2011, Oracle will provide certification on non-Oracle application servers such as WebSphere and JBoss. Q: How do we balance User and IT requirements in regards to Enterprise 2.0 technologies?A: Wrong decisions are often made because employee knowledge is not tapped efficiently and opportunities to innovate are often missed because the right people do not work together. Collaboration amongst workers in the right business context is critical for success. While standalone Enterprise 2.0 technologies can improve collaboration for collaboration's sake, using social collaboration tools in the context of business applications and processes will improve business responsiveness and lead companies to a more competitive position. As these systems become more mission critical it is essential that they maintain the highest level of performance and availability while scaling to support larger communities. Q: What are the ways in which Enterprise 2.0 can improve business responsiveness?A: With a wide range of Enterprise 2.0 tools in the marketplace, CIOs need to deploy solutions that will meet the requirements from users as well as address the requirements from IT. Workers want a next-generation user experience that is personalized and aggregates their daily tools and tasks, while IT needs to ensure the solution is secure, scalable, flexible, reliable and easily integrated with existing systems. An open and integrated approach to deploying portals, content management, and collaboration can enhance your business by addressing both the needs of knowledge workers for better information and the IT mandate to conserve resources by simplifying, consolidating and centralizing infrastructure and administration.

Read the article
SQL SERVER – Weekly Series – Memory Lane – #050

- by Pinal Dave

Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Executing Remote Stored Procedure – Calling Stored Procedure on Linked Server In this example we see two different methods of how to call Stored Procedures remotely. Connection Property of SQL Server Management Studio SSMS A very simple example of the how to build connection properties for SQL Server with the help of SSMS. Sample Example of RANKING Functions – ROW_NUMBER, RANK, DENSE_RANK, NTILE SQL Server has a total of 4 ranking functions. Ranking functions return a ranking value for each row in a partition. All the ranking functions are non-deterministic. T-SQL Script to Add Clustered Primary Key Jr. DBA asked me three times in a day, how to create Clustered Primary Key. I gave him following sample example. That was the last time he asked “How to create Clustered Primary Key to table?” 2008 2008 – TRIM() Function – User Defined Function SQL Server does not have functions which can trim leading or trailing spaces of any string at the same time. SQL does have LTRIM() and RTRIM() which can trim leading and trailing spaces respectively. SQL Server 2008 also does not have TRIM() function. User can easily use LTRIM() and RTRIM() together and simulate TRIM() functionality. http://www.youtube.com/watch?v=1-hhApy6MHM 2009 Earlier I have written two different articles on the subject Remove Bookmark Lookup. This article is as part 3 of original article. Please read the first two articles here before continuing reading this article. Query Optimization – Remove Bookmark Lookup – Remove RID Lookup – Remove Key Lookup Query Optimization – Remove Bookmark Lookup – Remove RID Lookup – Remove Key Lookup – Part 2 Query Optimization – Remove Bookmark Lookup – Remove RID Lookup – Remove Key Lookup – Part 3 Interesting Observation – Query Hint – FORCE ORDER SQL Server never stops to amaze me. As regular readers of this blog already know that besides conducting corporate training, I work on large-scale projects on query optimizations and server tuning projects. In one of the recent projects, I have noticed that a Junior Database Developer used the query hint Force Order; when I asked for details, I found out that the basic concept was not properly understood by him. Queries Waiting for Memory Allocation to Execute In one of the recent projects, I was asked to create a report of queries that are waiting for memory allocation. The reason was that we were doubtful regarding whether the memory was sufficient for the application. The following query can be useful in similar cases. Queries that do not have to wait on a memory grant will not appear in the result set of following query. 2010 Quickest Way to Identify Blocking Query and Resolution – Dirty Solution As the title suggests, this is quite a dirty solution; it’s not as elegant as you expect. However, it works totally fine. Simple Explanation of Data Type Precedence While I was working on creating a question for SQL SERVER – SQL Quiz – The View, The Table and The Clustered Index Confusion, I had actually created yet another question along with this question. However, I felt that the one which is posted on the SQL Quiz is much better than this one because what makes that more challenging question is that it has a multiple answer. Encrypted Stored Procedure and Activity Monitor I recently had received questionable if any stored procedure is encrypted can we see its definition in Activity Monitor.Answer is - No. Let us do a quick test. Let us create following Stored Procedure and then launch the Activity Monitor and check the text. Indexed View always Use Index on Table A single table can have maximum 249 non clustered indexes and 1 clustered index. In SQL Server 2008, a single table can have maximum 999 non clustered indexes and 1 clustered index. It is widely believed that a table can have only 1 clustered index, and this belief is true. I have some questions for all of you. Let us assume that I am creating view from the table itself and then create a clustered index on it. In my view, I am selecting the complete table itself. 2011 Detecting Database Case Sensitive Property using fn_helpcollations() I received a question on how to determine the case sensitivity of the database. The quick answer to this is to identify the collation of the database and check the properties of the collation. I have previously written how one can identify database collation. Once you have figured out the collation of the database, you can put that in the WHERE condition of the following T-SQL and then check the case sensitivity from the description. Server Side Paging in SQL Server CE (Compact Edition) SQL Server Denali is coming up with new T-SQL of Paging. I have written about the same earlier.SQL SERVER – Server Side Paging in SQL Server Denali – A Better Alternative, SQL SERVER – Server Side Paging in SQL Server Denali Performance Comparison, SQL SERVER – Server Side Paging in SQL Server Denali – Part2 What is very interesting is that SQL Server CE 4.0 have the same feature introduced. Here is the quick example of the same. To run the script in the example, you will have to do installWebmatrix 4.0 and download sample database. Once done you can run following script. Why I am Going to Attend PASS Summit Unite 2011 The four-day event will be marked by a lot of learning, sharing, and networking, which will help me increase both my knowledge and contacts. Every year, PASS Summit provides me a golden opportunity to build my network as well as to identify and meet potential customers or employees. 2012 Manage Help Settings – CTRL + ALT + F1 This is very interesting read as my daughter once accidently came across a screen in SQL Server Management Studio. It took me 2-3 minutes to figure out how she has created the same screen. Recover the Accidentally Renamed Table “I accidentally renamed table in my SSMS. I was scrolling very fast and I made mistakes. It was either because I double clicked or clicked on F2 (shortcut key for renaming). However, I have made the mistake and now I have no idea how to fix this. If you have renamed the table, I think you pretty much is out of luck. Here are few things which you can do which can give you an idea about what your table name can be if you are lucky. Identify Numbers of Non Clustered Index on Tables for Entire Database Here is the script which will give you numbers of non clustered indexes on any table in entire database. Identify Most Resource Intensive Queries – SQL in Sixty Seconds #029 – Video Here is the complete complete script which I have used in the SQL in Sixty Seconds Video. Thanks Harsh for important Tip in the comment. http://www.youtube.com/watch?v=3kDHC_Tjrns Advanced Data Quality Services with Melissa Data – Azure Data Market For the purposes of the review, I used a database I had in an Excel spreadsheet with name and address information. Upon a cursory inspection, there are miscellaneous problems with these records; some addresses are missing ZIP codes, others missing a city, and some records are slightly misspelled or have unparsed suites. With DQS, I can easily add a knowledge base to help standardize my values, such as for state abbreviations. But how do I know that my address is correct? Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article

Search Results

Search found 4703 results on 189 pages for 'emit knowledge'.

Page 77/189 | < Previous Page | 73 74 75 76 77 78 79 80 81 82 83 84 | Next Page >

- by Pirate for Profit

- by Ockonal

- by Duasto

- by TC

- by stefando

- by user7398

- by user261882

- by Ross Goddard

- by Lee Theobald

- by Arne

- by YellPika

- by dnolen

- by oleks

- by DEEPS

- by Juan Camilo Ruiz

- by Reed

- by rituchhibber

- by Ajay Khanna

- by David Allan

- by Pinal Dave

- by Javier Treviño

- by rchrd

- by BuckWoody

- by [email protected]

- by Pinal Dave

< Previous Page | 73 74 75 76 77 78 79 80 81 82 83 84 | Next Page >