Search Results

Search found 1343 results on 54 pages for 'challenge'.

Page 52/54 | < Previous Page | 48 49 50 51 52 53 54  | Next Page >

  • OWB 11gR2 - Early Arriving Facts

    - by Dawei Sun
    A common challenge when building ETL components for a data warehouse is how to handle early arriving facts. OWB 11gR2 introduced a new feature to address this for dimensional objects entitled Orphan Management. An orphan record is one that does not have a corresponding existing parent record. Orphan management automates the process of handling source rows that do not meet the requirements necessary to form a valid dimension or cube record. In this article, a simple example will be provided to show you how to use Orphan Management in OWB. We first import a sample MDL file that contains all the objects we need. Then we take some time to examine all the objects. After that, we prepare the source data, deploy the target table and dimension/cube loading map. Finally, we run the loading maps, and check the data in target dimension/cube tables. OK, let’s start… 1. Import MDL file and examine sample project First, download zip file from here, which includes a MDL file and three source data files. Then we open OWB design center, import orphan_management.mdl by using the menu File->Import->Warehouse Builder Metadata. Now we have several objects in BI_DEMO project as below: Mapping LOAD_CHANNELS_OM: The mapping for dimension loading. Mapping LOAD_SALES_OM: The mapping for cube loading. Dimension CHANNELS_OM: The dimension that contains channels data. Cube SALES_OM: The cube that contains sales data. Table CHANNELS_OM: The star implementation table of dimension CHANNELS_OM. Table SALES_OM: The star implementation table of cube SALES_OM. Table SRC_CHANNELS: The source table of channels data, that will be loaded into dimension CHANNELS_OM. Table SRC_ORDERS and SRC_ORDER_ITEMS: The source tables of sales data that will be loaded into cube SALES_OM. Sequence CLASS_OM_DIM_SEQ: The sequence used for loading dimension CHANNELS_OM. Dimension CHANNELS_OM This dimension has a hierarchy with three levels: TOTAL, CLASS and CHANNEL. Each level has three attributes: ID (surrogate key), NAME and SOURCE_ID (business key). It has a standard star implementation. The orphan management policy and the default parent setting are shown in the following screenshots: The orphan management policy options that you can set for loading are: Reject Orphan: The record is not inserted. Default Parent: You can specify a default parent record. This default record is used as the parent record for any record that does not have an existing parent record. If the default parent record does not exist, Warehouse Builder creates the default parent record. You specify the attribute values of the default parent record at the time of defining the dimensional object. If any ancestor of the default parent does not exist, Warehouse Builder also creates this record. No Maintenance: This is the default behavior. Warehouse Builder does not actively detect, reject, or fix orphan records. While removing data from a dimension, you can select one of the following orphan management policies: Reject Removal: Warehouse Builder does not allow you to delete the record if it has existing child records. No Maintenance: This is the default behavior. Warehouse Builder does not actively detect, reject, or fix orphan records. (More details are at http://download.oracle.com/docs/cd/E11882_01/owb.112/e10935/dim_objects.htm#insertedID1) Cube SALES_OM This cube is references to dimension CHANNELS_OM. It has three measures: AMOUNT, QUANTITY and COST. The orphan management policy setting are shown as following screenshot: The orphan management policy options that you can set for loading are: No Maintenance: Warehouse Builder does not actively detect, reject, or fix orphan rows. Default Dimension Record: Warehouse Builder assigns a default dimension record for any row that has an invalid or null dimension key value. Use the Settings button to define the default parent row. Reject Orphan: Warehouse Builder does not insert the row if it does not have an existing dimension record. (More details are at http://download.oracle.com/docs/cd/E11882_01/owb.112/e10935/dim_objects.htm#BABEACDG) Mapping LOAD_CHANNELS_OM This mapping loads source data from table SRC_CHANNELS to dimension CHANNELS_OM. The operator CHANNELS_IN is bound to table SRC_CHANNELS; CHANNELS_OUT is bound to dimension CHANNELS_OM. The TOTALS operator is used for generating a constant value for the top level in the dimension. The CLASS_FILTER operator is used to filter out the “invalid” class name, so then we can see what will happen when those channel records with an “invalid” parent are loading into dimension. Some properties of the dimension operator in this mapping are important to orphan management. See the screenshot below: Create Default Level Records: If YES, then default level records will be created. This property must be set to YES for dimensions and cubes if one of their orphan management policies is “Default Parent” or “Default Dimension Record”. This property is set to NO by default, so the user may need to set this to YES manually. LOAD policy for INVALID keys/ LOAD policy for NULL keys: These two properties have the same meaning as in the dimension editor. The values are set to the same as the dimension value when user drops the dimension into the mapping. The user does not need to modify these properties. Record Error Rows: If YES, error rows will be inserted into error table when loading the dimension. REMOVE Orphan Policy: This property is used when removing data from a dimension. Since the dimension loading type is set to LOAD in this example, this property is disabled. Mapping LOAD_SALES_OM This mapping loads source data from table SRC_ORDERS and SRC_ORDER_ITEMS to cube SALES_OM. This mapping seems a little bit complicated, but operators in the red rectangle are used to filter out and generate the records with “invalid” or “null” dimension keys. Some properties of the cube operator in a mapping are important to orphan management. See the screenshot below: Enable Source Aggregation: Should be checked in this example. If the default dimension record orphan policy is set for the cube operator, then it is recommended that source aggregation also be enabled. Otherwise, the orphan management processing may produce multiple fact rows with the same default dimension references, which will cause an “unstable rowset” execution error in the database, since the dimension refs are used as update match attributes for updating the fact table. LOAD policy for INVALID keys/ LOAD policy for NULL keys: These two properties have the same meaning as in the cube editor. The values are set to the same as in the cube editor when the user drops the cube into the mapping. The user does not need to modify these properties. Record Error Rows: If YES, error rows will be inserted into error table when loading the cube. 2. Deploy objects and mappings We now can deploy the objects. First, make sure location SALES_WH_LOCAL has been correctly configured. Then open Control Center Manager by using the menu Tools->Control Center Manager. Expand BI_DEMO->SALES_WH_LOCAL, click SALES_WH node on the project tree. We can see the following objects: Deploy all the objects in the following order: Sequence CLASS_OM_DIM_SEQ Table CHANNELS_OM, SALES_OM, SRC_CHANNELS, SRC_ORDERS, SRC_ORDER_ITEMS Dimension CHANNELS_OM Cube SALES_OM Mapping LOAD_CHANNELS_OM, LOAD_SALES_OM Note that we deployed source tables as well. Normally, we import source table from database instead of deploying them to target schema. However, in this example, we designed the source tables in OWB and deployed them to database for the purpose of this demonstration. 3. Prepare and examine source data Before running the mappings, we need to populate and examine the source data first. Run SRC_CHANNELS.sql, SRC_ORDERS.sql and SRC_ORDER_ITEMS.sql as target user. Then we check the data in these three tables. Table SRC_CHANNELS SQL> select rownum, id, class, name from src_channels; Records 1~5 are correct; they should be loaded into dimension without error. Records 6,7 and 8 have null parents; they should be loaded into dimension with a default parent value, and should be inserted into error table at the same time. Records 9, 10 and 11 have “invalid” parents; they should be rejected by dimension, and inserted into error table. Table SRC_ORDERS and SRC_ORDER_ITEMS SQL> select rownum, a.id, a.channel, b.amount, b.quantity, b.cost from src_orders a, src_order_items b where a.id = b.order_id; Record 178 has null dimension reference; it should be loaded into cube with a default dimension reference, and should be inserted into error table at the same time. Record 179 has “invalid” dimension reference; it should be rejected by cube, and inserted into error table. Other records should be aggregated and loaded into cube correctly. 4. Run the mappings and examine the target data In the Control Center Manager, expand BI_DEMO-> SALES_WH_LOCAL-> SALES_WH-> Mappings, right click on LOAD_CHANNELS_OM node, click Start. Use the same way to run mapping LOAD_SALES_OM. When they successfully finished, we can check the data in target tables. Table CHANNELS_OM SQL> select rownum, total_id, total_name, total_source_id, class_id,class_name, class_source_id, channel_id, channel_name,channel_source_id from channels_om order by abs(dimension_key); Records 1,2 and 3 are the default dimension records for the three levels. Records 8, 10 and 15 are the loaded records that originally have null parents. We see their parents name (class_name) is set to DEF_CLASS_NAME. Those records whose CHANNEL_NAME are Special_4, Special_5 and Special_6 are not loaded to this table because of the invalid parent. Error Table CHANNELS_OM_ERR SQL> select rownum, class_source_id, channel_id, channel_name,channel_source_id, err$$$_error_reason from channels_om_err order by channel_name; We can see all the record with null parent or invalid parent are inserted into this error table. Error reason is “Default parent used for record” for the first three records, and “No parent found for record” for the last three. Table SALES_OM SQL> select a.*, b.channel_name from sales_om a, channels_om b where a.channels=b.channel_id; We can see the order record with null channel_name has been loaded into target table with a default channel_name. The one with “invalid” channel_name are not loaded. Error Table SALES_OM_ERR SQL> select a.amount, a.cost, a.quantity, a.channels, b.channel_name, a.err$$$_error_reason from sales_om_err a, channels_om b where a.channels=b.channel_id(+); We can see the order records with null or invalid channel_name are inserted into error table. If the dimension reference column is null, the error reason is “Default dimension record used for fact”. If it is invalid, the error reason is “Dimension record not found for fact”. Summary In summary, this article illustrated the Orphan Management feature in OWB 11gR2. Automated orphan management policies improve ETL developer and administrator productivity by addressing an important cause of cube and dimension load failures, without requiring developers to explicitly build logic to handle these orphan rows.

    Read the article

  • SQL SERVER – Repair a SQL Server Database Using a Transaction Log Explorer

    - by Pinal Dave
    In this blog, I’ll show how to use ApexSQL Log, a SQL Server transaction log viewer. You can download it for free, install, and play along. But first, let’s describe some disaster recovery scenarios where it’s useful. About SQL Server disaster recovery Along with database development and administration, you must work on a good recovery plan. Disasters do happen and no one’s immune. What you can do is take all actions needed to be ready for a disaster and go through it with minimal data loss and downtime. Besides creating a recovery plan, it’s necessary to have a list of steps that will be executed when a disaster occurs and to test them before a disaster. This way, you’ll know that the plan is good and viable. Testing can also be used as training for all team members, so they can all understand and execute it when the time comes. It will show how much time is needed to have your servers fully functional again and how much data you can lose in a real-life situation. If these don’t meet recovery-time and recovery-point objectives, the plan needs to be improved. Keep in mind that all major changes in environment configuration, business strategy, and recovery objectives require a new recovery plan testing, as these changes most probably induce a recovery plan changing and tweaking. What is a good SQL Server disaster recovery plan? A good SQL Server disaster recovery strategy starts with planning SQL Server database backups. An efficient strategy is to create a full database backup periodically. Between two successive full database backups, you can create differential database backups. It is essential is to create transaction log backups regularly between full database backups. Keep in mind that transaction log backups can be created only on databases in the full recovery model. In other words, a simple, but efficient backup strategy would be a full database backup every night, a transaction log backup every hour, or every 15 minutes. The frequency depends on how much data you can afford to lose and how busy the database is. Another option, instead of creating a full database backup every night, is to create a full database backup once a week (e.g. on Friday at midnight) and differential database backup every night until next Friday when you will create a full database backup again. Once you create your SQL Server database backup strategy, schedule the backups. You can do that easily using SQL Server maintenance plans. Why are transaction logs important? Transaction log backups contain transactions executed on a SQL Server database. They provide enough information to undo and redo the transactions and roll back or forward the database to a point in time. In SQL Server disaster recovery situations, transaction logs enable to repair a SQL Server database and bring it to the state before the disaster. Be aware that even with regular backups, there will be some data missing. These are the transactions made between the last transaction log backup and the time of the disaster. In some situations, to repair your SQL Server database it’s not necessary to re-create the database from its last backup. The database might still be online and all you need to do is roll back several transactions, such as wrong update, insert, or delete. The restore to a point in time feature is available in SQL Server, but for large databases, it is very time-consuming, as SQL Server first restores a full database backup, and then restores transaction log backups, one after another, up to the recovery point. During that time, the database is unavailable. This is where a SQL Server transaction log viewer can help. For optimal recovery, besides having a database in the full recovery model, it’s important that you haven’t manually truncated the online transaction log. This ensures that all transactions made after the last transaction log backup are still in the online transaction log. All you have to do is read and replay them. How to read a SQL Server transaction log? SQL Server doesn’t provide an option to read transaction logs. There are several SQL Server commands and functions that read the content of a transaction log file (fn_dblog, fn_dump_dblog, and DBCC PAGE), but they are undocumented. They require T-SQL knowledge, return a large number of not easy to read and understand columns, sometimes in binary or hexadecimal format. Another challenge is reading UPDATE statements, as it’s necessary to match it to a value in the MDF file. When you finally read the transactions executed, you have to create a script for it. How to easily repair a SQL database? The easiest solution is to use a transaction log reader that will not only read the transactions in the transaction log files, but also automatically create scripts for the read transactions. In the following example, I will show how to use ApexSQL Log to repair a SQL database after a crash. If a database has crashed and both MDF and LDF files are lost, you have to rely on the full database backup and all subsequent transaction log backups. In another scenario, the MDF file is lost, but the LDF file is available. First, restore the last full database backup on SQL Server using SQL Server Management Studio. I’ll name it Restored_AW2014. Then, start ApexSQL Log It will automatically detect all local servers. If not, click the icon right to the Server drop-down list, or just type in the SQL Server instance name. Select the Windows or SQL Server authentication type and select the Restored_AW2014 database from the database drop-down list. When all options are set, click Next. ApexSQL Log will show the online transaction log file. Now, click Add and add all transaction log backups created after the full database backup I used to restore the database. In case you don’t have transaction log backups, but the LDF file hasn’t been lost during the SQL Server disaster, add it using Add.   To repair a SQL database to a point in time, ApexSQL Log needs to read and replay all the transactions in the transaction log backups (or the LDF file saved after the disaster). That’s why I selected the Whole transaction log option in the Filter setup. ApexSQL Log offers a range of various filters, which are useful when you need to read just specific transactions. You can filter transactions by the time of the transactions, operation type (e.g. to read only data inserts), table name, SQL Server login that made the transaction, etc. In this scenario, to repair a SQL database, I’ll check all filters and make sure that all transactions are included. In the Operations tab, select all schema operations (DDL). If you omit these, only the data changes will be read so if there were any schema changes, such as a new function created, or an existing table modified, they will be ignored and database will not be properly repaired. The data repair for modified tables will fail. In the Tables tab, I’ll make sure all tables are selected. I will uncheck the Show operations on dropped tables option, to reduce the number of transactions. Click Next. ApexSQL Log offers three options. Select Open results in grid, to get a user-friendly presentation of the transactions. As you can see, details are shown for every transaction, including the old and new values for updated columns, which are clearly highlighted. Now, select them all and then create a redo script by clicking the Create redo script icon in the menu.   For a large number of transactions and in a critical situation, when acting fast is a must, I recommend using the Export results to file option. It will save some time, as the transactions will be directly scripted into a redo file, without showing them in the grid first. Select Generate reconstruction (REDO) script , change the output path if you want, and click Finish. After the redo T-SQL script is created, ApexSQL Log shows the redo script summary: The third option will create a command line statement for a batch file that you can use to schedule execution, which is not really applicable when you repair a SQL database, but quite useful in daily auditing scenarios. To repair your SQL database, all you have to do is execute the generated redo script using an integrated developer environment tool such as SQL Server Management Studio or any other, against the restored database. You can find more information about how to read SQL Server transaction logs and repair a SQL database on ApexSQL Solution center. There are solutions for various situations when data needs to be recovered, restored, or transactions rolled back. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • When Your Boss Doesn't Want you to Succeed

    - by Phil Factor
    You're working hard to get an application finished. You are programming long into the evenings sometimes, and eating sandwiches at your desk instead of taking a lunch break. Then one day you glance up at the IT manager, serene in his mysterious round of meetings, and think 'Does he actually care whether this project succeeds or not?'. The question may seem absurd. Of course the project must succeed. The truth, as always, is often far more complex. Your manager may even be doing his best to make sure you don't succeed. Why? There have always been rich pickings for the unscrupulous in IT.  In extreme cases, where administrators struggle with scarcely-comprehended technical issues, huge sums of money can be lost and gained without any perceptible results. In a very few cases can fraud be proven: most of the time, the intricacies of the 'game' are such that one can do little more than harbor suspicion.  Where does over-enthusiastic salesmanship end and fraud begin? The Business of Information Technology provides rich opportunities for White-collar crime. The poor developer has his, or her, hands full with the task of wrestling with the sheer complexity of building an application. He, or she, has no time for following the complexities of the chicanery of the management that is directing affairs.  Most likely, the developers wouldn't even suspect that their company management had ulterior motives. I'll illustrate what I mean with an entirely fictional, hypothetical, example. The Opportunist and the Aged Charities often do good, unexciting work that is funded by the income from a bequest that dates back maybe hundreds of years.  In our example, it isn't exciting work, for it involves the welfare of elderly people who have fallen on hard times.  Volunteers visit, giving a smile and a chat, and check that they are all right, but are able to spend a little money on their discretion to ameliorate any pressing needs for these old folk.  The money is made to work very hard and the charity averts a great deal of suffering and eases the burden on the state. Daisy hears the garden gate creak as Mrs Rainer comes up the path. She looks forward to her twice-weekly visit from the nice lady from the trust. She always asked ‘is everything all right, Love’. Cheeky but nice. She likes her cheery manner. She seems interested in hearing her memories, and talking about her far-away family. She helps her with those chores in the house that she couldn’t manage and once even paid to fill the back-shed with coke, the other year. Nice, Mrs. Rainer is, she thought as she goes to open the door. The trustees are getting on in years themselves, and worry about the long-term future of the charity: is it relevant to modern society? Is it likely to attract a new generation of workers to take it on. They are instantly attracted by the arrival to the board of a smartly dressed University lecturer with the ear of the present Government. Alain 'Stalin' Jones is earnest, persuasive and energetic. The trustees welcome him to the board and quickly forgive his humorless political-correctness. He talks of 'diversity', 'relevance', 'social change', 'equality' and 'communities', but his eye is on that huge bequest. Alain first came to notice as a Trotskyite union official, who insinuated himself into one of the duller Trades Unions and turned it, through his passionate leadership, into a radical, headline-grabbing organization.  Middle age, and the rise of European federal socialism, had brought him quiet prosperity and charcoal suits, an ear in the current government, and a wide influence as a member of various Quangos (government bodies staffed by well-paid unelected courtiers).  He was employed as a 'consultant' by several organizations that relied on government contracts. After gaining the confidence of the trustees, and showing a surprising knowledge of mundane processes and the regulatory framework of charities, Alain launches his plan.  The trust will expand their work by means of a bold IT initiative that will coordinate the interventions of several 'caring agencies', and provide  emergency cover, a special Website so anxious relatives can see how their elderly charges are doing, and a vastly more efficient way of coordinating the work of the volunteer carers. It will also provide a special-purpose site that gives 'social networking' facilities, rather like Facebook, to the few elderly folk on the lists with access to the internet. The trustees perk up. Their own experience of the internet is restricted to the occasional scanning of railway timetables, but they can see that it is 'relevant'. In his next report to the other trustees, Alain proudly announces that all this glamorous and exciting technology can be paid for by a grant from the government. He admits darkly that he has influence. True to his word, the government promises a grant of a size that is an order of magnitude greater than any budget that the trustees had ever handled. There was the understandable proviso that the company that would actually do the IT work would have to be one of the government's preferred suppliers and the work would need to be tendered under EU competition rules. The only company that tenders, a multinational IT company with a long track record of government work, quotes ten million pounds for the work. A trustee questions the figure as it seems enormous for the reasonably trivial internet facilities being built, but the IT Salesmen dazzle them with presentations and three-letter acronyms until they subside into quiescent acceptance. After all, they can’t stay locked in the Twentieth century practices can they? The work is put in hand with a large project team, in a splendid glass building near west London. The trustees see rooms of programmers working diligently at screens, and who talk with enthusiasm of the project. Paul, the project manager, looked through his resource schedule with growing unease. His initial excitement at being given his first major project hadn’t lasted. He’d been allocated a lackluster team of developers whose skills didn’t seem right, and he was allowed only a couple of contractors to make good the deficit. Strangely, the presentation he’d given to his management, where he’d saved time and resources with a OTS solution to a great deal of the development work, and a sound conservative architecture, hadn’t gone down nearly as big as he’d hoped. He almost got the feeling they wanted a more radical and ambitious solution. The project starts slipping its dates. The costs build rapidly. There are certain uncomfortable extra charges that appear, such as the £600-a-day charge by the 'Business Manager' appointed to act as a point of liaison between the charity and the IT Company.  When he appeared, his face permanently split by a 'Mr Sincerity' smile, they'd thought he was provided at the cost of the IT Company. Derek, the DBA, didn’t have to go to the server room quite some much as he did: but It got him away from the poisonous despair of the development group. Wave after wave of events had conspired to delay the project.  Why the management had imposed hideous extra bureaucracy to cover ISO 9000 and 9001:2008 accreditation just as the project was struggling to get back on-schedule was  beyond belief.  Then  the Business manager was coming back with endless changes in scope, sorrowing saying that the Trustees were very insistent, though hopelessly out in touch with the reality of technical challenges. Suddenly, the costs mount to the point of consuming the government grant in its entirety. The project remains tantalizingly just out of reach. Alain Jones gives an emotional rallying speech at the trustees review meeting, urging them not to lose their nerve. Sadly, the trustees dip into the accumulated capital of the trust, the seed-corn of all their revenues, in order to save the IT project. A few months later it is all over. The IT project is never delivered, even though it had seemed so incredibly close.  With the trust's capital all gone, the activities it funded have to be terminated and the trust becomes just a shell. There aren't even the funds to mount a legal challenge against the IT company, even had the trust's solicitor advised such a foolish thing. Alain leaves as suddenly as he had arrived, only to pop up a few months later, bronzed and rested, at another charity. The IT workers who were permanent employees are dispersed to other projects, and the contractors leave to other contracts. Within months the entire project is but a vague memory. One or two developers remain  puzzled that their managers had been so obstructive when they should have welcomed progress toward completion of the project, but they put it down to incompetence and testosterone. Few suspected that they were actively preventing the project from getting finished. The relationships between the IT consultancy, and the government of the day are intricate, and made more complex by the Private Finance initiatives and political patronage.  The losers in this case were the taxpayers, and the beneficiaries of the trust, and, perhaps the soul of the original benefactor of the trust, whose bid to give his name some immortality had been scuppered by smooth-talking white-collar political apparatniks.  Even now, nobody is certain whether a crime was ever committed. The perfect heist, I guess. Where’s the victim? "I hear that Daisy’s cottage is up for sale. She’s had to go into a care home.  She didn’t want to at all, but then there is nobody to keep an eye on her since she had that minor stroke a while back.  A charity used to help out. The ‘social’ don’t have the funding, evidently for community care. Yes, her old cat was put down. There was a good clearout, and now the house is all scrubbed and cleared ready for sale. The skip was full of old photos and letters, memories. No room in her new ‘home’."

    Read the article

  • That Escalated Quickly

    - by Jesse Taber
    Originally posted on: http://geekswithblogs.net/GruffCode/archive/2014/05/17/that-escalated-quickly.aspxI have been working remotely out of my home for over 4 years now. All of my coworkers during that time have also worked remotely. Lots of folks have written about the challenges inherent in facilitating communication on remote teams and strategies for overcoming them. A popular theme around this topic is the notion of “escalating communication”. In this context “escalating” means taking a conversation from one mode of communication to a different, higher fidelity mode of communication. Here are the five modes of communication I use at work in order of increasing fidelity: Email – This is the “lowest fidelity” mode of communication that I use. I usually only check it a few times a day (and I’m trying to check it even less frequently than that) and I only keep items in my inbox if they represent an item I need to take action on that I haven’t tracked anywhere else. Forums / Message boards – Being a developer, I’ve gotten into the habit of having other people look over my code before it becomes part of the product I’m working on. These code reviews often happen in “real time” via screen sharing, but I also always have someone else give all of the changes another look using pull requests. A pull request takes my code and lets someone else see the changes I’ve made side-by-side with the existing code so they can see if I did anything dumb. Pull requests can facilitate a conversation about the code changes in an online-forum like style. Some teams I’ve worked on also liked using tools like Trello or Google Groups to have on-going conversations about a topic or task that was being worked on. Chat & Instant Messaging  - Chat and instant messaging are the real workhorses for communication on the remote teams I’ve been a part of. I know some teams that are co-located that also use it pretty extensively for quick messages that don’t warrant walking across the office to talk with someone but reqire more immediacy than an e-mail. For the purposes of this post I think it’s important to note that the terms “chat” and “instant messaging” might insinuate that the conversation is happening in real time, but that’s not always true. Modern chat and IM applications maintain a searchable history so people can easily see what might have been discussed while they were away from their computers. Voice, Video and Screen sharing – Everyone’s got a camera and microphone on their computers now, and there are an abundance of services that will let you use them to talk to other people who have cameras and microphones on their computers. I’m including screen sharing here as well because, in my experience, these discussions typically involve one or more people showing the other participants something that’s happening on their screen. Obviously, this mode of communication is much higher-fidelity than any of the ones listed above. Scheduled meetings are typically conducted using this mode of communication. In Person – No matter how great communication tools become, there’s no substitute for meeting with someone face-to-face. However, opportunities for this kind of communcation are few and far between when you work on a remote team. When a conversation gets escalated that usually means it moves up one or more positions on this list. A lot of people advocate jumping to #4 sooner than later. Like them, I used to believe that, if it was possible, organizing a call with voice and video was automatically better than any kind of text-based communication could be. Lately, however, I’m becoming less convinced that escalating is always the right move. Working Asynchronously Last year I attended a talk at our local code camp given by Drew Miller. Drew works at GitHub and was talking about how they use GitHub internally. Many of the folks at GitHub work remotely, so communication was one of the main themes in Drew’s talk. During the talk Drew used the phrase, “asynchronous communication” to describe their use of chat and pull request comments. That phrase stuck in my head because I hadn’t heard it before but I think it perfectly describes the way in which remote teams often need to communicate. You don’t always know when your co-workers are at their computers or what hours (if any) they are working that day. In order to work this way you need to assume that the person you’re talking to might not respond right away. You can’t always afford to wait until everyone required is online and available to join a voice call, so you need to use text-based, persistent forms of communication so that people can receive and respond to messages when they are available. Going back to my list from the beginning of this post for a second, I characterize items #1-3 as being “asynchronous” modes of communication while we could call items #4 and #5 “synchronous”. When communication gets escalated it’s almost always moving from an asynchronous mode of communication to a synchronous one. Now, to the point of this post: I’ve become increasingly reluctant to escalate from asynchronous to synchronous communication for two primary reasons: 1 – You can often find a higher fidelity way to convey your message without holding a synchronous conversation 2 - Asynchronous modes of communication are (usually) persistent and searchable. You Don’t Have to Broadcast Live Let’s start with the first reason I’ve listed. A lot of times you feel like you need to escalate to synchronous communication because you’re having difficulty describing something that you’re seeing in words. You want to provide the people you’re conversing with some audio-visual aids to help them understand the point that you’re trying to make and you think that getting on Skype and sharing your screen with them is the best way to do that. Firing up a screen sharing session does work well, but you can usually accomplish the same thing in an asynchronous manner. For example, you could take a screenshot and annotate it with some text and drawings to illustrate what it is you’re seeing. If a screenshot won’t work, taking a short screen recording while your narrate over it and posting the video to your forum or chat system along with a text-based description of what’s in the recording that can be searched for later can be a great way to effectively communicate with your team asynchronously. I Said What?!? Now for the second reason I listed: most asynchronous modes of communication provide a transcript of what was said and what decisions might have been made during the conversation. There have been many occasions where I’ve used the search feature of my team’s chat application to find a conversation that happened several weeks or months ago to remember what was decided. Unfortunately, I think the benefits associated with the persistence of communicating asynchronously often get overlooked when people decide to escalate to a in-person meeting or voice/video call. I’m becoming much more reluctant to suggest a voice or video call if I suspect that it might lead to codifying some kind of design decision because everyone involved is going to hang up the call and immediately forget what was decided. I recognize that you can record and archive these types of interactions, but without being able to search them the recordings aren’t terribly useful. When and How To Escalate I don’t mean to imply that communicating via voice/video or in person is never a good idea. I probably jump on a Skype call with a co-worker at least once a day to quickly hash something out or show them a bit of code that I’m working on. Also, meeting in person periodically is really important for remote teams. There’s no way around the fact that sometimes it’s easier to jump on a call and show someone my screen so they can see what I’m seeing. So when is it right to escalate? I think the simplest way to answer that is when the communication starts to feel painful. Everyone’s tolerance for that pain is different, but I think you need to let it hurt a little bit before jumping to synchronous communication. When you do escalate from asynchronous to synchronous communication, there are a couple of things you can do to maximize the effectiveness of the communication: Takes notes – This is huge and yet I’ve found that a lot of teams don’t do this. If you’re holding a meeting with  > 2 people you should have someone taking notes. Taking notes while participating in a meeting can be difficult but there are a few strategies to deal with this challenge that probably deserve a short post of their own. After the meeting, make sure the notes are posted to a place where all concerned parties (including those that might not have attended the meeting) can review and search them. Persist decisions made ASAP – If any decisions were made during the meeting, persist those decisions to a searchable medium as soon as possible following the conversation. All the teams I’ve worked on used a web-based system for tracking the on-going work and a backlog of work to be done in the future. I always try to make sure that all of the cards/stories/tasks/whatever in these systems always reflect the latest decisions that were made as the work was being planned and executed. If held a quick call with your team lead and decided that it wasn’t worth the effort to build real-time validation into that new UI you were working on, go and codify that decision in the story associated with that work immediately after you hang up. Even better, write it up in the story while you are both still on the phone. That way when the folks from your QA team pick up the story to test a few days later they’ll know why the real-time validation isn’t there without having to invoke yet another conversation about the work. Communicating Well is Hard At this point you might be thinking that communicating asynchronously is more difficult than having a live conversation. You’re right: it is more difficult. In order to communicate effectively this way you need to very carefully think about the message that you’re trying to convey and craft it in a way that’s easy for your audience to understand. This is almost always harder than just talking through a problem in real time with someone; this is why escalating communication is such a popular idea. Why wouldn’t we want to do the thing that’s easier? Easier isn’t always better. If you and your team can get in the habit of communicating effectively in an asynchronous manner you’ll find that, over time, all of your communications get less painful because you don’t need to re-iterate previously made points over and over again. If you communicate right the first time, you often don’t need to rehash old conversations because you can go back and find the decisions that were made laid out in plain language. You’ll also find that you get better at doing things like writing useful comments in your code, creating written documentation about how the feature that you just built works, or persuading your team to do things in a certain way.

    Read the article

  • Investigation: Can different combinations of components effect Dataflow performance?

    - by jamiet
    Introduction The Dataflow task is one of the core components (if not the core component) of SQL Server Integration Services (SSIS) and often the most misunderstood. This is not surprising, its an incredibly complicated beast and we’re abstracted away from that complexity via some boxes that go yellow red or green and that have some lines drawn between them. Example dataflow In this blog post I intend to look under that facade and get into some of the nuts and bolts of the Dataflow Task by investigating how the decisions we make when building our packages can affect performance. I will do this by comparing the performance of three dataflows that all have the same input, all produce the same output, but which all operate slightly differently by way of having different transformation components. I also want to use this blog post to challenge a common held opinion that I see perpetuated over and over again on the SSIS forum. That is, that people assume adding components to a dataflow will be detrimental to overall performance. Its not surprising that people think this –it is intuitive to think that more components means more work- however this is not a view that I share. I have always been of the opinion that there are many factors affecting dataflow duration and the number of components is actually one of the less important ones; having said that I have never proven that assertion and that is one reason for this investigation. I have actually seen evidence that some people think dataflow duration is simply a function of number of rows and number of components. I’ll happily call that one out as a myth even without any investigation!  The Setup I have a 2GB datafile which is a list of 4731904 (~4.7million) customer records with various attributes against them and it contains 2 columns that I am going to use for categorisation: [YearlyIncome] [BirthDate] The data file is a SSIS raw format file which I chose to use because it is the quickest way of getting data into a dataflow and given that I am testing the transformations, not the source or destination adapters, I want to minimise external influences as much as possible. In the test I will split the customers according to month of birth (12 of those) and whether or not their yearly income is above or below 50000 (2 of those); in other words I will be splitting them into 24 discrete categories and in order to do it I shall be using different combinations of SSIS’ Conditional Split and Derived Column transformation components. The 24 datapaths that occur will each input to a rowcount component, again because this is the least resource intensive means of terminating a datapath. The test is being carried out on a Dell XPS Studio laptop with a quad core (8 logical Procs) Intel Core i7 at 1.73GHz and Samsung SSD hard drive. Its running SQL Server 2008 R2 on Windows 7. The Variables Here are the three combinations of components that I am going to test:     One Conditional Split - A single Conditional Split component CSPL Split by Month of Birth and income category that will use expressions on [YearlyIncome] & [BirthDate] to send each row to one of 24 outputs. This next screenshot displays the expression logic in use: Derived Column & Conditional Split - A Derived Column component DER Income Category that adds a new column [IncomeCategory] which will contain one of two possible text values {“LessThan50000”,”GreaterThan50000”} and uses [YearlyIncome] to determine which value each row should get. A Conditional Split component CSPL Split by Month of Birth and Income Category then uses that new column in conjunction with [BirthDate] to determine which of the same 24 outputs to send each row to. Put more simply, I am separating the Conditional Split of #1 into a Derived Column and a Conditional Split. The next screenshots display the expression logic in use: DER Income Category         CSPL Split by Month of Birth and Income Category       Three Conditional Splits - A Conditional Split component that produces two outputs based on [YearlyIncome], one for each Income Category. Each of those outputs will go to a further Conditional Split that splits the input into 12 outputs, one for each month of birth (identical logic in each). In this case then I am separating the single Conditional Split of #1 into three Conditional Split components. The next screenshots display the expression logic in use: CSPL Split by Income Category         CSPL Split by Month of Birth 1& 2       Each of these combinations will provide an input to one of the 24 rowcount components, just the same as before. For illustration here is a screenshot of the dataflow containing three Conditional Split components: As you can these dataflows have a fair bit of work to do and remember that they’re doing that work for 4.7million rows. I will execute each dataflow 10 times and use the average for comparison. I foresee three possible outcomes: The dataflow containing just one Conditional Split (i.e. #1) will be quicker There is no significant difference between any of them One of the two dataflows containing multiple transformation components will be quicker Regardless of which of those outcomes come to pass we will have learnt something and that makes this an interesting test to carry out. Note that I will be executing the dataflows using dtexec.exe rather than hitting F5 within BIDS. The Results and Analysis The table below shows all of the executions, 10 for each dataflow. It also shows the average for each along with a standard deviation. All durations are in seconds. I’m pasting a screenshot because I frankly can’t be bothered with the faffing about needed to make a presentable HTML table. It is plain to see from the average that the dataflow containing three conditional splits is significantly faster, the other two taking 43% and 52% longer respectively. This seems strange though, right? Why does the dataflow containing the most components outperform the other two by such a big margin? The answer is actually quite logical when you put some thought into it and I’ll explain that below. Before progressing, a side note. The standard deviation for the “Three Conditional Splits” dataflow is orders of magnitude smaller – indicating that performance for this dataflow can be predicted with much greater confidence too. The Explanation I refer you to the screenshot above that shows how CSPL Split by Month of Birth and salary category in the first dataflow is setup. Observe that there is a case for each combination of Month Of Date and Income Category – 24 in total. These expressions get evaluated in the order that they appear and hence if we assume that Month of Date and Income Category are uniformly distributed in the dataset we can deduce that the expected number of expression evaluations for each row is 12.5 i.e. 1 (the minimum) + 24 (the maximum) divided by 2 = 12.5. Now take a look at the screenshots for the second dataflow. We are doing one expression evaluation in DER Income Category and we have the same 24 cases in CSPL Split by Month of Birth and Income Category as we had before, only the expression differs slightly. In this case then we have 1 + 12.5 = 13.5 expected evaluations for each row – that would account for the slightly longer average execution time for this dataflow. Now onto the third dataflow, the quick one. CSPL Split by Income Category does a maximum of 2 expression evaluations thus the expected number of evaluations per row is 1.5. CSPL Split by Month of Birth 1 & CSPL Split by Month of Birth 2 both have less work to do than the previous Conditional Split components because they only have 12 cases to test for thus the expected number of expression evaluations is 6.5 There are two of them so total expected number of expression evaluations for this dataflow is 6.5 + 6.5 + 1.5 = 14.5. 14.5 is still more than 12.5 & 13.5 though so why is the third dataflow so much quicker? Simple, the conditional expressions in the first two dataflows have two boolean predicates to evaluate – one for Income Category and one for Month of Birth; the expressions in the Conditional Split in the third dataflow however only have one predicate thus they are doing a lot less work. To sum up, the difference in execution times can be attributed to the difference between: MONTH(BirthDate) == 1 && YearlyIncome <= 50000 and MONTH(BirthDate) == 1 In the first two dataflows YearlyIncome <= 50000 gets evaluated an average of 12.5 times for every row whereas in the third dataflow it is evaluated once and once only. Multiply those 11.5 extra operations by 4.7million rows and you get a significant amount of extra CPU cycles – that’s where our duration difference comes from. The Wrap-up The obvious point here is that adding new components to a dataflow isn’t necessarily going to make it go any slower, moreover you may be able to achieve significant improvements by splitting logic over multiple components rather than one. Performance tuning is all about reducing the amount of work that needs to be done and that doesn’t necessarily mean use less components, indeed sometimes you may be able to reduce workload in ways that aren’t immediately obvious as I think I have proven here. Of course there are many variables in play here and your mileage will most definitely vary. I encourage you to download the package and see if you get similar results – let me know in the comments. The package contains all three dataflows plus a fourth dataflow that will create the 2GB raw file for you (you will also need the [AdventureWorksDW2008] sample database from which to source the data); simply disable all dataflows except the one you want to test before executing the package and remember, execute using dtexec, not within BIDS. If you want to explore dataflow performance tuning in more detail then here are some links you might want to check out: Inequality joins, Asynchronous transformations and Lookups Destination Adapter Comparison Don’t turn the dataflow into a cursor SSIS Dataflow – Designing for performance (webinar) Any comments? Let me know! @Jamiet

    Read the article

  • C#: My World Clock

    - by Bruce Eitman
    [Placeholder:  I will post the entire project soon] I have been working on cleaning my office of 8 years of stuff from several engineers working on many projects.  It turns out that we have a few extra single board computers with displays, so at the end of the day last Friday I though why not create a little application to display the time, you know, a clock.  How difficult could that be?  It turns out that it is quite simple – until I decided to gold plate the project by adding time displays for our offices around the world. I decided to use C#, which actually made creating the main clock quite easy.   The application was simply a text box and a timer.  I set the timer to fire a couple of times a second, and when it does use a DateTime object to get the current time and retrieve a string to display. And I could have been done, but of course that gold plating came up.   Seems simple enough, simply offset the time from the local time to the location that I want the time for and display it.    Sure enough, I had the time displayed for UK, Italy, Kansas City, Japan and China in no time at all. But it is October, and for those of us still stuck with Daylight Savings Time, we know that the clocks are about to change.   My first attempt was to simply check to see if the local time was DST or Standard time, then change the offset for China.  China doesn’t have Daylight Savings Time. If you know anything about the time changes around the world, you already know that my plan is flawed – in a big way.   It turns out that the transitions in and out of DST take place at different times around the world.   If you didn’t know that, do a quick search for “Daylight Savings” and you will find many WEB sites dedicated to tracking the time changes dates, and times. Now the real challenge of this application; how do I programmatically find out when the time changes occur and handle them correctly?  After a considerable amount of research it turns out that the solution is to read the data from the registry and parse it to figure out when the time changes occur. Reading Time Change Information from the Registry Reading the data from the registry is simple, using the data is a little more complicated.  First, reading from the registry can be done like:             byte[] binarydata = (byte[])Registry.GetValue("HKEY_LOCAL_MACHINE\\Time Zones\\Eastern Standard Time", "TZI", null);   Where I have hardcoded the registry key for example purposes, but in the end I will use some variables.   We now have a binary blob with the data, but it needs to be converted to use the real data.   To start we will need a couple of structs to hold the data and make it usable.   We will need a SYSTEMTIME and REG_TZI_FORMAT.   You may have expected that we would need a TIME_ZONE_INFORMATION struct, but we don’t.   The data is stored in the registry as a REG_TZI_FORMAT, which excludes some of the values found in TIME_ZONE_INFORMATION.     struct SYSTEMTIME     {         internal short wYear;         internal short wMonth;         internal short wDayOfWeek;         internal short wDay;         internal short wHour;         internal short wMinute;         internal short wSecond;         internal short wMilliseconds;     }       struct REG_TZI_FORMAT     {         internal long Bias;         internal long StdBias;         internal long DSTBias;         internal SYSTEMTIME StandardStart;         internal SYSTEMTIME DSTStart;     }   Now we need to convert the binary blob to a REG_TZI_FORMAT.   To do that I created the following helper functions:         private void BinaryToSystemTime(ref SYSTEMTIME ST, byte[] binary, int offset)         {             ST.wYear = (short)(binary[offset + 0] + (binary[offset + 1] << 8));             ST.wMonth = (short)(binary[offset + 2] + (binary[offset + 3] << 8));             ST.wDayOfWeek = (short)(binary[offset + 4] + (binary[offset + 5] << 8));             ST.wDay = (short)(binary[offset + 6] + (binary[offset + 7] << 8));             ST.wHour = (short)(binary[offset + 8] + (binary[offset + 9] << 8));             ST.wMinute = (short)(binary[offset + 10] + (binary[offset + 11] << 8));             ST.wSecond = (short)(binary[offset + 12] + (binary[offset + 13] << 8));             ST.wMilliseconds = (short)(binary[offset + 14] + (binary[offset + 15] << 8));         }             private REG_TZI_FORMAT ConvertFromBinary(byte[] binarydata)         {             REG_TZI_FORMAT RTZ = new REG_TZI_FORMAT();               RTZ.Bias = binarydata[0] + (binarydata[1] << 8) + (binarydata[2] << 16) + (binarydata[3] << 24);             RTZ.StdBias = binarydata[4] + (binarydata[5] << 8) + (binarydata[6] << 16) + (binarydata[7] << 24);             RTZ.DSTBias = binarydata[8] + (binarydata[9] << 8) + (binarydata[10] << 16) + (binarydata[11] << 24);             BinaryToSystemTime(ref RTZ.StandardStart, binarydata, 4 + 4 + 4);             BinaryToSystemTime(ref RTZ.DSTStart, binarydata, 4 + 16 + 4 + 4);               return RTZ;         }   I am the first to admit that there may be a better way to get the settings from the registry and into the REG_TXI_FORMAT, but I am not a great C# programmer which I have said before on this blog.   So sometimes I chose brute force over elegant. Now that we have the Bias information and the start date information, we can start to make sense of it.   The bias is an offset, in minutes, from local time (if already in local time for the time zone in question) to get to UTC – or as Microsoft defines it: UTC = local time + bias.  Standard bias is an offset to adjust for standard time, which I think is usually zero.   And DST bias is and offset to adjust for daylight savings time. Since we don’t have the local time for a time zone other than the one that the computer is set to, what we first need to do is convert local time to UTC, which is simple enough using:                 DateTime.Now.ToUniversalTime(); Then, since we have UTC we need to do a little math to alter the formula to: local time = UTC – bias.  In other words, we need to subtract the bias minutes. I am ahead of myself though, the standard and DST start dates really aren’t dates.   Instead they indicate the month, day of week and week number of the time change.   The dDay member of SYSTEM time will be set to the week number of the date change indicating that the change happens on the first, second… day of week of the month.  So we need to convert them to dates so that we can determine which bias to use, and when to change to a different bias.   To do that, I wrote the following function:         private DateTime SystemTimeToDateTimeStart(SYSTEMTIME Time, int Year)         {             DayOfWeek[] Days = { DayOfWeek.Sunday, DayOfWeek.Monday, DayOfWeek.Tuesday, DayOfWeek.Wednesday, DayOfWeek.Thursday, DayOfWeek.Friday, DayOfWeek.Saturday };             DateTime InfoTime = new DateTime(Year, Time.wMonth, Time.wDay == 1 ? 1 : ((Time.wDay - 1) * 7) + 1, Time.wHour, Time.wMinute, Time.wSecond, DateTimeKind.Utc);             DateTime BestGuess = InfoTime;             while (BestGuess.DayOfWeek != Days[Time.wDayOfWeek])             {                 BestGuess = BestGuess.AddDays(1);             }             return BestGuess;         }   SystemTimeToDateTimeStart gets two parameters; a SYSTEMTIME and a year.   The reason is that we will try this year and next year because we are interested in start dates that are in the future, not the past.  The function starts by getting a new Datetime with the first possible date and then looking for the correct date. Using the start dates, we can then determine the correct bias to use, and the next date that time will change:             NextTimeChange = StandardChange;             CurrentBias = TimezoneSettings.Bias + TimezoneSettings.DSTBias;             if (DSTChange.Year != 1 && StandardChange.Year != 1)             {                 if (DSTChange.CompareTo(StandardChange) < 0)                 {                     NextTimeChange = DSTChange;                     CurrentBias = TimezoneSettings.StdBias + TimezoneSettings.Bias;                 }             }             else             {                 // I don't like this, but it turns out that China Standard Time                 // has a DSTBias of -60 on every Windows system that I tested.                 // So, if no DST transitions, then just use the Bias without                 // any offset                 CurrentBias = TimezoneSettings.Bias;             }   Note that some time zones do not change time, in which case the years will remain set to 1.   Further, I found that the registry settings are actually wrong in that the DST Bias is set to -60 for China even though there is not DST in China, so I ignore the standard and DST bias for those time zones. There is one thing that I have not solved, and don’t plan to solve.  If the time zone for this computer changes, this application will not update the clock using the new time zone.  I tell  you this because you may need to deal with it – I do not because I won’t let the user get to the control panel applet to change the timezone. Copyright © 2012 – Bruce Eitman All Rights Reserved

    Read the article

  • MSCC: Purpose and benefits of Version Control Systems (VCS)

    Unfortunately, there was no monthly meetup during May. Which means that it was even more important and interesting to go forward with a great topic for this month. Earlier this year I already spoke to Nayar Joolfoo about doing a presentation on version control systems (VCS), and he gladly agreed since then. It was just about finding the right date for the action. Furthermore, it was also a great coincidence that Avinash Meetoo announced on social media networks that Knowledge 7 is about to have a new training on "Effective git" - which correlates to a book title Avinash is currently working on - all the best with your approach on this and reach out to our MSCC craftsmen for recessions. Once again a big Thank you to Orange Ebene Accelerator on providing the venue for us, and the MSCC members involved on securing the time slot for our event. Unfortunately, it's kind of tough to get an early confirmation for our meetups these days. I'll keep you posted on that one as there are some interesting and exciting options coming up soon. Okay, let's talk about the meeting and version control systems again. As usual, I'm going to put my first impression of the meetup: "Absolutely great topic, questions and discussions on version control systems, like git or VSO. I was also highly pleased by the number of first timers and female IT geeks. Hopefully, we will be able to keep this trend for future get-togethers." And I really have to emphasise the amount of fresh blood coming to our gathering. Also, during the initial phase it was surprising to see that exactly those first-timers, most of them students at various campuses here on the island, had absolutely no idea about version control systems. More about further down... Reactions of other attendees If I counted correctly, we had a total of 17 attendees this month, and I'd like to give you feedback from some of them: "Inspiring. Helped me understand more about GIT." -- Sean on event comments "Joined the meetup today with literally no idea what is a version control system. I have several reasons why I should be starting to use VCS as from NOW in my projects. Thanks Nayar, Jochen and other participants :)" -- Yudish on event comments "Was present today and I'm very satisfied.I was not aware if there was a such tool like git available. Thanks to those who contributed for this meetup.It was great. Learned a lot from this meetup!!" -- Leonardo on event comments "Seriously, I can see how it’s going to ease my task and help me save time. Gone are the issues with files backups.  And since I’ll be doing my dissertation this year, using Git would help me a lot for my backups and I’m grateful to Nayar for the great explanation." -- Swan-Iyah on MSCC meetup : Version Controls Hopefully, I'll be able to get some other sources - personal blogs preferred - on our meeting. Geeks, thank you so much for those encouraging comments. It's really great to experience that we, all members of the MSCC, are doing the right thing to get more IT information out, and to help each other to improve and evolve in our professional careers. Our agenda of the day Honestly, we had a bumpy start... First, I was battling a little bit with the movable room divider in order to maximize the space. I mean, we had 24 RSVPs and usually there might additional people coming along. Then, for what ever reason, we were facing power outages - actually twice in short periods. Not too good for the projector after all, but hey it went smooth for the rest of the time being. And last but not least... our first speaker Nayar got stuck somewhere on the road. ;-) Anyway, not a real show-stopper and we used the time until Nayar's arrival to introduce ourselves a little bit. It is always important for me to get to know the "newbies" a little bit, and as a result we had lots of students of university - first year, second year and recent graduates - among them. Surprisingly, none of them was ever in contact with version control systems at all. I mean, this is a shocking discovery! Similar to the ability of touch-typing I'd say that being able to use (and master) any kind of version control system is compulsory in any job in the IT industry. Seriously, I'm wondering what is being taught during the classes on the campus. All of them have to work on semester assessments or final projects, even in small teams of 2-4 people. That's the perfect occasion to get started with VCS. Already in this phase, we had great input from more experienced VCS users, like Sean, Avinash and myself. git - a modern approach to VCS - Nayar What a tour! Nayar gave us the full round of git from start to finish, even touching some more advanced techniques. First, he started to explain about the importance of version control systems as an essential tool for software developers, even working alone on a project, and the ability to have a kind of "time machine" that allows you to inspect and revert to a previous version of source code at any time. Then he showed how easy it is to install git on an Ubuntu based system but also mentioned that git is literally available for any operating system, like Windows, Mac OS X and of course other Linux distributions. Next, he showed us how to set the initial configuration values of user name and email address which simplifies the daily usage of the git client while working with your repositories. Then he initialised and added a new repository for some local development of a blogging software. All commands were done using the command line interface (CLI) so that they can be repeated on any system as reference. The syntax and the procedure is always the same, and Nayar clearly mentioned this to the attendees. Now, having a git repository in place it was about time to work on some "important" changes on the blogging software - just for the sake of demonstrating the ease of use and power of git. One interesting question came very early: "How many commands do we have to learn? It looks quite difficult at the moment" - Well, rest assured that during daily development circles you will need less than 10 git commands on a regular base: git add, commit, push, pull, checkout, and merge And Nayar demo'd all of them. Much to the delight of everyone he also showed gitk which is the git repository browser. It's an UI tool to display changes in a repository or a selected set of commits. This includes visualizing the commit graph, showing information related to each commit, and the files in the trees of each revision. Using gitk to display and browse information of a local git repository And last but not least, we took advantage of the internet connectivity and reached out to various online portals offering git hosting for free. Nayar showed us how to push the local repository into a remote system on github. Showing the web-based git browser and history handling, and then also explained and demo'd on how to connect to existing online repositories in order to get access to either your own source code or other people's open source projects. Next to github, we also spoke about bitbucket and gitlab as potential online platforms for your projects. Have a look at the conditions and details about their free service packages and what you can get additionally as a paying customer. Usually, you already get a lot of services for up to five users for free but there might be other important aspects that might have an impact on your decision. Anyways, moving git-based repositories between systems is a piece of cake, and changing online platforms is possible at any stage of your development. Visual Studio Online (VSO) - Jochen Well, Nayar literally covered all elements of working with git during his session, including the use of external online platforms. So, what would be the advantage of talking about Visual Studio Online (VSO)? First of all, VSO is "just another" online platform for hosting and managing git repositories on remote systems, equivalent to github, bitbucket, or any other web site. At the moment (of writing), Microsoft also provides a free package of up to five users / developers on a git repository but there is more in that package. Of course, it is related to software development on the Windows systems and the bonds are tightened towards the use of Visual Studio but out of experience you are absolutely not restricted to that. Connecting a Linux or Mac OS X machine with a git client or an integrated development environment (IDE) like Eclipse or Xcode works as smooth as expected. So, why should one opt in for VSO? Well, one of the main aspects that I would like to mention here is that VSO integrates the Application Life Cycle Methodology (ALM) of Microsoft in their platform. Meaning that you get agile project management with Backlogs, Sprints, Burn-down charts as well as the ability to track tasks, bug reports and work items next to collaborative team chats. It's the whole package of agile development you'll get. And, something I mentioned briefly during the begin of our meeting, VSO gives you the possibility of an automated continuous integrated (CI) process which builds and can run tests of your source code after each commit of changes. Having a proper CI strategy is also part of the Clean Code Developer practices - on Level Green actually -, and not only simplifies your life as a software developer but also reduces the sources of potential errors. Seamless integration and automated deployment between Microsoft Azure Web Sites and git repository But my favourite feature is the seamless continuous deployment to Microsoft Azure. Especially, while working on web projects it's absolutely astounishing that as soon as you commit your chances it just takes a couple of seconds until your modifications are deployed and available on your Azure-hosted web sites. Upcoming Events and networking Due to the adjusted times, everybody was kind of hungry and we didn't follow up on networking or upcoming events - very unfortunate to my opinion and this will have an impact on future planning of our meetups. Because I rather would like to see more conversations during and at the end of our meetings than everyone just packing their laptops, bags and accessories and rush off to grab some food. I was hoping to get some information regarding this year's Code Challenge - supposedly to be organised during July? Maybe someone could leave a comment on that - but I couldn't get any updates. Well, I'll keep digging... In case that you would like to get more into git and how to use it effectively, please check out Knowledge 7's upcoming course on "Effective git". Thanks Avinash for your vital input into today's conversation and I'm looking forward to get a grip on your book title very soon. My resume of the day Do not work in IT without any kind of version control system! Seriously, without a VCS in place you're doing it wrong. It's like driving a car without seat belts attached or riding your bike without safety helmet. You don't do that! End of discussion. ;-) Nowadays, having access to free (as in cost) tools to install on your machine and numerous online platforms to host your source code for free for up to five users it's a no-brainer to get yourself familiar with VCS. Today's sessions gave a good overview on how to start using git and how to connect to various remote services like github or VSO.

    Read the article

  • Monitoring SQL Server Agent job run times

    - by okeofs
    Introduction A few months back, I was asked how long a particular nightly process took to run. It was a super question and the one thing that struck me was that there were a plethora of factors affecting the processing time. This said, I developed a query to ascertain process run times, the average nightly run times and applied some KPI’s to the end query. The end goal being to enable me to quickly detect anomalies and processes that are running beyond their normal times. As many of you are aware, most of the necessary data for this type of query, lies within the MSDB database. The core portion of the query is shown below.select sj.name,sh.run_date, sh.run_duration, case when len(sh.run_duration) = 6 then convert(varchar(8),sh.run_duration) when len(sh.run_duration) = 5 then '0' + convert(varchar(8),sh.run_duration) when len(sh.run_duration) = 4 then '00' + convert(varchar(8),sh.run_duration) when len(sh.run_duration) = 3 then '000' + convert(varchar(8),sh.run_duration) when len(sh.run_duration) = 2 then '0000' + convert(varchar(8),sh.run_duration) when len(sh.run_duration) = 1 then '00000' + convert(varchar(8),sh.run_duration) end as tt from dbo.sysjobs sj with (nolock) inner join dbo.sysjobHistory sh with (nolock) on sj.job_id = sh.job_id where sj.name = 'My Agent Job' and [sh.Message] like '%The job%') Run_date and run_duration are obvious fields. The field ‘Name’ is the name of the job that we wish to follow. The only major challenge was that the format of the run duration which was not as ‘user friendly’ as I would have liked. As an example, the run duration 1 hour 10 minutes and 3 seconds would be displayed as 11003; whereas I wanted it to display this in a more user friendly manner as 01:10:03. In order to achieve this effect, we need to add leading zeros to the run_duration based upon the case logic shown above. At this point what we need to do add colons between the hours and minutes and one between the minutes and seconds. To achieve this I nested the query shown above (in purple) within a ‘super’ query. Thus the run time ([Run Time]) is constructed concatenating a series of substrings (See below in Blue). select run_date,substring(convert(varchar(20),tt),1,2) + ':' +substring(convert(varchar(20),tt),3,2) + ':' +substring(convert(varchar(20),tt),5,2) as [run_time] from (select sj.name,sh.run_date, sh.run_duration,case when len(sh.run_duration) = 6 then convert(varchar(8),sh.run_duration)when len(sh.run_duration) = 5 then '0' + convert(varchar(8),sh.run_duration)when len(sh.run_duration) = 4 then '00' + convert(varchar(8),sh.run_duration)when len(sh.run_duration) = 3 then '000' + convert(varchar(8),sh.run_duration)when len(sh.run_duration) = 2 then '0000' + convert(varchar(8),sh.run_duration)when len(sh.run_duration) = 1 then '00000' + convert(varchar(8),sh.run_duration)end as ttfrom dbo.sysjobs sj with (nolock)inner join dbo.sysjobHistory sh with (nolock) on sj.job_id = sh.job_id where sj.name = 'My Agent Job'and [sh.Message] like '%The job%') a Now that I had each nightly run time in hours, minutes and seconds (01:10:03), I decided that it would very productive to calculate a rolling run time average. To do this, I decided to do the calculations in base units of seconds. This said, I encapsulated the query shown above into a further ‘super’ query (see the code in RED below). This encapsulation is shown below. The astute reader will note that I used implied casting from integer to string, which is not the best method to use however it works. This said and if I were constructing the query again I would definitely do an explicit convert. To Recap: I now have a key field of ‘1’, each and every applicable run date and the total number of SECONDS that the process ran for each run date, all of this data within the #rawdata1 temporary table. Select 1 as keyy,run_date,(substring(b.run_time,1,2)*3600) + (substring(b.run_time,4,2)*60) + (substring(b.run_time,7,2)) as run_time_in_Seconds,run_time into #rawdata1 from ( select run_date,substring(convert(varchar(20),tt),1,2) + ':' + substring(convert(varchar(20),tt),3,2) + ':' +substring(convert(varchar(20),tt),5,2) as [run_time] from (select sj.name,sh.run_date, sh.run_duration, case when len(sh.run_duration) = 6 then convert(varchar(8),sh.run_duration)when len(sh.run_duration) = 5 then '0' + convert(varchar(8),sh.run_duration)when len(sh.run_duration) = 4 then '00' + convert(varchar(8),sh.run_duration)when len(sh.run_duration)    = 3 then '000' + convert(varchar(8),sh.run_duration)when len(sh.run_duration)    = 2 then '0000' + convert(varchar(8),sh.run_duration)when len(sh.run_duration) = 1 then '00000' + convert(varchar(8),sh.run_duration)end as ttfrom dbo.sysjobs sj with (nolock)inner join dbo.sysjobHistory sh with (nolock)on sj.job_id = sh.job_id where sj.name = 'My Agent Job'and [sh.Message] like '%The job%') a )b   Calculating the average run time We now select each run time in seconds from #rawdata1 and place the values into another temporary table called #rawdata2. Once again we create a ‘key’, a hardwired ‘1’. select 1 as Keyy, run_time_in_Seconds into #rawdata2 from #rawdata1The purpose of doing so is to make the average time AVG() available to the query immediately without having to do adverse grouping. Applying KPI Logic At this point, we shall apply some logic to determine whether processing times are within the norms. We do this by applying colour names. Obviously, this example is a super one for SSRS and traffic light icons.select rd1.run_date, rd1.run_time, rd1.run_time_in_Seconds ,Avg(rd2.run_time_in_Seconds) as Average_run_time_in_seconds,casewhenConvert(decimal(10,1),rd1.run_time_in_Seconds)/Avg(rd2.run_time_in_Seconds)<= 1.2 then 'Green' when Convert(decimal(10,1),rd1.run_time_in_Seconds)/Avg(rd2.run_time_in_Seconds)< 1.4 then 'Yellow' else 'Red'end as [color], Calculating the Average Run Time in Hours Minutes and Seconds and the end of the query. casewhen len(convert(varchar(2),Avg(rd2.run_time_in_Seconds)/(3600))) = 1 then '0' + convert(varchar(2),Avg(rd2.run_time_in_Seconds)/(3600))else convert(varchar(2),Avg(rd2.run_time_in_Seconds)/(3600))end + ':' + case when len(convert(varchar(2),Avg(rd2.run_time_in_Seconds)%(3600)/60)) = 1 then '0' + convert(varchar(2),Avg(rd2.run_time_in_Seconds)%(3600)/60)else convert(varchar(2),Avg(rd2.run_time_in_Seconds)%(3600)/60)end + ':' + case when len(convert(varchar(2),Avg(rd2.run_time_in_Seconds)%60)) = 1 then '0' + convert(varchar(2),Avg(rd2.run_time_in_Seconds)%60)else convert(varchar(2),Avg(rd2.run_time_in_Seconds)%60)end as [Average Run Time HH:MM:SS] from #rawdata2 rd2 innerjoin #rawdata1 rd1on rd1.keyy = rd2.keyygroup by run_date,rd1.run_time ,rd1.run_time_in_Seconds order by run_date descThe complete code example use msdbgo/*drop table #rawdata1drop table #rawdata2go*/select 1 as keyy,run_date,(substring(b.run_time,1,2)*3600) + (substring(b.run_time,4,2)*60) + (substring(b.run_time,7,2)) as run_time_in_Seconds,run_time into #rawdata1 from (select run_date,substring(convert(varchar(20),tt),1,2) + ':' +substring(convert(varchar(20),tt),3,2) + ':' +substring(convert(varchar(20),tt),5,2) as [run_time] from (select name,run_date, run_duration, casewhenlen(run_duration) = 6 then convert(varchar(8),run_duration)whenlen(run_duration) = 5 then '0' + convert(varchar(8),run_duration)whenlen(run_duration) = 4 then '00' + convert(varchar(8),run_duration)whenlen(run_duration) = 3 then '000' + convert(varchar(8),run_duration)whenlen(run_duration) = 2 then '0000' + convert(varchar(8),run_duration)whenlen(run_duration) = 1 then '00000' + convert(varchar(8),run_duration)end as ttfrom dbo.sysjobs sj with (nolock)innerjoin dbo.sysjobHistory sh with (nolock) on sj.job_id = sh.job_id where name = 'My Agent Job'and [Message] like '%The job%') a ) bselect 1 as Keyy, run_time_in_Seconds into #rawdata2 from #rawdata1select rd1.run_date, rd1.run_time, rd1.run_time_in_Seconds ,Avg(rd2.run_time_in_Seconds) as Average_run_time_in_seconds,casewhenConvert(decimal(10,1),rd1.run_time_in_Seconds)/Avg(rd2.run_time_in_Seconds)<= 1.2 then 'Green' when Convert(decimal(10,1),rd1.run_time_in_Seconds)/Avg(rd2.run_time_in_Seconds)< 1.4 then 'Yellow' else 'Red'end as [color],Case when len(convert(varchar(2),Avg(rd2.run_time_in_Seconds)/(3600))) = 1 then '0' + convert(varchar(2),Avg(rd2.run_time_in_Seconds)/(3600))else convert(varchar(2),Avg(rd2.run_time_in_Seconds)/(3600))end + ':' + case when len(convert(varchar(2),Avg(rd2.run_time_in_Seconds)%(3600)/60)) = 1 then '0' + convert(varchar(2),Avg(rd2.run_time_in_Seconds)%(3600)/60)else convert(varchar(2),Avg(rd2.run_time_in_Seconds)%(3600)/60)end + ':' + case when len(convert(varchar(2),Avg(rd2.run_time_in_Seconds)%60)) = 1 then '0' + convert(varchar(2),Avg(rd2.run_time_in_Seconds)%60)else convert(varchar(2),Avg(rd2.run_time_in_Seconds)%60)end as [Average Run Time HH:MM:SS] from #rawdata2 rd2 innerjoin #rawdata1 rd1on rd1.keyy = rd2.keyygroup by run_date,rd1.run_time ,rd1.run_time_in_Seconds order by run_date desc  

    Read the article

  • SQL SERVER – Weekly Series – Memory Lane – #039

    - by Pinal Dave
    Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 FQL – Facebook Query Language Facebook list following advantages of FQL: Condensed XML reduces bandwidth and parsing costs. More complex requests can reduce the number of requests necessary. Provides a single consistent, unified interface for all of your data. It’s fun! UDF – Get the Day of the Week Function The day of the week can be retrieved in SQL Server by using the DatePart function. The value returned by the function is between 1 (Sunday) and 7 (Saturday). To convert this to a string representing the day of the week, use a CASE statement. UDF – Function to Get Previous And Next Work Day – Exclude Saturday and Sunday While reading ColdFusion blog of Ben Nadel Getting the Previous Day In ColdFusion, Excluding Saturday And Sunday, I realize that I use similar function on my SQL Server Database. This function excludes the Weekends (Saturday and Sunday), and it gets previous as well as next work day. Complete Series of SQL Server Interview Questions and Answers Data Warehousing Interview Questions and Answers – Introduction Data Warehousing Interview Questions and Answers – Part 1 Data Warehousing Interview Questions and Answers – Part 2 Data Warehousing Interview Questions and Answers – Part 3 Data Warehousing Interview Questions and Answers Complete List Download 2008 Introduction to Log Viewer In SQL Server all the windows event logs can be seen along with SQL Server logs. Interface for all the logs is same and can be launched from the same place. This log can be exported and filtered as well. DBCC SHRINKFILE Takes Long Time to Run If you are DBA who are involved with Database Maintenance and file group maintenance, you must have experience that many times DBCC SHRINKFILE operations takes a long time but any other operations with Database are relatively quicker. mssqlsystemresource – Resource Database The purpose of resource database is to facilitates upgrading to the new version of SQL Server without any hassle. In previous versions whenever version of SQL Server was upgraded all the previous version system objects needs to be dropped and new version system objects to be created. 2009 Puzzle – Write Script to Generate Primary Key and Foreign Key In SQL Server Management Studio (SSMS), there is no option to script all the keys. If one is required to script keys they will have to manually script each key one at a time. If database has many tables, generating one key at a time can be a very intricate task. I want to throw a question to all of you if any of you have scripts for the same purpose. Maximizing View of SQL Server Management Studio – Full Screen – New Screen I had explained the following two different methods: 1) Open Results in Separate Tab - This is a very interesting method as result pan shows up in a different tab instead of the splitting screen horizontally. 2) Open SSMS in Full Screen - This works always and to its best. Not many people are aware of this method; hence, very few people use it to enhance performance. 2010 Find Queries using Parallelism from Cached Plan T-SQL script gets all the queries and their execution plan where parallelism operations are kicked up. Pay attention there is TOP 10 is used, if you have lots of transactional operations, I suggest that you change TOP 10 to TOP 50 This is the list of the all the articles in the series of computed columns. SQL SERVER – Computed Column – PERSISTED and Storage This article talks about how computed columns are created and why they take more storage space than before. SQL SERVER – Computed Column – PERSISTED and Performance This article talks about how PERSISTED columns give better performance than non-persisted columns. SQL SERVER – Computed Column – PERSISTED and Performance – Part 2 This article talks about how non-persisted columns give better performance than PERSISTED columns. SQL SERVER – Computed Column and Performance – Part 3 This article talks about how Index improves the performance of Computed Columns. SQL SERVER – Computed Column – PERSISTED and Storage – Part 2 This article talks about how creating index on computed column does not grow the row length of table. SQL SERVER – Computed Columns – Index and Performance This article summarized all the articles related to computed columns. 2011 SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Data Warehousing Concepts – Day 21 of 31 What is Data Warehousing? What is Business Intelligence (BI)? What is a Dimension Table? What is Dimensional Modeling? What is a Fact Table? What are the Fundamental Stages of Data Warehousing? What are the Different Methods of Loading Dimension tables? Describes the Foreign Key Columns in Fact Table and Dimension Table? What is Data Mining? What is the Difference between a View and a Materialized View? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Data Warehousing Concepts – Day 22 of 31 What is OLTP? What is OLAP? What is the Difference between OLTP and OLAP? What is ODS? What is ER Diagram? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Data Warehousing Concepts – Day 23 of 31 What is ETL? What is VLDB? Is OLTP Database is Design Optimal for Data Warehouse? If denormalizing improves Data Warehouse Processes, then why is the Fact Table is in the Normal Form? What are Lookup Tables? What are Aggregate Tables? What is Real-Time Data-Warehousing? What are Conformed Dimensions? What is a Conformed Fact? How do you Load the Time Dimension? What is a Level of Granularity of a Fact Table? What are Non-Additive Facts? What is a Factless Facts Table? What are Slowly Changing Dimensions (SCD)? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Data Warehousing Concepts – Day 24 of 31 What is Hybrid Slowly Changing Dimension? What is BUS Schema? What is a Star Schema? What Snow Flake Schema? Differences between the Star and Snowflake Schema? What is Difference between ER Modeling and Dimensional Modeling? What is Degenerate Dimension Table? Why is Data Modeling Important? What is a Surrogate Key? What is Junk Dimension? What is a Data Mart? What is the Difference between OLAP and Data Warehouse? What is a Cube and Linked Cube with Reference to Data Warehouse? What is Snapshot with Reference to Data Warehouse? What is Active Data Warehousing? What is the Difference between Data Warehousing and Business Intelligence? What is MDS? Explain the Paradigm of Bill Inmon and Ralph Kimball. SQL SERVER – Azure Interview Questions and Answers – Guest Post by Paras Doshi – Day 25 of 31 Paras Doshi has submitted 21 interesting question and answers for SQL Azure. 1.What is SQL Azure? 2.What is cloud computing? 3.How is SQL Azure different than SQL server? 4.How many replicas are maintained for each SQL Azure database? 5.How can we migrate from SQL server to SQL Azure? 6.Which tools are available to manage SQL Azure databases and servers? 7.Tell me something about security and SQL Azure. 8.What is SQL Azure Firewall? 9.What is the difference between web edition and business edition? 10.How do we synchronize On Premise SQL server with SQL Azure? 11.How do we Backup SQL Azure Data? 12.What is the current pricing model of SQL Azure? 13.What is the current limitation of the size of SQL Azure DB? 14.How do you handle datasets larger than 50 GB? 15.What happens when the SQL Azure database reaches Max Size? 16.How many databases can we create in a single server? 17.How many servers can we create in a single subscription? 18.How do you improve the performance of a SQL Azure Database? 19.What is code near application topology? 20.What were the latest updates to SQL Azure service? 21.When does a workload on SQL Azure get throttled? SQL SERVER – Interview Questions and Answers – Guest Post by Malathi Mahadevan – Day 26 of 31 Malachi had asked a simple question which has several answers. Each answer makes you think and ponder about the reality of the IT world. Look at the simple question – ‘What is the toughest challenge you have faced in your present job and how did you handle it’? and its various answers. Each answer has its own story. SQL SERVER – Interview Questions and Answers – Guest Post by Rick Morelan – Day 27 of 31 Rick Morelan of Joes2Pros has written an excellent blog post on the subject how to find top N values. Most people are fully aware of how the TOP keyword works with a SELECT statement. After years preparing so many students to pass the SQL Certification I noticed they were pretty well prepared for job interviews too. Yes, they would do well in the interview but not great. There seemed to be a few questions that would come up repeatedly for almost everyone. Rick addresses similar questions in his lucid writing skills. 2012 Observation of Top with Index and Order of Resultset SQL Server has lots of things to learn and share. It is amazing to see how people evaluate and understand different techniques and styles differently when implementing. The real reason may be absolutely different but we may blame something totally different for the incorrect results. Read the blog post to learn more. How do I Record Video and Webcast How to Convert Hex to Decimal or INT Earlier I asked regarding a question about how to convert Hex to Decimal. I promised that I will post an answer with Due Credit to the author but never got around to post a blog post around it. Read the original post over here SQL SERVER – Question – How to Convert Hex to Decimal. Query to Get Unique Distinct Data Based on Condition – Eliminate Duplicate Data from Resultset The natural reaction will be to suggest DISTINCT or GROUP BY. However, not all the questions can be solved by DISTINCT or GROUP BY. Let us see the following example, where a user wanted only latest records to be displayed. Let us see the example to understand further. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • Microsoft&rsquo;s new technical computing initiative

    - by Randy Walker
    I made a mental note from earlier in the year.  Microsoft literally buys computers by the truckload.  From what I understand, it’s a typical practice amongst large software vendors.  You plug a few wires in, you test it, and you instantly have mega tera tera flops (don’t hold me to that number).  Microsoft has been trying to plug away at their cloud services (named Azure).  Which, for the layman, means Microsoft runs your software on their computers, and as demand increases you can allocate more computing power on the fly. With this in mind, it doesn’t surprise me that I was recently sent an executive email concerning Microsoft’s new technical computing initiative.  I find it to be a great marketing idea with actual substance behind their real work.  From the programmer academic perspective, in college we dreamed about this type of processing power.  This has decades of computer science theory behind it. A copy of the email received.  (note that I almost deleted this email, thinking it was spam due to it’s length) We don't often think about how complex life really is. Take the relatively simple task of commuting to and from work: it is, in fact, a complicated interplay of variables such as weather, train delays, accidents, traffic patterns, road construction, etc. You can however, take steps to shorten your commute - using a good, predictive understanding of a few of these variables. In fact, you probably are already taking these inputs and instinctively building a predictive model that you act on daily to get to your destination more quickly. Now, when we apply the same method to very complex tasks, this modeling approach becomes much more challenging. Recent world events clearly demonstrated our inability to process vast amounts of information and variables that would have helped to more accurately predict the behavior of global financial markets or the occurrence and impact of a volcano eruption in Iceland. To make sense of issues like these, researchers, engineers and analysts create computer models of the almost infinite number of possible interactions in complex systems. But, they need increasingly more sophisticated computer models to better understand how the world behaves and to make fact-based predictions about the future. And, to do this, it requires a tremendous amount of computing power to process and examine the massive data deluge from cameras, digital sensors and precision instruments of all kinds. This is the key to creating more accurate and realistic models that expose the hidden meaning of data, which gives us the kind of insight we need to solve a myriad of challenges. We have made great strides in our ability to build these kinds of computer models, and yet they are still too difficult, expensive and time consuming to manage. Today, even the most complicated data-rich simulations cannot fully capture all of the intricacies and dependencies of the systems they are trying to model. That is why, across the scientific and engineering world, it is so hard to say with any certainty when or where the next volcano will erupt and what flight patterns it might affect, or to more accurately predict something like a global flu pandemic. So far, we just cannot collect, correlate and compute enough data to create an accurate forecast of the real world. But this is about to change. Innovations in technology are transforming our ability to measure, monitor and model how the world behaves. The implication for scientific research is profound, and it will transform the way we tackle global challenges like health care and climate change. It will also have a huge impact on engineering and business, delivering breakthroughs that could lead to the creation of new products, new businesses and even new industries. Because you are a subscriber to executive e-mails from Microsoft, I want you to be the first to know about a new effort focused specifically on empowering millions of the world's smartest problem solvers. Today, I am happy to introduce Microsoft's Technical Computing initiative. Our goal is to unleash the power of pervasive, accurate, real-time modeling to help people and organizations achieve their objectives and realize their potential. We are bringing together some of the brightest minds in the technical computing community across industry, academia and science at www.modelingtheworld.com to discuss trends, challenges and shared opportunities. New advances provide the foundation for tools and applications that will make technical computing more affordable and accessible where mathematical and computational principles are applied to solve practical problems. One day soon, complicated tasks like building a sophisticated computer model that would typically take a team of advanced software programmers months to build and days to run, will be accomplished in a single afternoon by a scientist, engineer or analyst working at the PC on their desktop. And as technology continues to advance, these models will become more complete and accurate in the way they represent the world. This will speed our ability to test new ideas, improve processes and advance our understanding of systems. Our technical computing initiative reflects the best of Microsoft's heritage. Ever since Bill Gates articulated the then far-fetched vision of "a computer on every desktop" in the early 1980's, Microsoft has been at the forefront of expanding the power and reach of computing to benefit the world. As someone who worked closely with Bill for many years at Microsoft, I am happy to share with you that the passion behind that vision is fully alive at Microsoft and is carried out in the creation of our new Technical Computing group. Enabling more people to make better predictions We have seen the impact of making greater computing power more available firsthand through our investments in high performance computing (HPC) over the past five years. Scientists, engineers and analysts in organizations of all sizes and sectors are finding that using distributed computational power creates societal impact, fuels scientific breakthroughs and delivers competitive advantages. For example, we have seen remarkable results from some of our current customers: Malaria strikes 300,000 to 500,000 people around the world each year. To help in the effort to eradicate malaria worldwide, scientists at Intellectual Ventures use software that simulates how the disease spreads and would respond to prevention and control methods, such as vaccines and the use of bed nets. Technical computing allows researchers to model more detailed parameters for more accurate results and receive those results in less than an hour, rather than waiting a full day. Aerospace engineering firm, a.i. solutions, Inc., needed a more powerful computing platform to keep up with the increasingly complex computational needs of its customers: NASA, the Department of Defense and other government agencies planning space flights. To meet that need, it adopted technical computing. Now, a.i. solutions can produce detailed predictions and analysis of the flight dynamics of a given spacecraft, from optimal launch times and orbit determination to attitude control and navigation, up to eight times faster. This enables them to avoid mistakes in any areas that can cause a space mission to fail and potentially result in the loss of life and millions of dollars. Western & Southern Financial Group faced the challenge of running ever larger and more complex actuarial models as its number of policyholders and products grew and regulatory requirements changed. The company chose an actuarial solution that runs on technical computing technology. The solution is easy for the company's IT staff to manage and adjust to meet business needs. The new solution helps the company reduce modeling time by up to 99 percent - letting the team fine-tune its models for more accurate product pricing and financial projections. Our Technical Computing direction Collaborating closely with partners across industry and academia, we must now extend the reach of technical computing even further to help predictive modelers and data explorers make faster, more accurate predictions. As we build the Technical Computing initiative, we will invest in three core areas: Technical computing to the cloud: Microsoft will play a leading role in bringing technical computing power to scientists, engineers and analysts through the cloud. Existing high- performance computing users will benefit from the ability to augment their on-premises systems with cloud resources that enable 'just-in-time' processing. This platform will help ensure processing resources are available whenever they are needed-reliably, consistently and quickly. Simplify parallel development: Today, computers are shipping with more processing power than ever, including multiple cores, but most modern software only uses a small amount of the available processing power. Parallel programs are extremely difficult to write, test and trouble shoot. However, a consistent model for parallel programming can help more developers unlock the tremendous power in today's modern computers and enable a new generation of technical computing. We are delivering new tools to automate and simplify writing software through parallel processing from the desktop... to the cluster... to the cloud. Develop powerful new technical computing tools and applications: We know scientists, engineers and analysts are pushing common tools (i.e., spreadsheets and databases) to the limits with complex, data-intensive models. They need easy access to more computing power and simplified tools to increase the speed of their work. We are building a platform to do this. Our development efforts will yield new, easy-to-use tools and applications that automate data acquisition, modeling, simulation, visualization, workflow and collaboration. This will allow them to spend more time on their work and less time wrestling with complicated technology. Thinking bigger There is so much left to be discovered and so many questions yet to be answered in the fascinating world around us. We believe the technical computing community will show us that we have not seen anything yet. Imagine just some of the breakthroughs this community could make possible: Better predictions to help improve the understanding of pandemics, contagion and global health trends. Climate change models that predict environmental, economic and human impact, accessible in real-time during key discussions and debates. More accurate prediction of natural disasters and their impact to develop more effective emergency response plans. With an ambitious charter in hand, this new team is ready to build on our progress to-date and execute Microsoft's technical computing vision over the months and years ahead. We will steadily invest in the right technologies, tools and talent, and work to bring together the technical computing community. I invite you to visit www.modelingtheworld.com today. We welcome your ideas and feedback. I look forward to making this journey with you and others who want to answer the world's biggest questions, discover solutions to problems that seem impossible and uncover a host of new opportunities to change the world we live in for the better. Bob

    Read the article

  • How to Audit and Monitor BI Publisher Reports Access?

    - by kanichiro.nishida
    Do you know who is accessing to which report at what time at your reporting environment ? As you delivered the BI Publisher reports to the production environment and your users start using them as part of their daily business operations you might wonder such questions. With compliance becoming an integral part of any business requirement, auditing your reporting environment is also becoming one of the most critical and hot agenda in today’s enterprise reporting deployments. Also, I believe that auditing the reporting environment is not just for the compliance, but also the way to understand how your users are using the reports and be able to improve the user reporting experience. BI Publisher have introduced Enterprise Level Auditing feature with its 11G release, with an integration of Oracle Fusion Middleware Audit Framework, which comes out of the box with the installation. Yes, this is another great example of the benefit of its tight integration with Fusion Middleware introduced with BI Publisher 11g release. What Information Can I Know about our Reporting Environment? With this new Auditing feature you can now gain the following insights. When a particular user login or logout What report is accessed by who and when and how How long does it take to process a particular report Yes, it’s all there. This is a great news for 10G users, right ? I used to be one of them working with many different IT organizations and were craving for this, but it’s here now with 11G! How Can I Access to the Auditing Information? With the Fusion Middleware Auditing Framework, BI Publisher feed such information either to a log file or to a database. If you decided to get the data into the database then, of course you know, you can use BI Publisher to report and publish, or visualize the data to gain more insights. One thing though, in order to feed the data it requires a few extra steps, which I’ll cover it later.  Regardless of whether it’s the log file or the database to store the Auditing data, first, you need to enable the Auditing feature, which is not enabled as default. So, let’s take a look at how to enable it. How to Enable Auditing Feature? Here is a quick list of the steps: Enable Auditing related properties in BI Publisher configuration file Copy component_events.xml file to Fusion Middleware Audit Framework’s location Enable Auditing Policy with Fusion Middleware Control (Enterprise Manager) Restart WebLogic Server Enable Auditing related properties in BI Publisher configuration file Open xmlp-server-config.xml file, which is located under $BI_HOME/ user_projects/domains/bifoundation_domain/config/bipublisher/repository/Admin/Configuration directory. Set the following three properties values to ‘true’. AUDIT_ENABLED MONITORING_ENABLED AUDIT_JPS_INTEGRATION The ‘AUDIT_JPS_INTEGRATION’ is not in the file as default, so you need to add this. Here is an example of how it looks for the xmlp-server-config.xml file after the modification. <?xml version="1.0" encoding="UTF-8" standalone="no"?><xmlpConfigxmlns="http://xmlns.oracle.com/oxp/xmlp"> <property name="SAW_SERVER" value="adc6160510"/> <property name="SAW_SESSION_TIMEOUT" value="90"/> <property name="DEBUG_LEVEL" value="exception"/> <property name="SAW_PORT" value="7001"/> <property name="SAW_PASSWORD" value=""/> <property name="SAW_PROTOCOL" value="http"/> <property name="SAW_VERSION" value="v6"/> <property name="SAW_USERNAME" value=""/> <property name="SAW_URL_SUFFIX" value="analytics/saw.dll"/> <property name="MONITORING_ENABLED" value="true"/> <property name="MONITORING_DEFAULT_HISTORY_SIZE" value="30"/> <property name="AUDIT_ENABLED" value="true"/> <property name="JSESSION_RESET_DISABLED" value="true"/> <property name="SECURITY_MODEL" value="ORACLE_AS_JPS"/> <property name="AUDIT_JPS_INTEGRATION" value="true"/> </xmlpConfig>   Copy component_events.xml file to Audit Framework’s location There is a Audit related configuration file provided by BI Publisher that needs to be copied to the Audit Framework location. 1. Go to the following directory. $BI_HOME /oracle_common/modules/oracle.iau_11.1.1/components 2. Create a directory called ‘xmlpserver’ 3. Copy component_events.xml file from /user_projects/domains/bifoundation_domain/config/bipublisher/repository/Admin/Audit To the newly created ‘xmlpserver’ directory. Enable Auditing Policy with Fusion Middleware Control (EM) Now you can set a level of the auditing for each BI Publisher’s auditing type by using Fusion Middleware Control (a.k.a. Enterprise Manager). 1. Login to Fusion Middleware Control UI http://hostname:port/em (e.g. reporting.oracle.com:7001/em) 2. Access to Audit Policy configuration UI from the menu Under WebLogic Domain, right-click bifoundation_domain, select Security and then click Audit Policy.   3. Set Audit Level for BI Publisher. While you can select ‘Custom’ to set a customized level of Auditing for each component, I’m selecting ‘Medium’ for this exercise.   Restart WebLogic Server After all the above settings, now you need to restart the WebLogic Server instance in order to take those changes in effect. If you’re on Windows you can simply do this by selecting ‘Stop BI Servers’ and ‘Start BI Servers’ from the Start menu. If you’re on Linux then you can run ‘stopWebLogic.sh’ and ‘startWebLogic.sh’, which can be found under $BI_HOME/user_projects/domains/bifoundation_domain/bin Start Auditing! Now assuming that you have completed the above steps successfully, then from this point on any reporting activity should be audited and stored in the auditing log file, which can be found at $BI_HOME/user_projects/domains/bifoundation_domain/servers/AdminServer/logs/auditlogs/xmlpserver/audit.log And here is a sample of the log file: 2011-02-18 02:25:49.928 "" "ReportRendering" true - "82d4bdc47b99b33c:-7e3f334f:12e365c4d9c:-8000-0000000000000022,0" - - - - "bipublisher(11.1.1)" "ReportExecution" "200" "" "/Sample Lite/Published Reporting/Reports/Balance Letter.xdo" "pdf" "RTF Corp Styles" "en_US" - - - - - - - - - - - - - - 86608512 486989824 24517 169 - - - 2011-02-18 02:25:49.929 "steve.jobs" "ReportRequest" true - "82d4bdc47b99b33c:-7e3f334f:12e365c4d9c:-8000-0000000000000022,0" - - - - "bipublisher(11.1.1)" "ReportAccess" "200" "" "" "pdf" "RTF Corp Styles" - - - true - - - - - - - - - - - - - - - - - - 2011-02-18 03:25:49.554 "" "ReportDataProcess" true - "82d4bdc47b99b33c:-7e3f334f:12e365c4d9c:-8000-0000000000000022,0" - - - - "bipublisher(11.1.1)" "ReportExecution" "260" "" "/Sample Lite/Published Reporting/Reports/Balance Letter.xdo" - - - - - - - - - - - - - - - - - 34980200 554033152 - 134 - - - 2011-02-18 03:25:50.282 "" "ReportRendering" true - "82d4bdc47b99b33c:-7e3f334f:12e365c4d9c:-8000-0000000000000022,0" - - - - "bipublisher(11.1.1)" "ReportExecution" "263" "" "/Sample Lite/Published Reporting/Reports/Balance Letter.xdo" "pdf" "RTF Corp Styles" "en_US" - - - - - - - - - - - - - - 16158944 554033152 24517 503 - - - 2011-02-18 03:25:50.282 "steve.jobs" "ReportRequest" true - "82d4bdc47b99b33c:-7e3f334f:12e365c4d9c:-8000-0000000000000022,0" - - - - "bipublisher(11.1.1)" "ReportAccess" "263" "" "" "pdf" "RTF Corp Styles" - - - true - - - - - - - - - - - - - - - - - - 2011-02-18 03:30:00.448 "barack.obama" "UserLogin" true - "82d4bdc47b99b33c:-7e3f334f:12e365c4d9c:-8000-0000000000000406,0" - - - - "bipublisher(11.1.1)" "UserSession" "26" "" - - - - - - - - - - - - - - - - - - - - - - - - - From the above log file you can tell a user ‘steve.jobs’ was running some reports like ‘Balance Letter’ around afternoon on 2/18 and another user ‘barack.obama’ logged into the system at 3:30 on the same day. Yes, every login and log out will be recorded, and every report access will be recorded in this log file. Now, looking at this text file to understand what’s going on is pretty overwhelming. And accessing to this log file, which is located at the server’s file system where the BI Publisher/WebLogic Server are running, is another challenge in typical deployment scenarios. And that’s where the database storage option for the Auditing data  comes into a picture. I’ll talk about this tomorrow, so stay tuned!  

    Read the article

  • New Big Data Appliance Security Features

    - by mgubar
    The Oracle Big Data Appliance (BDA) is an engineered system for big data processing.  It greatly simplifies the deployment of an optimized Hadoop Cluster – whether that cluster is used for batch or real-time processing.  The vast majority of BDA customers are integrating the appliance with their Oracle Databases and they have certain expectations – especially around security.  Oracle Database customers have benefited from a rich set of security features:  encryption, redaction, data masking, database firewall, label based access control – and much, much more.  They want similar capabilities with their Hadoop cluster.    Unfortunately, Hadoop wasn’t developed with security in mind.  By default, a Hadoop cluster is insecure – the antithesis of an Oracle Database.  Some critical security features have been implemented – but even those capabilities are arduous to setup and configure.  Oracle believes that a key element of an optimized appliance is that its data should be secure.  Therefore, by default the BDA delivers the “AAA of security”: authentication, authorization and auditing. Security Starts at Authentication A successful security strategy is predicated on strong authentication – for both users and software services.  Consider the default configuration for a newly installed Oracle Database; it’s been a long time since you had a legitimate chance at accessing the database using the credentials “system/manager” or “scott/tiger”.  The default Oracle Database policy is to lock accounts thereby restricting access; administrators must consciously grant access to users. Default Authentication in Hadoop By default, a Hadoop cluster fails the authentication test. For example, it is easy for a malicious user to masquerade as any other user on the system.  Consider the following scenario that illustrates how a user can access any data on a Hadoop cluster by masquerading as a more privileged user.  In our scenario, the Hadoop cluster contains sensitive salary information in the file /user/hrdata/salaries.txt.  When logged in as the hr user, you can see the following files.  Notice, we’re using the Hadoop command line utilities for accessing the data: $ hadoop fs -ls /user/hrdataFound 1 items-rw-r--r--   1 oracle supergroup         70 2013-10-31 10:38 /user/hrdata/salaries.txt$ hadoop fs -cat /user/hrdata/salaries.txtTom Brady,11000000Tom Hanks,5000000Bob Smith,250000Oprah,300000000 User DrEvil has access to the cluster – and can see that there is an interesting folder called “hrdata”.  $ hadoop fs -ls /user Found 1 items drwx------   - hr supergroup          0 2013-10-31 10:38 /user/hrdata However, DrEvil cannot view the contents of the folder due to lack of access privileges: $ hadoop fs -ls /user/hrdata ls: Permission denied: user=drevil, access=READ_EXECUTE, inode="/user/hrdata":oracle:supergroup:drwx------ Accessing this data will not be a problem for DrEvil. He knows that the hr user owns the data by looking at the folder’s ACLs. To overcome this challenge, he will simply masquerade as the hr user. On his local machine, he adds the hr user, assigns that user a password, and then accesses the data on the Hadoop cluster: $ sudo useradd hr $ sudo passwd $ su hr $ hadoop fs -cat /user/hrdata/salaries.txt Tom Brady,11000000 Tom Hanks,5000000 Bob Smith,250000 Oprah,300000000 Hadoop has not authenticated the user; it trusts that the identity that has been presented is indeed the hr user. Therefore, sensitive data has been easily compromised. Clearly, the default security policy is inappropriate and dangerous to many organizations storing critical data in HDFS. Big Data Appliance Provides Secure Authentication The BDA provides secure authentication to the Hadoop cluster by default – preventing the type of masquerading described above. It accomplishes this thru Kerberos integration. Figure 1: Kerberos Integration The Key Distribution Center (KDC) is a server that has two components: an authentication server and a ticket granting service. The authentication server validates the identity of the user and service. Once authenticated, a client must request a ticket from the ticket granting service – allowing it to access the BDA’s NameNode, JobTracker, etc. At installation, you simply point the BDA to an external KDC or automatically install a highly available KDC on the BDA itself. Kerberos will then provide strong authentication for not just the end user – but also for important Hadoop services running on the appliance. You can now guarantee that users are who they claim to be – and rogue services (like fake data nodes) are not added to the system. It is common for organizations to want to leverage existing LDAP servers for common user and group management. Kerberos integrates with LDAP servers – allowing the principals and encryption keys to be stored in the common repository. This simplifies the deployment and administration of the secure environment. Authorize Access to Sensitive Data Kerberos-based authentication ensures secure access to the system and the establishment of a trusted identity – a prerequisite for any authorization scheme. Once this identity is established, you need to authorize access to the data. HDFS will authorize access to files using ACLs with the authorization specification applied using classic Linux-style commands like chmod and chown (e.g. hadoop fs -chown oracle:oracle /user/hrdata changes the ownership of the /user/hrdata folder to oracle). Authorization is applied at the user or group level – utilizing group membership found in the Linux environment (i.e. /etc/group) or in the LDAP server. For SQL-based data stores – like Hive and Impala – finer grained access control is required. Access to databases, tables, columns, etc. must be controlled. And, you want to leverage roles to facilitate administration. Apache Sentry is a new project that delivers fine grained access control; both Cloudera and Oracle are the project’s founding members. Sentry satisfies the following three authorization requirements: Secure Authorization:  the ability to control access to data and/or privileges on data for authenticated users. Fine-Grained Authorization:  the ability to give users access to a subset of the data (e.g. column) in a database Role-Based Authorization:  the ability to create/apply template-based privileges based on functional roles. With Sentry, “all”, “select” or “insert” privileges are granted to an object. The descendants of that object automatically inherit that privilege. A collection of privileges across many objects may be aggregated into a role – and users/groups are then assigned that role. This leads to simplified administration of security across the system. Figure 2: Object Hierarchy – granting a privilege on the database object will be inherited by its tables and views. Sentry is currently used by both Hive and Impala – but it is a framework that other data sources can leverage when offering fine-grained authorization. For example, one can expect Sentry to deliver authorization capabilities to Cloudera Search in the near future. Audit Hadoop Cluster Activity Auditing is a critical component to a secure system and is oftentimes required for SOX, PCI and other regulations. The BDA integrates with Oracle Audit Vault and Database Firewall – tracking different types of activity taking place on the cluster: Figure 3: Monitored Hadoop services. At the lowest level, every operation that accesses data in HDFS is captured. The HDFS audit log identifies the user who accessed the file, the time that file was accessed, the type of access (read, write, delete, list, etc.) and whether or not that file access was successful. The other auditing features include: MapReduce:  correlate the MapReduce job that accessed the file Oozie:  describes who ran what as part of a workflow Hive:  captures changes were made to the Hive metadata The audit data is captured in the Audit Vault Server – which integrates audit activity from a variety of sources, adding databases (Oracle, DB2, SQL Server) and operating systems to activity from the BDA. Figure 4: Consolidated audit data across the enterprise.  Once the data is in the Audit Vault server, you can leverage a rich set of prebuilt and custom reports to monitor all the activity in the enterprise. In addition, alerts may be defined to trigger violations of audit policies. Conclusion Security cannot be considered an afterthought in big data deployments. Across most organizations, Hadoop is managing sensitive data that must be protected; it is not simply crunching publicly available information used for search applications. The BDA provides a strong security foundation – ensuring users are only allowed to view authorized data and that data access is audited in a consolidated framework.

    Read the article

  • CodePlex Daily Summary for Saturday, January 08, 2011

    CodePlex Daily Summary for Saturday, January 08, 2011Popular ReleasesExtended WPF Toolkit: Extended WPF Toolkit - 1.3.0: What's in the 1.3.0 Release?BusyIndicator ButtonSpinner ChildWindow ColorPicker - Updated (Breaking Changes) DateTimeUpDown - New Control Magnifier - New Control MaskedTextBox - New Control MessageBox NumericUpDown RichTextBox RichTextBoxFormatBar - Updated .NET 3.5 binaries and SourcePlease note: The Extended WPF Toolkit 3.5 is dependent on .NET Framework 3.5 and the WPFToolkit. You must install .NET Framework 3.5 and the WPFToolkit in order to use any features in the To...sNPCedit: sNPCedit v0.9d: added elementclient coordinate catcher to catch coordinates select a target (ingame) i.e. your char, npc or monster than click the button and coordinates+direction will be transfered to the selected row in the table corrected labels from Rot to Direction (because it is a vector)AutoLoL: AutoLoL v1.5.2: Implemented the Auto Updater Fix: Your settings will no longer be cleared with new releases of AutoLoL The mastery Editor and Browser now have their own tabs instead of nested tabs The Browser tab will only show the masteries matching ALL filters instead of just one Added a 'Browse' button in the Mastery Editor tab to open the Masteries Directory The Browser tab now shows a message when there are no mastery files in the Masteries Directory Fix: Fixed the Save As dialog again, for ...Ionics Isapi Rewrite Filter: 2.1 latest stable: V2.1 is stable, and is in maintenance mode. This is v2.1.1.25. It is a bug-fix release. There are no new features. 28629 29172 28722 27626 28074 29164 27659 27900 many documentation updates and fixes proper x64 build environment. This release includes x64 binaries in zip form, but no x64 MSI file. You'll have to manually install x64 servers, following the instructions in the documentation.StyleCop for ReSharper: StyleCop for ReSharper 5.1.14980.000: A considerable amount of work has gone into this release: Huge focus on performance around the violation scanning subsystem: - caching added to reduce IO operations around reading and merging of settings files - caching added to reduce creation of expensive objects Users should notice condsiderable perf boost and a decrease in memory usage. Bug Fixes: - StyleCop's new ObjectBasedEnvironment object does not resolve the StyleCop installation path, thus it does not return the correct path ...VivoSocial: VivoSocial 7.4.1: New release with bug fixes and updates for performance.SSH.NET Library: 2011.1.6: Fixes CommandTimeout default value is fixed to infinite. Port Forwarding feature improvements Memory leaks fixes New Features Add ErrorOccurred event to handle errors that occurred on different thread New and improve SFTP features SftpFile now has more attributes and some operations Most standard operations now available Allow specify encoding for command execution KeyboardInteractiveConnectionInfo class added for "keyboard-interactive" authentication. Add ability to specify bo....NET Extensions - Extension Methods Library for C# and VB.NET: Release 2011.03: Added lot's of new extensions and new projects for MVC and Entity Framework. object.FindTypeByRecursion Int32.InRange String.RemoveAllSpecialCharacters String.IsEmptyOrWhiteSpace String.IsNotEmptyOrWhiteSpace String.IfEmptyOrWhiteSpace String.ToUpperFirstLetter String.GetBytes String.ToTitleCase String.ToPlural DateTime.GetDaysInYear DateTime.GetPeriodOfDay IEnumberable.RemoveAll IEnumberable.Distinct ICollection.RemoveAll IList.Join IList.Match IList.Cast Array.IsNullOrEmpty Array.W...VidCoder: 0.8.0: Added x64 version. Made the audio output preview more detailed and accurate. If the chosen encoder or mixdown is incompatible with the source, the fallback that will be used is displayed. Added "Auto" to the audio mixdown choices. Reworked non-anamorphic size calculation to work better with non-standard pixel aspect ratios and cropping. Reworked Custom anamorphic to be more intuitive and allow display width to be set automatically (Thanks, Statick). Allowing higher bitrates for 6-ch....NET Voice Recorder: Auto-Tune Release: This is the source code and binaries to accompany the article on the Coding 4 Fun website. It is the Auto Tuner release of the .NET Voice Recorder application.BloodSim: BloodSim - 1.3.2.0: - Simulation Log is now automatically disabled and hidden when running 10 or more iterations - Hit and Expertise are now entered by Rating, and include option for a Racial Expertise bonus - Added option for boss to use a periodic magic ability (Dragon Breath) - Added option for boss to periodically Enrage, gaining a Damage/Attack Speed buffEnhSim: EnhSim 2.2.9 BETA: 2.2.9 BETAThis release supports WoW patch 4.03a at level 85 To use this release, you must have the Microsoft Visual C++ 2010 Redistributable Package installed. This can be downloaded from http://www.microsoft.com/downloads/en/details.aspx?FamilyID=A7B7A05E-6DE6-4D3A-A423-37BF0912DB84 To use the GUI you must have the .NET 4.0 Framework installed. This can be downloaded from http://www.microsoft.com/downloads/en/details.aspx?FamilyID=9cfb2d51-5ff4-4491-b0e5-b386f32c0992 - Added in the Gobl...xUnit.net - Unit Testing for .NET: xUnit.net 1.7 Beta: xUnit.net release 1.7 betaBuild #1533 Important notes for Resharper users: Resharper support has been moved to the xUnit.net Contrib project. Important note for TestDriven.net users: If you are having issues running xUnit.net tests in TestDriven.net, especially on 64-bit Windows, we strongly recommend you upgrade to TD.NET version 3.0 or later. This release adds the following new features: Added support for ASP.NET MVC 3 Added Assert.Equal(double expected, double actual, int precision)...Json.NET: Json.NET 4.0 Release 1: New feature - Added Windows Phone 7 project New feature - Added dynamic support to LINQ to JSON New feature - Added dynamic support to serializer New feature - Added INotifyCollectionChanged to JContainer in .NET 4 build New feature - Added ReadAsDateTimeOffset to JsonReader New feature - Added ReadAsDecimal to JsonReader New feature - Added covariance to IJEnumerable type parameter New feature - Added XmlSerializer style Specified property support New feature - Added ...DbDocument: DbDoc Initial Version: DbDoc Initial versionASP .NET MVC CMS (Content Management System): Atomic CMS 2.1.2: Atomic CMS 2.1.2 release notes Atomic CMS installation guide N2 CMS: 2.1: N2 is a lightweight CMS framework for ASP.NET. It helps you build great web sites that anyone can update. Major Changes Support for auto-implemented properties ({get;set;}, based on contribution by And Poulsen) All-round improvements and bugfixes File manager improvements (multiple file upload, resize images to fit) New image gallery Infinite scroll paging on news Content templates First time with N2? Try the demo site Download one of the template packs (above) and open the proj...Mobile Device Detection and Redirection: 0.1.11.10: IMPORTANT CHANGESThis release changes the way some WURFL capabilities and attributes are exposed to .NET developers. If you cast MobileCapabilities to return some values then please read the Release Note before implementing this release. The following code snippet can be used to access any WURFL capability. For instance, if the device is a tablet: string capability = Request.Browser["is_tablet"]; SummaryNew attributes have been added to the redirect section: originalUrlAsQueryString If se...Wii Backup Fusion: Wii Backup Fusion 1.0: - Norwegian translation - French translation - German translation - WBFS dump for analysis - Scalable full HQ cover - Support for log file - Load game images improved - Support for image splitting - Diff for images after transfer - Support for scrubbing modes - Search functionality for log - Recurse depth for Files/Load - Show progress while downloading game cover - Supports more databases for cover download - Game cover loading routines improvedBlogEngine.NET: BlogEngine.NET 2.0: Get DotNetBlogEngine for 3 Months Free! Click Here for More Info 3 Months FREE – BlogEngine.NET Hosting – Click Here! If you want to set up and start using BlogEngine.NET right away, you should download the Web project. If you want to extend or modify BlogEngine.NET, you should download the source code. If you are upgrading from a previous version of BlogEngine.NET, please take a look at the Upgrading to BlogEngine.NET 2.0 instructions. To get started, be sure to check out our installatio...New Projects2011 Scripting Games: Track Changes for the 2011 Scripting Games to be hosted on PoshCode by the Microsoft Scripting Guys and the PowerShell community. ADO.NET Unit Testable Repository Generator: Generates a strongly-typed Object Context and POCO Entities from your EDMX file. Additionally a Repository and an abstract Unit Test for the Repository is created (with a mock for the ObjectContext and mock for the ObjectSets). Bases on the "ADO.NET C# POCO Entity Generator".AshpazKhone: ???AVAilable Rooms: AVAilable Rooms Can be used to show and book rooms on an exchange serverContentManager: Content Manger for SAP Kpro exports the data pool of a content-server by PHIOS-Lists (txt-files) into local binary files. Afterwards this files can be imported in a SAP content-repository . The whole List of the PHIOS Keys can also be downloaded an splitted in practical units.CSLauncher: CSLauncher is a small application providing a launching interface similar to Counter Strike's weapons selection menu. You can organize your favorite applications in CSLauncher in the order you wish, then launch them by remembering their numbers.DbMetadata: A simple lightweight library used to retrieve database metadata.DotNetSiemensPLCToolBoxLibrary: DotNetSiemensPLCToolBoxLibrary A Library for working with Siemens Step5 and Step7 Projects, connecting to S5 or S7 PLC's.EmptyX: EmptyX is a game developped with c# and xna for the XNA CHALLENGE in AlgeriaERASMUS: Erasmus šitFaceDeBouc: Simplebook, le nouveau Facebook CPE! Technlogies .NET Serveur: - BDD SQLServer - Logique métier - Web services Clients: - client lourd WPF - client léger ASP.NETFunction resolver: Application using .NET Expression trees to resolve a function for a set of numbersMLRobotic: Programme de gestion MLRobotic 2011 Sissa ProjectMS CRM Multiple Attachment Download Application: Microsoft Dynamics CRM 4.0 currently does not have a built in way for CRM Administrators to Export multiple attachments from the database and store it on a local directory on the Server or the local PC. This application allows you to do so.Open Data Publisher: Open Data Publisher is a C# MVC application that allows you to quickly and easily publish data from SQL Server tables and views using the OData protocol, and present data from OData and static sources to the public for download and exploration.OttawaZagreb1: Test projectPrague: ????? ?? ???? ??????PSSplunk: PSSplunk is a set of Powershell cmdlets that use the RESTFUL API provided by www.splunk.com The goal is provide access to all of Splunks configuration and search features from the CLI without having to have Splunk installed locally.Reactive C#: Reactive language extensions for C#SemanaWebcasts: Projeto para semana de webcasts.Steward: Personal StewardSVNUG.Tracker: SVNUG.Tracker is an example task tracking application used as a presentation project for the Sangamon Valley .NET User Group (svnug.com). The purpose of SVNUG.Tracker is to be a common problem domain for exercising new .NET technology. SVNUG.Tracker is written in C# 4.The Silverlight Cookbook - A Reference Application for Silverlight 4 LOB Apps: The Silverlight Cookbook is a .NET example application that demonstrates how to build a full-blown line-of-business app using the reference architecture explained at: http://www.dennisdoomen.net/search/label/Silverlight Vkontakte Wpf Client: This is a simple client for social network Vkontakte. It can send message to your friends, download audio files or play it,list user video. Also you can view user status and change yours.Windows Phone Multilingual Keyboard: Multilingual keyboard for Windows Phone 7WPF Controls & MVVM Practices: WPF Controls contains various controls that can be used in applications. Currently Featured * FilteredListBox (filter and selected items notification) * IsDirty functionality for a ViewModel model

    Read the article

  • Can the STREAM and GUPS (single CPU) benchmark use non-local memory in NUMA machine

    - by osgx
    Hello I want to run some tests from HPCC, STREAM and GUPS. They will test memory bandwidth, latency, and throughput (in term of random accesses). Can I start Single CPU test STREAM or Single CPU GUPS on NUMA node with memory interleaving enabled? (Is it allowed by the rules of HPCC - High Performance Computing Challenge?) Usage of non-local memory can increase GUPS results, because it will increase 2- or 4- fold the number of memory banks, available for random accesses. (GUPS typically limited by nonideal memory-subsystem and by slow memory bank opening/closing. With more banks it can do update to one bank, while the other banks are opening/closing.) Thanks. UPDATE: (you may nor reorder the memory accesses that the program makes). But can compiler reorder loops nesting? E.g. hpcc/RandomAccess.c /* Perform updates to main table. The scalar equivalent is: * * u64Int ran; * ran = 1; * for (i=0; i<NUPDATE; i++) { * ran = (ran << 1) ^ (((s64Int) ran < 0) ? POLY : 0); * table[ran & (TableSize-1)] ^= stable[ran >> (64-LSTSIZE)]; * } */ for (j=0; j<128; j++) ran[j] = starts ((NUPDATE/128) * j); for (i=0; i<NUPDATE/128; i++) { /* #pragma ivdep */ for (j=0; j<128; j++) { ran[j] = (ran[j] << 1) ^ ((s64Int) ran[j] < 0 ? POLY : 0); Table[ran[j] & (TableSize-1)] ^= stable[ran[j] >> (64-LSTSIZE)]; } } The main loop here is for (i=0; i<NUPDATE/128; i++) { and the nested loop is for (j=0; j<128; j++) {. Using 'loop interchange' optimization, compiler can convert this code to for (j=0; j<128; j++) { for (i=0; i<NUPDATE/128; i++) { ran[j] = (ran[j] << 1) ^ ((s64Int) ran[j] < 0 ? POLY : 0); Table[ran[j] & (TableSize-1)] ^= stable[ran[j] >> (64-LSTSIZE)]; } } It can be done because this loop nest is perfect loop nest. Is such optimization prohibited by rules of HPCC?

    Read the article

  • Timeout Considerations for Solicit Response

    - by Michael Stephenson
    Background One of the clients I work with had been experiencing some issues for a while surrounding web service timeouts.  It's been a little challenging to work through the problems due to limitations in the diagnostic information available from one of the applications, but I learned some interesting things while troubleshooting the problem which don't seem to have been discussed much in the community so I thought I'd share my findings. In the scenario we have BizTalk trying to make calls to a .net web service which was exposed as a WSE 2 endpoint.  In the process BizTalk will try to make a large number of concurrent web service calls to the application, and the backend application has more than enough infrastructure and capability to handle the load. We have configured the <ConnectionManagement> section of the BizTalk configuration file to support up to 100 concurrent connections from each of our 2 BizTalk send servers to the web servers of the application. The problem we were facing was that the BizTalk side was reporting a significant number of timeouts when calling the web service.   One of the biggest issues was the challenge of being able to correlate a message from BizTalk to the IIS log in the .net application and the custom logs in the application especially when there was a fairly large number of servers hosting the web services.  However the key moment came when we were able to identify a specific call which had taken 40 seconds to execute on the server (yes a long time I know but that's a different story!).  Anyway we were able to identify that this had timed out on the BizTalk side.  Based on the normal 2 minute timeout we knew something unexpected was going on. From here I decided to do some experimentation and I wanted to start outside of BizTalk because my hunch was this was not a BizTalk behaviour but something which was being highlighted by BizTalk because of our large load.     Server-side - Sample Web Service To begin with I created a sample web service.  Nothing special just a vanilla asmx web service hosted in IIS6 on Windows 2003 Standard Edition.  The web service is just a hello world style web service as shown in the below picture.  The only key feature is that the server side web method has a 30 second sleep in it and will trace out some information before and after the thread is set to sleep.      In the configuration for this web service there again is nothing special it's pretty much the most plain simple web service you could build. Client-Side To begin looking at what was happening with our example I created a number of different ways to consume the web service. SoapHttpClientProtocol Example I created a small application which would use a normal proxy generated to call the web service.  It would iterate around a loop and make calls using the begin/end methods so I can do this asynchronously.  I would do a loop of 20 calls with the ConnectionManager configuration section supporting only 5 concurrent connections to the server.     <connectionManagement> <remove address="*"/> <add address = "*" maxconnection = "12" /> <add address = "http://<ServerName>" maxconnection = "5" />                         </connectionManagement> </system.net>     The below picture shows an example of the service calling code, key points are: I have configured the timeout of 40 seconds for the proxy I am using the asynchronous methods on the proxy to call the web service         The Test I would run the client and execute 21 calls to the web service.   The Results  Below is the client side trace showing what's happening on the client. In the below diagram is the web service side trace showing what's happening on the server Some observations on the results are: All of the calls were successful from the clients perspective You could see the next call starting on the server as soon as the previous one had completed Calls took significantly longer than 40 seconds from the start of our call to the return. In fact call 20 took 2 minutes and 30 seconds from the perspective of my code to execute even though I had set the timeout to 40 seconds     WSE 2 Sample In the second example I used the exact same code to call the web service again with a single exception that I modified the web service proxy to derive from WebServiceClient protocol which is part of WSE 2 (using SP3).  The below picture shows the basic code and the key points are: I have configured the timeout of 40 seconds for the proxy I am using the asynchronous methods on the proxy to call the web service        The Test This test would execute 21 calls from the client to the web service.   The Results  The below trace is from the client side: The below trace is from the server side:   Some observations on the trace results for this scenario are: With call 4 if you look at the server side trace it did not start executing on the server for a number of seconds after the other 4 initial calls which were accepted by the server. I re-ran the test and this happened a couple of times and not on most others so at this point I'm just putting this down to something unexpected happening on the development machine and we will leave this observation out of scope of this article. You can see that the client side trace statement executed almost immediately in all cases All calls after the initial few calls would timeout On the client side the calls that did timeout; timed out in a longer duration than the 40 seconds we set as the timeout You can see that as calls were completing on the server the next calls were starting to come through The calls that timed out on the client did actually connect to the server and their server side execution completed successfully     Elaboration on the findings Based on the above observations I have drawn the below sequence diagram to illustrate conceptually what is happening.  Everything except the final web service object is on the client side of the call. In the diagram below I've put two notes on the Web Service Proxy to show the two different places where the different base classes seem to start their timeout counters. From the earlier samples we can work out that the timeout counter for the WSE web service proxy starts before the one for the SoapHttpClientProtocol proxy and the WSE one includes the time to get a connection from the pool; whereas the Soap proxy timeout just covers the method execution. One interesting observation is if we rerun the above sample and increase the number of calls from 21 to 100,000 then for the WSE sample we will see a similar pattern where everything after the first few calls will timeout on the client as soon as it makes a connection to the server whereas the soap proxy will happily plug away and process all of the calls without a single timeout. I have actually set the sample running overnight and this did happen. At this point you are probably thinking the same thoughts I was at the time about the differences in behaviour and which is right and why are they different? I'm not sure there is a definitive answer to this in the documentation, or at least not that I could find! I think you just have to consider that they are different and they could have different effects depending on your messaging solution. In lots of situations this is just not an issue as your concurrent requests doesn't get to the situation where you end up throttling the web service calls on the client side, however this is definitely more common with an integration broker such as BizTalk where you often have high throughput requirements.  Some of the considerations you should make Based on this behaviour you should be aware of the following: In a .net application if you are making lots of concurrent web service calls from an application in an asynchronous manner your user may thing they are experiencing poor performance but you think your web service is working well. The problem could be that the client will have a default of 2 connections to remote servers so you should bear this in mind When you are developing a BizTalk solution or a .net solution with the WSE 2 stack you may experience timeouts under load and throttling the number of connections using the max connections element in the configuration file will not help you For an application using WSE2 or SoapHttpClientProtocol an expired timeout will not throw an error until after a connection to the server has been made so you should consider this in your transaction and durability patterns     Our Work Around In the short term for our specific scenario we know that we can handle this by just increasing our timeout value.  There is only a specific small window when we get lots of concurrent traffic that causes this scenario so we should be able to increase the timeout to take into consideration the additional client side wait, and on the odd occasion where we do get a timeout the BizTalk send port retry will handle this. What was causing our original problem was that for that short window we were getting a lot of retries which significantly increased the load on our send servers and highlighted the issue.  Longer Term Solution As a longer term solution this really gives us more ammunition to argue a migration to WCF. The application we are calling has some factors which limit the protocols we can use but with WCF we would have more control on the various timeout options because in WCF you can configure specific parts of the timeout. Summary I've had this blog post on my to do list for ages but hopefully it will be useful to some people to just understand this behaviour and to possibly help you with some performance issues you may have. I do not believe there is too much in the way of documentation particularly around WSE2 and ASMX in this area so again another bit of ammunition for migrating to WCF. I'll try to do a follow up post with the sample for WCF to show how this changes things.

    Read the article

  • How to directly rotate CVImageBuffer image in IOS 4 without converting to UIImage?

    - by Ian Charnas
    I am using OpenCV 2.2 on the iPhone to detect faces. I'm using the IOS 4's AVCaptureSession to get access to the camera stream, as seen in the code that follows. My challenge is that the video frames come in as CVBufferRef (pointers to CVImageBuffer) objects, and they come in oriented as a landscape, 480px wide by 300px high. This is fine if you are holding the phone sideways, but when the phone is held in the upright position I want to rotate these frames 90 degrees clockwise so that OpenCV can find the faces correctly. I could convert the CVBufferRef to a CGImage, then to a UIImage, and then rotate, as this person is doing: Rotate CGImage taken from video frame However that wastes a lot of CPU. I'm looking for a faster way to rotate the images coming in, ideally using the GPU to do this processing if possible. Any ideas? Ian Code Sample: -(void) startCameraCapture { // Start up the face detector faceDetector = [[FaceDetector alloc] initWithCascade:@"haarcascade_frontalface_alt2" withFileExtension:@"xml"]; // Create the AVCapture Session session = [[AVCaptureSession alloc] init]; // create a preview layer to show the output from the camera AVCaptureVideoPreviewLayer *previewLayer = [AVCaptureVideoPreviewLayer layerWithSession:session]; previewLayer.frame = previewView.frame; previewLayer.videoGravity = AVLayerVideoGravityResizeAspectFill; [previewView.layer addSublayer:previewLayer]; // Get the default camera device AVCaptureDevice* camera = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo]; // Create a AVCaptureInput with the camera device NSError *error=nil; AVCaptureInput* cameraInput = [[AVCaptureDeviceInput alloc] initWithDevice:camera error:&error]; if (cameraInput == nil) { NSLog(@"Error to create camera capture:%@",error); } // Set the output AVCaptureVideoDataOutput* videoOutput = [[AVCaptureVideoDataOutput alloc] init]; videoOutput.alwaysDiscardsLateVideoFrames = YES; // create a queue besides the main thread queue to run the capture on dispatch_queue_t captureQueue = dispatch_queue_create("catpureQueue", NULL); // setup our delegate [videoOutput setSampleBufferDelegate:self queue:captureQueue]; // release the queue. I still don't entirely understand why we're releasing it here, // but the code examples I've found indicate this is the right thing. Hmm... dispatch_release(captureQueue); // configure the pixel format videoOutput.videoSettings = [NSDictionary dictionaryWithObjectsAndKeys: [NSNumber numberWithUnsignedInt:kCVPixelFormatType_32BGRA], (id)kCVPixelBufferPixelFormatTypeKey, nil]; // and the size of the frames we want // try AVCaptureSessionPresetLow if this is too slow... [session setSessionPreset:AVCaptureSessionPresetMedium]; // If you wish to cap the frame rate to a known value, such as 10 fps, set // minFrameDuration. videoOutput.minFrameDuration = CMTimeMake(1, 10); // Add the input and output [session addInput:cameraInput]; [session addOutput:videoOutput]; // Start the session [session startRunning]; } - (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection { // only run if we're not already processing an image if (!faceDetector.imageNeedsProcessing) { // Get CVImage from sample buffer CVImageBufferRef cvImage = CMSampleBufferGetImageBuffer(sampleBuffer); // Send the CVImage to the FaceDetector for later processing [faceDetector setImageFromCVPixelBufferRef:cvImage]; // Trigger the image processing on the main thread [self performSelectorOnMainThread:@selector(processImage) withObject:nil waitUntilDone:NO]; } }

    Read the article

  • Solving Big Problems with Oracle R Enterprise, Part II

    - by dbayard
    Part II – Solving Big Problems with Oracle R Enterprise In the first post in this series (see https://blogs.oracle.com/R/entry/solving_big_problems_with_oracle), we showed how you can use R to perform historical rate of return calculations against investment data sourced from a spreadsheet.  We demonstrated the calculations against sample data for a small set of accounts.  While this worked fine, in the real-world the problem is much bigger because the amount of data is much bigger.  So much bigger that our approach in the previous post won’t scale to meet the real-world needs. From our previous post, here are the challenges we need to conquer: The actual data that needs to be used lives in a database, not in a spreadsheet The actual data is much, much bigger- too big to fit into the normal R memory space and too big to want to move across the network The overall process needs to run fast- much faster than a single processor The actual data needs to be kept secured- another reason to not want to move it from the database and across the network And the process of calculating the IRR needs to be integrated together with other database ETL activities, so that IRR’s can be calculated as part of the data warehouse refresh processes In this post, we will show how we moved from sample data environment to working with full-scale data.  This post is based on actual work we did for a financial services customer during a recent proof-of-concept. Getting started with the Database At this point, we have some sample data and our IRR function.  We were at a similar point in our customer proof-of-concept exercise- we had sample data but we did not have the full customer data yet.  So our database was empty.  But, this was easily rectified by leveraging the transparency features of Oracle R Enterprise (see https://blogs.oracle.com/R/entry/analyzing_big_data_using_the).  The following code shows how we took our sample data SimpleMWRRData and easily turned it into a new Oracle database table called IRR_DATA via ore.create().  The code also shows how we can access the database table IRR_DATA as if it was a normal R data.frame named IRR_DATA. If we go to sql*plus, we can also check out our new IRR_DATA table: At this point, we now have our sample data loaded in the database as a normal Oracle table called IRR_DATA.  So, we now proceeded to test our R function working with database data. As our first test, we retrieved the data from a single account from the IRR_DATA table, pull it into local R memory, then call our IRR function.  This worked.  No SQL coding required! Going from Crawling to Walking Now that we have shown using our R code with database-resident data for a single account, we wanted to experiment with doing this for multiple accounts.  In other words, we wanted to implement the split-apply-combine technique we discussed in our first post in this series.  Fortunately, Oracle R Enterprise provides a very scalable way to do this with a function called ore.groupApply().  You can read more about ore.groupApply() here: https://blogs.oracle.com/R/entry/analyzing_big_data_using_the1 Here is an example of how we ask ORE to take our IRR_DATA table in the database, split it by the ACCOUNT column, apply a function that calls our SimpleMWRR() calculation, and then combine the results. (If you are following along at home, be sure to have installed our myIRR package on your database server via  “R CMD INSTALL myIRR”). The interesting thing about ore.groupApply is that the calculation is not actually performed in my desktop R environment from which I am running.  What actually happens is that ore.groupApply uses the Oracle database to perform the work.  And the Oracle database is what actually splits the IRR_DATA table by ACCOUNT.  Then the Oracle database takes the data for each account and sends it to an embedded R engine running on the database server to apply our R function.  Then the Oracle database combines all the individual results from the calls to the R function. This is significant because now the embedded R engine only needs to deal with the data for a single account at a time.  Regardless of whether we have 20 accounts or 1 million accounts or more, the R engine that performs the calculation does not care.  Given that normal R has a finite amount of memory to hold data, the ore.groupApply approach overcomes the R memory scalability problem since we only need to fit the data from a single account in R memory (not all of the data for all of the accounts). Additionally, the IRR_DATA does not need to be sent from the database to my desktop R program.  Even though I am invoking ore.groupApply from my desktop R program, because the actual SimpleMWRR calculation is run by the embedded R engine on the database server, the IRR_DATA does not need to leave the database server- this is both a performance benefit because network transmission of large amounts of data take time and a security benefit because it is harder to protect private data once you start shipping around your intranet. Another benefit, which we will discuss in a few paragraphs, is the ability to leverage Oracle database parallelism to run these calculations for dozens of accounts at once. From Walking to Running ore.groupApply is rather nice, but it still has the drawback that I run this from a desktop R instance.  This is not ideal for integrating into typical operational processes like nightly data warehouse refreshes or monthly statement generation.  But, this is not an issue for ORE.  Oracle R Enterprise lets us run this from the database using regular SQL, which is easily integrated into standard operations.  That is extremely exciting and the way we actually did these calculations in the customer proof. As part of Oracle R Enterprise, it provides a SQL equivalent to ore.groupApply which it refers to as “rqGroupEval”.  To use rqGroupEval via SQL, there is a bit of simple setup needed.  Basically, the Oracle Database needs to know the structure of the input table and the grouping column, which we are able to define using the database’s pipeline table function mechanisms. Here is the setup script: At this point, our initial setup of rqGroupEval is done for the IRR_DATA table.  The next step is to define our R function to the database.  We do that via a call to ORE’s rqScriptCreate. Now we can test it.  The SQL you use to run rqGroupEval uses the Oracle database pipeline table function syntax.  The first argument to irr_dataGroupEval is a cursor defining our input.  You can add additional where clauses and subqueries to this cursor as appropriate.  The second argument is any additional inputs to the R function.  The third argument is the text of a dummy select statement.  The dummy select statement is used by the database to identify the columns and datatypes to expect the R function to return.  The fourth argument is the column of the input table to split/group by.  The final argument is the name of the R function as you defined it when you called rqScriptCreate(). The Real-World Results In our real customer proof-of-concept, we had more sophisticated calculation requirements than shown in this simplified blog example.  For instance, we had to perform the rate of return calculations for 5 separate time periods, so the R code was enhanced to do so.  In addition, some accounts needed a time-weighted rate of return to be calculated, so we extended our approach and added an R function to do that.  And finally, there were also a few more real-world data irregularities that we needed to account for, so we added logic to our R functions to deal with those exceptions.  For the full-scale customer test, we loaded the customer data onto a Half-Rack Exadata X2-2 Database Machine.  As our half-rack had 48 physical cores (and 96 threads if you consider hyperthreading), we wanted to take advantage of that CPU horsepower to speed up our calculations.  To do so with ORE, it is as simple as leveraging the Oracle Database Parallel Query features.  Let’s look at the SQL used in the customer proof: Notice that we use a parallel hint on the cursor that is the input to our rqGroupEval function.  That is all we need to do to enable Oracle to use parallel R engines. Here are a few screenshots of what this SQL looked like in the Real-Time SQL Monitor when we ran this during the proof of concept (hint: you might need to right-click on these images to be able to view the images full-screen to see the entire image): From the above, you can notice a few things (numbers 1 thru 5 below correspond with highlighted numbers on the images above.  You may need to right click on the above images and view the images full-screen to see the entire image): The SQL completed in 110 seconds (1.8minutes) We calculated rate of returns for 5 time periods for each of 911k accounts (the number of actual rows returned by the IRRSTAGEGROUPEVAL operation) We accessed 103m rows of detailed cash flow/market value data (the number of actual rows returned by the IRR_STAGE2 operation) We ran with 72 degrees of parallelism spread across 4 database servers Most of our 110seconds was spent in the “External Procedure call” event On average, we performed 8,200 executions of our R function per second (110s/911k accounts) On average, each execution was passed 110 rows of data (103m detail rows/911k accounts) On average, we did 41,000 single time period rate of return calculations per second (each of the 8,200 executions of our R function did rate of return calculations for 5 time periods) On average, we processed over 900,000 rows of database data in R per second (103m detail rows/110s) R + Oracle R Enterprise: Best of R + Best of Oracle Database This blog post series started by describing a real customer problem: how to perform a lot of calculations on a lot of data in a short period of time.  While standard R proved to be a very good fit for writing the necessary calculations, the challenge of working with a lot of data in a short period of time remained. This blog post series showed how Oracle R Enterprise enables R to be used in conjunction with the Oracle Database to overcome the data volume and performance issues (as well as simplifying the operations and security issues).  It also showed that we could calculate 5 time periods of rate of returns for almost a million individual accounts in less than 2 minutes. In a future post, we will take the same R function and show how Oracle R Connector for Hadoop can be used in the Hadoop world.  In that next post, instead of having our data in an Oracle database, our data will live in Hadoop and we will how to use the Oracle R Connector for Hadoop and other Oracle Big Data Connectors to move data between Hadoop, R, and the Oracle Database easily.

    Read the article

  • Big GRC: Turning Data into Actionable GRC Intelligence

    - by Jenna Danko
    While it’s no longer headline news that Governments have carried out large scale data-mining programmes aimed at terrorism detection and identifying other patterns of interest across a wide range of digital data sources, the debate over the ethics and justification over this action, will clearly continue for some time to come. What is becoming clear is that these programmes are a framework for the collation and aggregation of massive amounts of unstructured data and from this, the creation of actionable intelligence from analyses that allowed the analysts to explore and extract a variety of patterns and then direct resources. This data included audio and video chats, phone calls, photographs, e-mails, documents, internet searches, social media posts and mobile phone logs and connections. Although Governance, Risk and Compliance (GRC) professionals are not looking at the implementation of such programmes, there are many similar GRC “Big data” challenges to be faced and potential lessons to be learned from these high profile government programmes that can be applied a lot closer to home. For example, how can GRC professionals collect, manage and analyze an enormous and disparate volume of data to create and manage their own actionable intelligence covering hidden signs and patterns of criminal activity, the early or retrospective, violation of regulations/laws/corporate policies and procedures, emerging risks and weakening controls etc. Not exactly the stuff of James Bond to be sure, but it is certainly more applicable to most GRC professional’s day to day challenges. So what is Big Data and how can it benefit the GRC process? Although it often varies, the definition of Big Data largely refers to the following types of data: Traditional Enterprise Data – includes customer information from CRM systems, transactional ERP data, web store transactions, and general ledger data. Machine-Generated /Sensor Data – includes Call Detail Records (“CDR”), weblogs and trading systems data. Social Data – includes customer feedback streams, micro-blogging sites like Twitter, and social media platforms like Facebook. The McKinsey Global Institute estimates that data volume is growing 40% per year, and will grow 44x between 2009 and 2020. But while it’s often the most visible parameter, volume of data is not the only characteristic that matters. In fact, according to sources such as Forrester there are four key characteristics that define big data: Volume. Machine-generated data is produced in much larger quantities than non-traditional data. This is all the data generated by IT systems that power the enterprise. This includes live data from packaged and custom applications – for example, app servers, Web servers, databases, networks, virtual machines, telecom equipment, and much more. Velocity. Social media data streams – while not as massive as machine-generated data – produce a large influx of opinions and relationships valuable to customer relationship management as well as offering early insight into potential reputational risk issues. Even at 140 characters per tweet, the high velocity (or frequency) of Twitter data ensures large volumes (over 8 TB per day) need to be managed. Variety. Traditional data formats tend to be relatively well defined by a data schema and change slowly. In contrast, non-traditional data formats exhibit a dizzying rate of change. Without question, all GRC professionals work in a dynamic environment and as new services, new products, new business lines are added or new marketing campaigns executed for example, new data types are needed to capture the resultant information.  Value. The economic value of data varies significantly. Typically, there is good information hidden amongst a larger body of non-traditional data that GRC professionals can use to add real value to the organisation; the greater challenge is identifying what is valuable and then transforming and extracting that data for analysis and action. For example, customer service calls and emails have millions of useful data points and have long been a source of information to GRC professionals. Those calls and emails are critical in helping GRC professionals better identify hidden patterns and implement new policies that can reduce the amount of customer complaints.   Now on a scale and depth far beyond those in place today, all that unstructured call and email data can be captured, stored and analyzed to reveal the reasons for the contact, perhaps with the aggregated customer results cross referenced against what is being said about the organization or a similar peer organization on social media. The organization can then take positive actions, communicating to the market in advance of issues reaching the press, strengthening controls, adjusting risk profiles, changing policy and procedures and completely minimizing, if not eliminating, complaints and compensation for that specific reason in the future. In this one example of many similar ones, the GRC team(s) has demonstrated real and tangible business value. Big Challenges - Big Opportunities As pointed out by recent Forrester research, high performing companies (those that are growing 15% or more year-on-year compared to their peers) are taking a selective approach to investing in Big Data.  "Tomorrow's winners understand this, and they are making selective investments aimed at specific opportunities with tangible benefits where big data offers a more economical solution to meet a need." (Forrsights Strategy Spotlight: Business Intelligence and Big Data, Q4 2012) As pointed out earlier, with the ever increasing volume of regulatory demands and fines for getting it wrong, limited resource availability and out of date or inadequate GRC systems all contributing to a higher cost of compliance and/or higher risk profile than desired – a big data investment in GRC clearly falls into this category. However, to make the most of big data organizations must evolve both their business and IT procedures, processes, people and infrastructures to handle these new high-volume, high-velocity, high-variety sources of data and be able integrate them with the pre-existing company data to be analyzed. GRC big data clearly allows the organization access to and management over a huge amount of often very sensitive information that although can help create a more risk intelligent organization, also presents numerous data governance challenges, including regulatory compliance and information security. In addition to client and regulatory demands over better information security and data protection the sheer amount of information organizations deal with the need to quickly access, classify, protect and manage that information can quickly become a key issue  from a legal, as well as technical or operational standpoint. However, by making information governance processes a bigger part of everyday operations, organizations can make sure data remains readily available and protected. The Right GRC & Big Data Partnership Becomes Key  The "getting it right first time" mantra used in so many companies remains essential for any GRC team that is sponsoring, helping kick start, or even overseeing a big data project. To make a big data GRC initiative work and get the desired value, partnerships with companies, who have a long history of success in delivering successful GRC solutions as well as being at the very forefront of technology innovation, becomes key. Clearly solutions can be built in-house more cheaply than through vendor, but as has been proven time and time again, when it comes to self built solutions covering AML and Fraud for example, few have able to scale or adapt appropriately to meet the changing regulations or challenges that the GRC teams face on a daily basis. This has led to the creation of GRC silo’s that are causing so many headaches today. The solutions that stand out and should be explored are the ones that can seamlessly merge the traditional world of well-known data, analytics and visualization with the new world of seemingly innumerable data sources, utilizing Big Data technologies to generate new GRC insights right across the enterprise.Ultimately, Big Data is here to stay, and organizations that embrace its potential and outline a viable strategy, as well as understand and build a solid analytical foundation, will be the ones that are well positioned to make the most of it. A Blueprint and Roadmap Service for Big Data Big data adoption is first and foremost a business decision. As such it is essential that your partner can align your strategies, goals, and objectives with an architecture vision and roadmap to accelerate adoption of big data for your environment, as well as establish practical, effective governance that will maintain a well managed environment going forward. Key Activities: While your initiatives will clearly vary, there are some generic starting points the team and organization will need to complete: Clearly define your drivers, strategies, goals, objectives and requirements as it relates to big data Conduct a big data readiness and Information Architecture maturity assessment Develop future state big data architecture, including views across all relevant architecture domains; business, applications, information, and technology Provide initial guidance on big data candidate selection for migrations or implementation Develop a strategic roadmap and implementation plan that reflects a prioritization of initiatives based on business impact and technology dependency, and an incremental integration approach for evolving your current state to the target future state in a manner that represents the least amount of risk and impact of change on the business Provide recommendations for practical, effective Data Governance, Data Quality Management, and Information Lifecycle Management to maintain a well-managed environment Conduct an executive workshop with recommendations and next steps There is little debate that managing risk and data are the two biggest obstacles encountered by financial institutions.  Big data is here to stay and risk management certainly is not going anywhere, and ultimately financial services industry organizations that embrace its potential and outline a viable strategy, as well as understand and build a solid analytical foundation, will be best positioned to make the most of it. Matthew Long is a Financial Crime Specialist for Oracle Financial Services. He can be reached at matthew.long AT oracle.com.

    Read the article

  • OK - What now? How do we become a Social Business?

    - by Michael Snow
    We hope that those of you that attended yesterday's Webcast with Brian Solis enjoyed Brian's discussion with Christian Finn for our last Webcast of the season for the Oracle Social Business Thought Leaders Series.  For those of you that may have missed the webcast or were stuck at a company holiday party - you'll be glad to hear that the webcast will be available On-Demand starting later today (12/14/12). And any of you who'd like to listen to a quick but informative podcast with Brian - can listen to that here. Some of you may still be left with questions about how to get from point A to point B and even more confused than when you started thinking about this new world of Digital Darwinism. The post below, grabbed from an abundance of great thought leadership prose on Brian's blog may help you frame the path you need to start walking sooner versus later to stay off of the endangered species list.  As you explore your path forward, please keep Oracle in mind - we do offer a wide range of solutions to help your organization 12.00 Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} optimize the engagement for your customers, employees and partners. The Path from a Social Brand to a Social Business Brian Solis Originally posted May 2, 2012 I’ve been a long-time supporter of MediaTemple’s (MT)Residence program along with Gary Vaynerchuk, Neil Patel, and many others whom I respect. I wanted to share my “7 questions to answer to become a social business” with you here.. Social Media is pervasive and is becoming the new normal in corporate marketing. Brands who get this right are starting to build their own media networks rich with customer connections numbering in the millions. Right now, Coca-Cola has over 34 million fans on Facebook, but they’re hardly alone. Disney follows just behind with 29 million fans, Starbucks boasts 25 million, and Oreo, Red Bull, and Converse play host to over 20 million fans. If we were to look at other networks such as Twitter and Youtube, we would see a recurring theme. People are connecting en masse with the businesses they support and new media represents the ability to cultivate consumer relationships in ways not possible with traditional earned or paid media. Sounds great right? This might sound abrupt, but the truth is that we’re hardly realizing the potential of what lies before us. Everything begins with understanding not just how other brands are marketing themselves in social media, but also seeing what they’re not doing and envisioning what’s possible. We’re already approaching the first of many crossroads that new media will present. Do we take the path of a social brand or that of a social business? What’s the difference? A social brand is just that, a business that is remodeling or retrofitting its existing marketing practices to new media. A social business is something altogether different as it embraces introspection and extrospection to reevaluate internal and external processes, systems, and opportunities to transform into a living, breathing entity that adapts to market conditions and opportunities. It’s a tough decision to make right now especially at a time when all we read about is how much success many businesses are finding without having to answer this very question. With all of the newfound success in social networks, the truth is that we’re only just beginning to learn what’s possible and that’s where you come in. When compared to the investment in time and resources across the board, social media represents only a small part of the mix. But with your help, that’s all about to change. The CMO Survey, an organization that disseminates the opinions of top marketers in order to predict the future of markets, recently published a report that gave credence to the fact that social media is taking off. One of the most profound takeaways from the report was this gem; “The “like button” [in Facebook] packs more customer-acquisition punch than other demand-generating activities.” With insights like this, it’s easy to see why the race to social is becoming heated. The report also highlighted exactly where social fits in the marketing mix today and as you can see, despite all of the hype, it’s not a dominant focus yet. As of August 2011, the percentage of overall marketing budgets dedicated to social media hovered at around 7%. However, in 2012 the investment in social media will climb to 10%. And, in five years, social media is expected to represent almost 18% of the total marketing budget. Think about that for a moment. In 2016, social media will only represent 18%? Queue the sound of a record scratching here. With businesses finding success in social networks, why are businesses failing to realize the true opportunity brought forth by the ability to listen to, connect with, and engage with customers? While there’s value in earning views, driving traffic, and building connections through the 3F’s (friends, fans and followers), success isn’t just defined simply by what really amounts to low-hanging fruit. The truth is that businesses cannot measure what it is they don’t know to value. As a result, innovation in new engagement initiatives is stifled because we’re applying dated or inflexible frameworks to new paradigms. Social media isn’t owned by marketing, but instead the entire organization. This changes everything and makes your role so much more important. It’s up to you to learn how to think outside of the proverbial social media box to see what others don’t, the ability to improve customers experiences through the evolution of a social brand into a social business. Doing so will translate customer insights from what they do and don’t share in social networks into better products, services, and processes. See, customers want something more from their favorite businesses than creative campaigns, viral content, and everyday dialogue in social networks. Customers want to be heard and they want to know that you’re listening. How businesses use social media must remind them that they’re more than just an audience, consumer, or a conduit to “trigger” a desired social effect. Herein lies both the challenge and opportunity of social media. It’s bigger than marketing. It’s also bigger than customer service. It’s about building relationships with customers that improve experiences and more importantly, teaches businesses how to re-imagine products and internal processes to better adapt to potential crises and seize new opportunities. When it comes down to it, Twitter, Facebook, Youtube, Foursquare, are all channels for listening, learning, and engaging. It’s what you do within each channel that builds a community around your brand. And, at the end of the day, the value of the community you build counts for everything. It’s important to understand that we cannot assume that these networks simply exist for people to lineup for our marketing messages or promotional campaigns. Nor can we assume that they’re reeling in anticipation for simple dialogue. They want value. They want recognition. They want access to exclusive information and offers. They need direction, answers and resolution. What we’re talking about here is the multidimensional makeup of consumers and how a one-sided approach to social media forces the needs for social media to expand beyond traditional marketing to socialize the various departments, lines of business, and functions to engage based on the nature of the situation or opportunity. In the same CMO study, it was revealed that marketers believe that social media has a long way to go toward integrating into the overall company strategy. On a scale of 1-7, with one being “not integrated at all” and seven being “very integrated,” 22% chose “one.” Critical functions such as service, HR, sales, R&D, product marketing and development, IR, CSR, etc. are either not engaged or are operating social media within a silo disconnected from other efforts or possibilities. The problem is that customers don’t view a company by silo, instead they see one company, one brand, and their experience in social media forms an impression that eventually contributes to their view of your brand. The first step here is to understand business priorities and objectives to assess how social media can be additive in achieving these goals. Additionally, surveying the landscape to determine other areas of interest as its specifically related to your business. • Are customers seeking help or direction? • Who are your most valuable customers and what are they sharing? • How can you use social media to acquire and retain customers? - What ideas are circulating and how can you harness user generated activity and content to innovate or adapt to better meet the needs of customers? - How can you broaden a single customer view to recognize the varying needs of customers and how your organization can organize around each circumstance? - What insights exist based on how consumers are interacting with one another? How can this intelligence inform marketing, service, products and other important business initiatives? - How can your business extend their current efforts to deliver better customer experiences and in turn more effectively unit internal collaboration and communication? Customer demands far exceed the capabilities of the marketing department. While creating a social brand is a necessary endeavor, building a social business is an investment in customer relevance now and over time. Beyond relevance, a social business fosters a culture of change that unites employees and customers and sets a foundation for meaningful and beneficial relationships. Innovation, communication, and creativity are the natural byproducts of engagement and transformation. As a social brand, we are competing for the moment. As a social business, we are competing the future in all that we do today.

    Read the article

  • Code golf: Word frequency chart

    - by ChristopheD
    The challenge: Build an ASCII chart of the most commonly used words in a given text. The rules: Only accept a-z and A-Z (alphabetic characters) as part of a word. Ignore casing (She == she for our purpose). Ignore the following words (quite arbitary, I know): the, and, of, to, a, i, it, in, or, is Clarification: considering don't: this would be taken as 2 different 'words' in the ranges a-z and A-Z: (don and t). Optionally (it's too late to be formally changing the specifications now) you may choose to drop all single-letter 'words' (this could potentially make for a shortening of the ignore list too). Parse a given text (read a file specified via command line arguments or piped in; presume us-ascii) and build us a word frequency chart with the following characteristics: Display the chart (also see the example below) for the 22 most common words (ordered by descending frequency). The bar width represents the number of occurences (frequency) of the word (proportionally). Append one space and print the word. Make sure these bars (plus space-word-space) always fit: bar + [space] + word + [space] should be always <= 80 characters (make sure you account for possible differing bar and word lenghts: e.g.: the second most common word could be a lot longer then the first while not differing so much in frequency). Maximize bar width within these constraints and scale the bars appropriately (according to the frequencies they represent). An example: The text for the example can be found here (Alice's Adventures in Wonderland, by Lewis Carroll). This specific text would yield the following chart: _________________________________________________________________________ |_________________________________________________________________________| she |_______________________________________________________________| you |____________________________________________________________| said |____________________________________________________| alice |______________________________________________| was |__________________________________________| that |___________________________________| as |_______________________________| her |____________________________| with |____________________________| at |___________________________| s |___________________________| t |_________________________| on |_________________________| all |______________________| this |______________________| for |______________________| had |_____________________| but |____________________| be |____________________| not |___________________| they |__________________| so For your information: these are the frequencies the above chart is built upon: [('she', 553), ('you', 481), ('said', 462), ('alice', 403), ('was', 358), ('that ', 330), ('as', 274), ('her', 248), ('with', 227), ('at', 227), ('s', 219), ('t' , 218), ('on', 204), ('all', 200), ('this', 181), ('for', 179), ('had', 178), (' but', 175), ('be', 167), ('not', 166), ('they', 155), ('so', 152)] A second example (to check if you implemented the complete spec): Replace every occurence of you in the linked Alice in Wonderland file with superlongstringstring: ________________________________________________________________ |________________________________________________________________| she |_______________________________________________________| superlongstringstring |_____________________________________________________| said |______________________________________________| alice |________________________________________| was |_____________________________________| that |______________________________| as |___________________________| her |_________________________| with |_________________________| at |________________________| s |________________________| t |______________________| on |_____________________| all |___________________| this |___________________| for |___________________| had |__________________| but |_________________| be |_________________| not |________________| they |________________| so The winner: Shortest solution (by character count, per language). Have fun! Edit: Table summarizing the results so far (2012-02-15) (originally added by user Nas Banov): Language Relaxed Strict ========= ======= ====== GolfScript 130 143 Perl 185 Windows PowerShell 148 199 Mathematica 199 Ruby 185 205 Unix Toolchain 194 228 Python 183 243 Clojure 282 Scala 311 Haskell 333 Awk 336 R 298 Javascript 304 354 Groovy 321 Matlab 404 C# 422 Smalltalk 386 PHP 450 F# 452 TSQL 483 507 The numbers represent the length of the shortest solution in a specific language. "Strict" refers to a solution that implements the spec completely (draws |____| bars, closes the first bar on top with a ____ line, accounts for the possibility of long words with high frequency etc). "Relaxed" means some liberties were taken to shorten to solution. Only solutions shorter then 500 characters are included. The list of languages is sorted by the length of the 'strict' solution. 'Unix Toolchain' is used to signify various solutions that use traditional *nix shell plus a mix of tools (like grep, tr, sort, uniq, head, perl, awk).

    Read the article

  • Skanska Builds Global Workforce Insight with Cloud-Based HCM System

    - by HCM-Oracle
    By David Baum - Originally posted on Profit Peter Bjork grew up building things. He started his work life learning all sorts of trades at his father’s construction company in the northern part of Sweden. So in college, it was natural for him to pursue a bachelor’s degree in construction engineering—but he broke new ground when he added a master’s degree in finance to his curriculum vitae. Written on a traditional résumé, Bjork’s current title (vice president of information systems strategies) doesn’t reveal the diversity of his experience—that he’s adept with hammer and nails as well as rows and columns. But a big part of his current job is to work with his counterparts in human resources (HR) designing, building, and deploying the systems needed to get a complete view of the skills and potential of Skanska’s 22,000-strong white-collar workforce. And Bjork believes that complete view is essential to Skanska’s success. “Our business is really all about people,” says Bjork, who has worked with Skanska for 16 years. “You can have equipment and financial resources, but to truly succeed in a business like ours you need to have the right people in the right places. That’s what this system is helping us accomplish.” In a global HR environment that suffers from a paradox of high unemployment and a scarcity of skilled labor, managers need to have a complete understanding of workforce capabilities to develop management skills, recruit for open positions, ensure that staff is getting the training they need, and reduce attrition. Skanska’s human capital management (HCM) systems, based on Oracle Talent Management Cloud, play a critical role delivering that understanding. “Skanska’s philosophy of having great people, encouraging their development, and giving them the chance to move across business units has nurtured a culture of collaboration, but managing a diverse workforce spread across the globe is a monumental challenge,” says Annika Lindholm, global human resources system owner in the HR department at Skanska’s headquarters just outside of Stockholm, Sweden. “We depend heavily on Oracle’s cloud technology to support our HCM function.” Construction, Workers For Skanska’s more than 60,000 employees and contractors, managing huge construction projects is an everyday job. Beyond erecting signature buildings, management’s goal is to build a corporate culture where valuable talent can be sought out and developed, bringing in the right mix of people to support and grow the business. “Of all the companies in our space, Skanska is probably one of the strongest ones, with a laser focus on people and people development,” notes Tom Crane, chief HR and communications officer for Skanska in the United States. “Our business looks like equipment and material, but all we really have at the end of the day are people and their intellectual capital. Without them, second only to clients, of course, you really can’t achieve great things in the high-profile environment in which we work.” During the 1990s, Skanska entered an expansive growth phase. A string of successful acquisitions paved the way for the company’s transformation into a global enterprise. “Today the company’s focus is on profitable growth,” continues Crane. “But you can’t really achieve growth unless you are doing a very good job of developing your people and having the right people in the right places and driving a culture of growth.” In the United States alone, Skanska has more than 8,000 employees in four distinct business units: Skanska USA Building, also known as the Construction Manager, builds everything at ground level and above—hospitals, educational facilities, stadiums, airport terminals, and other massive projects. Skanska USA Civil does everything at ground level and below, such as light rail, water treatment facilities, power plants or power industry facilities, highways, and bridges. Skanska Infrastructure Development develops public-private partnerships—projects in which Skanska adds equity and also arranges for outside financing. Skanska Commercial Development acts like a commercial real estate developer, acquiring land and building offices on spec or build-to-suit for its clients. Skanska's international portfolio includes construction of the new Meadowlands Stadium. Getting the various units to operate collaboratatively helps Skanska deliver high value to clients and shareholders. “When we have this collaboration among units, it allows us to enrich each of the business units and, at the same time, develop our future leaders to be more facile in operating across business units—more accepting of a ‘one Skanska’ approach,” explains Crane. Workforce Worldwide But HR needs processes and tools to support managers who face such business dynamics. Oracle Talent Management Cloud is helping Skanska implement world-class recruiting strategies and generate the insights needed to drive quality hiring practices, internal mobility, and a proactive approach to building talent pipelines. With their new cloud system in place, Skanska HR leaders can manage everything from recruiting, compensation, and goal and performance management to employee learning and talent review—all as part of a single, cohesive software-as-a-service (SaaS) environment. Skanska has successfully implemented two modules from Oracle Talent Management Cloud—the recruiting and performance management modules—and is in the process of implementing the learn module. Internally, they call the systems Skanska Recruit, Skanska Talent, and Skanska Learn. The timing is apropos. With high rates of unemployment in recent years, there have been many job candidates on the market. However, talent scarcity continues to frustrate recruiters. Oracle Taleo Recruiting Cloud Service, one of the applications in the Oracle Talent Management cloud portfolio, enables Skanska managers to create more-intelligent recruiting strategies, pulling high-performer profile statistics to create new candidate profiles and using multitiered screening and assessments to ensure that only the best-suited candidate applications make it to the recruiter’s desk. Tools such as applicant tracking, interview management, and requisition management help recruiters and hiring managers streamline the hiring process. Oracle’s cloud-based software system automates and streamlines many other HR processes for Skanska’s multinational organization and delivers insight into the success of recruiting and talent-management efforts. “The Oracle system is definitely helping us to construct global HR processes,” adds Bjork. “It is really important that we have a business model that is decentralized, so we can effectively serve our local markets, and interact with our global ERP [enterprise resource planning] systems as well. We would not be able to do this without a really good, well-integrated HCM system that could support these efforts.” A key piece of this effort is something Skanska has developed internally called the Skanska Leadership Profile. Core competencies, on which all employees are measured, are used in performance reviews to determine weak areas but also to discover talent, such as those who will be promoted or need succession plans. This global profiling system brings consistency to the way HR professionals evaluate and review talent across the company, with a consistent set of ratings and a consistent definition of competencies. All salaried employees in Skanska are tied to a talent management process that gives opportunity for midyear and year-end reviews. Using the performance management module, managers can align individual goals with corporate goals; provide clear visibility into how each employee contributes to the success of the organization; and drive a strategic, end-to-end talent management strategy with a single, integrated system for all talent-related activities. This is critical to a company that is highly focused on ensuring that every employee has a development plan linked to his or her succession potential. “Our approach all along has been to deploy software applications that are seamless to end users,” says Crane. “The beauty of a cloud-based system is that much of the functionality takes place behind the scenes so we can focus on making sure users can access the data when they need it. This model greatly improves their efficiency.” The employee profile not only sets a competency baseline for new employees but is also integrated with Skanska’s other back-office Oracle systems to ensure consistency in the way information is used to support other business functions. “Since we have about a dozen different HR systems that are providing us with information, we built a master database that collects all the information,” explains Lindholm. “That data is sent not only to Oracle Talent Management Cloud, but also to other systems that are dependent on this information.” Collaboration to Scale Skanska is poised to launch a new Oracle module to link employee learning plans to the review process and recruitment assessments. According to Crane, connecting these processes allows Skanska managers to see employees’ progress and produce an updated learning program. For example, as employees take classes, supervisors can consult the Oracle Talent Management Cloud portal to monitor progress and align it to each individual’s training and development plan. “That’s a pretty compelling solution for an organization that wants to manage its talent on a real-time basis and see how the training is working,” Crane says. Rolling out Oracle Talent Management Cloud was a joint effort among HR, IT, and a global group that oversaw the worldwide implementation. Skanska deployed the solution quickly across all markets at once. In the United States, for example, more than 35 offices quickly got up to speed on the new system via webinars for employees and face-to-face training for the HR group. “With any migration, there are moments when you hold your breath, but in this case, we had very few problems getting the system up and running,” says Crane. Lindholm adds, “There has been very little resistance to the system as users recognize its potential. Customizations are easy, and a lasting partnership has developed between Skanska and Oracle when help is needed. They listen to us.” Bjork elaborates on the implementation process from an IT perspective. “Deploying a SaaS system removes a lot of the complexity,” he says. “You can downsize the IT part and focus on the business part, which increases the probability of a successful implementation. If you want to scale the system, you make a quick phone call. That’s all it took recently when we added 4,000 users. We didn’t have to think about resizing the servers or hiring more IT people. Oracle does that for us, and they have provided very good support.” As a result, Skanska has been able to implement a single, cost-effective talent management solution across the organization to support its strategy to recruit and develop a world-class staff. Stakeholders are confident that they are providing the most efficient recruitment system possible for competent personnel at all levels within the company—from skilled workers at construction sites to top management at headquarters. And Skanska can retain skilled employees and ensure that they receive the development opportunities they need to grow and advance.

    Read the article

  • Solving Big Problems with Oracle R Enterprise, Part I

    - by dbayard
    Abstract: This blog post will show how we used Oracle R Enterprise to tackle a customer’s big calculation problem across a big data set. Overview: Databases are great for managing large amounts of data in a central place with rigorous enterprise-level controls.  R is great for doing advanced computations.  Sometimes you need to do advanced computations on large amounts of data, subject to rigorous enterprise-level concerns.  This blog post shows how Oracle R Enterprise enables R plus the Oracle Database enabled us to do some pretty sophisticated calculations across 1 million accounts (each with many detailed records) in minutes. The problem: A financial services customer of mine has a need to calculate the historical internal rate of return (IRR) for its customers’ portfolios.  This information is needed for customer statements and the online web application.  In the past, they had solved this with a home-grown application that pulled trade and account data out of their data warehouse and ran the calculations.  But this home-grown application was not able to do this fast enough, plus it was a challenge for them to write and maintain the code that did the IRR calculation. IRR – a problem that R is good at solving: Internal Rate of Return is an interesting calculation in that in most real-world scenarios it is impractical to calculate exactly.  Rather, IRR is a calculation where approximation techniques need to be used.  In this blog post, we will discuss calculating the “money weighted rate of return” but in the actual customer proof of concept we used R to calculate both money weighted rate of returns and time weighted rate of returns.  You can learn more about the money weighted rate of returns here: http://www.wikinvest.com/wiki/Money-weighted_return First Steps- Calculating IRR in R We will start with calculating the IRR in standalone/desktop R.  In our second post, we will show how to take this desktop R function, deploy it to an Oracle Database, and make it work at real-world scale.  The first step we did was to get some sample data.  For a historical IRR calculation, you have a balances and cash flows.  In our case, the customer provided us with several accounts worth of sample data in Microsoft Excel.      The above figure shows part of the spreadsheet of sample data.  The data provides balances and cash flows for a sample account (BMV=beginning market value. FLOW=cash flow in/out of account. EMV=ending market value). Once we had the sample spreadsheet, the next step we did was to read the Excel data into R.  This is something that R does well.  R offers multiple ways to work with spreadsheet data.  For instance, one could save the spreadsheet as a .csv file.  In our case, the customer provided a spreadsheet file containing multiple sheets where each sheet provided data for a different sample account.  To handle this easily, we took advantage of the RODBC package which allowed us to read the Excel data sheet-by-sheet without having to create individual .csv files.  We wrote ourselves a little helper function called getsheet() around the RODBC package.  Then we loaded all of the sample accounts into a data.frame called SimpleMWRRData. Writing the IRR function At this point, it was time to write the money weighted rate of return (MWRR) function itself.  The definition of MWRR is easily found on the internet or if you are old school you can look in an investment performance text book.  In the customer proof, we based our calculations off the ones defined in the The Handbook of Investment Performance: A User’s Guide by David Spaulding since this is the reference book used by the customer.  (One of the nice things we found during the course of this proof-of-concept is that by using R to write our IRR functions we could easily incorporate the specific variations and business rules of the customer into the calculation.) The key thing with calculating IRR is the need to solve a complex equation with a numerical approximation technique.  For IRR, you need to find the value of the rate of return (r) that sets the Net Present Value of all the flows in and out of the account to zero.  With R, we solve this by defining our NPV function: where bmv is the beginning market value, cf is a vector of cash flows, t is a vector of time (relative to the beginning), emv is the ending market value, and tend is the ending time. Since solving for r is a one-dimensional optimization problem, we decided to take advantage of R’s optimize method (http://stat.ethz.ch/R-manual/R-patched/library/stats/html/optimize.html). The optimize method can be used to find a minimum or maximum; to find the value of r where our npv function is closest to zero, we wrapped our npv function inside the abs function and asked optimize to find the minimum.  Here is an example of using optimize: where low and high are scalars that indicate the range to search for an answer.   To test this out, we need to set values for bmv, cf, t, emv, tend, low, and high.  We will set low and high to some reasonable defaults. For example, this account had a negative 2.2% money weighted rate of return. Enhancing and Packaging the IRR function With numerical approximation methods like optimize, sometimes you will not be able to find an answer with your initial set of inputs.  To account for this, our approach was to first try to find an answer for r within a narrow range, then if we did not find an answer, try calling optimize() again with a broader range.  See the R help page on optimize()  for more details about the search range and its algorithm. At this point, we can now write a simplified version of our MWRR function.  (Our real-world version is  more sophisticated in that it calculates rate of returns for 5 different time periods [since inception, last quarter, year-to-date, last year, year before last year] in a single invocation.  In our actual customer proof, we also defined time-weighted rate of return calculations.  The beauty of R is that it was very easy to add these enhancements and additional calculations to our IRR package.)To simplify code deployment, we then created a new package of our IRR functions and sample data.  For this blog post, we only need to include our SimpleMWRR function and our SimpleMWRRData sample data.  We created the shell of the package by calling: To turn this package skeleton into something usable, at a minimum you need to edit the SimpleMWRR.Rd and SimpleMWRRData.Rd files in the \man subdirectory.  In those files, you need to at least provide a value for the “title” section. Once that is done, you can change directory to the IRR directory and type at the command-line: The myIRR package for this blog post (which has both SimpleMWRR source and SimpleMWRRData sample data) is downloadable from here: myIRR package Testing the myIRR package Here is an example of testing our IRR function once it was converted to an installable package: Calculating IRR for All the Accounts So far, we have shown how to calculate IRR for a single account.  The real-world issue is how do you calculate IRR for all of the accounts?This is the kind of situation where we can leverage the “Split-Apply-Combine” approach (see http://www.cscs.umich.edu/~crshalizi/weblog/815.html).  Given that our sample data can fit in memory, one easy approach is to use R’s “by” function.  (Other approaches to Split-Apply-Combine such as plyr can also be used.  See http://4dpiecharts.com/2011/12/16/a-quick-primer-on-split-apply-combine-problems/). Here is an example showing the use of “by” to calculate the money weighted rate of return for each account in our sample data set.  Recap and Next Steps At this point, you’ve seen the power of R being used to calculate IRR.  There were several good things: R could easily work with the spreadsheets of sample data we were given R’s optimize() function provided a nice way to solve for IRR- it was both fast and allowed us to avoid having to code our own iterative approximation algorithm R was a convenient language to express the customer-specific variations, business-rules, and exceptions that often occur in real-world calculations- these could be easily added to our IRR functions The Split-Apply-Combine technique can be used to perform calculations of IRR for multiple accounts at once. However, there are several challenges yet to be conquered at this point in our story: The actual data that needs to be used lives in a database, not in a spreadsheet The actual data is much, much bigger- too big to fit into the normal R memory space and too big to want to move across the network The overall process needs to run fast- much faster than a single processor The actual data needs to be kept secured- another reason to not want to move it from the database and across the network And the process of calculating the IRR needs to be integrated together with other database ETL activities, so that IRR’s can be calculated as part of the data warehouse refresh processes In our next blog post in this series, we will show you how Oracle R Enterprise solved these challenges.

    Read the article

  • Contract Work - Lessons Learned

    - by samerpaul
    I thought I would write a post of a different nature today, but still relevant to the tech world. I do a lot of contract jobs myself and really enjoy it. It's nice to keep jumping from project to project, and not having to go to an office or keep regular hours, etc. I really enjoy it. I have learned a lot in the past few years of doing it (both from experience and from help given to me from others, and the internet) so I thought I'd share some of that knowledge/experience today.So here's my own personal "lesson's learned" that hopefully will help you if you find yourself doing contract work:Should I take the job?Ok, so this is the first step. Assuming you were given sufficient information about what they want, then you should really think about what you're capable of doing and whether or not you should take this job. Personally, my rule is, if I know it's possible, I'll say yes, even if I don't yet know how to do it. That's because the internet is such a great help, it would be rare to run into an issue that you can't figure out with some help. So if your clients are asking for something that you don't yet know how to program, but you know you can do it on the platform then go for it. How else are you going to learn?Use this rule with some limitation, however. If you're really lacking the expertise or foundation in something, then unless you have tons of time to complete the project, then I wouldn't say yes. For example, I haven't personally done any 3d/openGL programming yet so I wouldn't say yes to a project that extensively uses it. OK, so I want the job, but how much do I charge?This part can be tricky. There is no set formula really, but I have some tips for pricing that will hopefully give you a better idea on how to confidently ask your price and have them accept. Here are some personal guidelinesHow much time do you have to complete the project? If it's shorter than average, then charge more. You can even make a subtle note about this (or not so subtle if they still don't get it.) If it seems too short of a time (i.e. near impossible to complete), be sure to say that. It looks bad to promise a time that you can't keep--and it makes it less likely for them to return to you for work.Your Hourly rate: How long have you been working in that language? Do you have existing projects to back you up? Or previous contacts that can vouch for your work? Are there very few people with your particular skill set? All of these things will lend themselves to setting an hourly rate. I'd also try out a quick google search of what your line of work is, to see what the industry standard is at that point in time.I wouldn't price too low, because you want to make your time worth it. You also want them to feel like they're paying for quality work (assuming you can deliver it :) ). Finally, think about your client. If it's a small business, then don't price it too high if you want the job. If it's an enterprise (like a Fortune company), then don't be afraid to price higher. They have the budget for it.Fixed price: If they want a fixed price project, then you need to think about how many hours it will take you to complete it and multiply it by the hourly rate you set for yourself. Then, honestly, I would add 10-20% on top of that. Why? Because nothing ever works exactly how you want it to. There are lots of times that something "trivial" is way harder than it should be, or something that "should work" doesn't for hours and it eats away at your hourly rate. I can't count the number of times I encountered a logical bug that took away an entire's day work because debuggers don't help in those cases. By adding that padding in, it's still OK to have those days where you don't get as much done as you want. And another useful tip: Depending on your client, and the scope, you most likely want to set that you both sign off on a specification sheet before doing any work, and that any changes will result in a re-evaulation of the price. This is to help protect you from being handed a huge new addition to the project half-way in, without any extra payment.Scope of project: Finally, is it a huge project? Is it really small/fast? This affects how much your client will be willing to pay. If it sounds big, they will be willing to pay more for it. If it seems really small, then you won't be able to get away with a large asking price (as easily).Ok, I priced it, now what?So now that you have the price, you want to make sure it feels justified to your client. I never set a price before I can really think about everything. For example, if you're still in your introduction phase, and they want a price, don't give one! Just comment that you will send them a proposal sheet with all the features outlined, and a price for everything. You don't want to shout out a low number and then deliver something that is way higher. You also don't want to shock them with a big number before they feel like they are getting a great product.Make up a proposal document in a word editor. Personally, I leave the price till the very end. Why? Because by the time they reach the end, you've already discussed all the great features you plan to implement, and how it's the best product they'll ever use, etc etc...so your price comes off as a steal! If you hit them up front with a price, they will read through the document with a negative bias. Think about those commercials on TV. They always go on about their product, then at the end, ask "What would you pay for something like this? $100? $50? How about $20!!". This is not by accident.Scenario: I finished the job way earlier than expectedYou have two options then. You can either polish the hell out of the application, and even throw in a few bonus features (assuming they are in-line with the customer's needs) or you can sit and wait on it until you near your deadline. Why don't you want to turn it in too early? Because you should treat that extra time as a surplus. If you said it is going to take you 3 weeks, and it took you only 1, you have a surplus of 2 weeks. I personally don't want to let them know that I can do a 3 week project in 1 week. Why not? Because that may not always be the case! I may later have a 3 week project that takes all 3 weeks, but if I set a precedent of delivering super early, then the pressure is on for that longer project. It also makes it harder to quote longer times if you keep delivering too early.Feel free to deliver early, but again, don't do it too early. They may also wonder why they paid you for 3 weeks of work if you're done in 1. They may further wonder if the product sucks, or what is wrong with it, if it's done so early, etc.I would just polish the application. Everyone loves polish in their applications. The smallest details are what make an application go from "functional" to "fantastic". And since you are still delivering on time, then they are still going to be very happy with you.Scenario: It's taking way too long to finish this, and the deadline is nearing/here!So this is not a fun scenario to be in, but it'll happen. Sometimes the scope of the project gets out of hand. The best policy here is OPENNESS/HONESTY. Tell them that the project is taking longer than expected, and give a reasonable time for when you think you'll have it done. I typically explain it in a way that makes it sound like it isn't something that I did wrong, but it's just something about the nature of the project. This really goes for any scenario, to be honest. Just continue to stay open and communicative about your progress. This doesn't mean that you should email them every five minutes (unless they want you to), but it does mean that maybe every few days or once a week, give them an update on where you're at, and what's next. They'll be happy to know they are paying for progress, and it'll make it easier to ask for an extension when something goes wrong, because they know that you've been working on it all along.Final tips and thoughts:In general, contract work is really fun and rewarding. It's nice to learn new things all the time, as mandated by the project ,and to challenge yourself to do things you may not have done before. The key is to build a great relationship with your clients for future work, and for recommendations. I am always very honest with them and I never promise something I can't deliver. Again, under promise, over deliver!I hope this has proved helpful!Cheers,samerpaul

    Read the article

  • Where do I start ?

    - by Panthe
    Brief History: Just graduated high school, learned a bit of python and C++, have no friends with any helpful computer knowledge at all. Out of anyone i met in my school years I was probably the biggest nerd, but no one really knew. I consider my self to have a vast amount of knowledge on computers and tech then the average person. built/fixed tons of computers, and ability to troubleshoot pretty much any problem I came across. Now that high school is over, Ive really been thinking about my career. Loving, living computers for the past 15 years of my life I decided to take my ability's and try to learn computer programming, why I didn't start earlier I don't know, seems to be big mistake on my part... Doing some research I concluded that Python was the first programming language I should learn, since it was high level and easier to understand then C++ and Java. I also knew that to become good at what I did I needed to know more then just 2 or 3 languages, which didn't seem like a big problem considering once I learned the way Python worked, mainly syntax changed, and the rest would come naturally. I watched a couple of youtube videos, downloaded some book pdf's and snooped around from some tutorials here and there to get the hang of what to do. A two solid weeks had passed of trying to understand the syntax, create small programs that used the basic functions and understanding how it worked, I think i have got the hang of it. It breaks down into what ive been dealing with all this time (although i kinda knew) is that, input,output, loops, functions and other things derived from 0's and 1's storing data and recalling it, ect. (A VERY BASIC IDEA). Ive been able to create small programs, Hangman, file storing, temperature conversion, Caeser Cipher decode/encoding, Fibonacci Sequence and more, which i can create and understand how each work. Being 2 weeks into this, I have learned alot. Nothing at all compared to what i should be learning in the years to come if i get a grip on what I'm doing. While doing these programs I wont stop untill I've done doing a practice problem on a book, which embarresing enough will take me a couple hour depending on the complexity of it. I absolutly will not put aside the challenge until its complete, WHICH CAN BE EXTREMELY DRAINING, ive tried most problems without cheating and reached success, which makes me feel extremely proud of my self after completing something after much trial and error. After all this I have met the demon, alogrithm's which seem to be key to effiecent code. I cant seem to rap my head around some of the computer codes people put out there using numbers, and sometimes even basic functions, I have been able to understand them after a while but i know there are alot more complex things to come, considering my self smart, functions that require complex codes, actually hurt my brain. NOTHING EVER IN LIFE HURT MY BRAIN....... not even math classes in highschool, trying to understand some of the stuff people put out there makes me feel like i have a mental disadvantage lol... i still walk forward though, crossing my fingers that the understanding will come with time. Sorry if is this is long i just wish someone takes all these things into consideration when answering my question. even through all these downsides im still pushing through and continuing to try and get good at this, i know reading these tutorials wont make me any good unless i can become creative and make my own, understand other peoples programs, so this leads me to the simple question i could have asked in the beginning..... WHERE IN THE WORLD DO I START ? Ive been trying to find out how to understand some of the open source projects, how i can work with experianced coders to learn from them and help them, but i dont think thats even possible by the way how far people's knowledge is compared to me, i have no freinds who i can learn from, can someone help me and guide me into the right direction.. i have a huge motivation to get good at coding, anything information would be extremely helpful

    Read the article

  • Of transactions and Mongo

    - by Nuri Halperin
    Originally posted on: http://geekswithblogs.net/nuri/archive/2014/05/20/of-transactions-and-mongo-again.aspxWhat's the first thing you hear about NoSQL databases? That they lose your data? That there's no transactions? No joins? No hope for "real" applications? Well, you *should* be wondering whether a certain of database is the right one for your job. But if you do so, you should be wondering that about "traditional" databases as well! In the spirit of exploration let's take a look at a common challenge: You are a bank. You have customers with accounts. Customer A wants to pay B. You want to allow that only if A can cover the amount being transferred. Let's looks at the problem without any context of any database engine in mind. What would you do? How would you ensure that the amount transfer is done "properly"? Would you prevent a "transaction" from taking place unless A can cover the amount? There are several options: Prevent any change to A's account while the transfer is taking place. That boils down to locking. Apply the change, and allow A's balance to go below zero. Charge person A some interest on the negative balance. Not friendly, but certainly a choice. Don't do either. Options 1 and 2 are difficult to attain in the NoSQL world. Mongo won't save you headaches here either. Option 3 looks a bit harsh. But here's where this can go: ledger. See, and account doesn't need to be represented by a single row in a table of all accounts with only the current balance on it. More often than not, accounting systems use ledgers. And entries in ledgers - as it turns out – don't actually get updated. Once a ledger entry is written, it is not removed or altered. A transaction is represented by an entry in the ledger stating and amount withdrawn from A's account and an entry in the ledger stating an addition of said amount to B's account. For sake of space-saving, that entry in the ledger can happen using one entry. Think {Timestamp, FromAccountId, ToAccountId, Amount}. The implication of the original question – "how do you enforce non-negative balance rule" then boils down to: Insert entry in ledger Run validation of recent entries Insert reverse entry to roll back transaction if validation failed. What is validation? Sum up the transactions that A's account has (all deposits and debits), and ensure the balance is positive. For sake of efficiency, one can roll up transactions and "close the book" on transactions with a pseudo entry stating balance as of midnight or something. This lets you avoid doing math on the fly on too many transactions. You simply run from the latest "approved balance" marker to date. But that's an optimization, and premature optimizations are the root of (some? most?) evil.. Back to some nagging questions though: "But mongo is only eventually consistent!" Well, yes, kind of. It's not actually true that Mongo has not transactions. It would be more descriptive to say that Mongo's transaction scope is a single document in a single collection. A write to a Mongo document happens completely or not at all. So although it is true that you can't update more than one documents "at the same time" under a "transaction" umbrella as an atomic update, it is NOT true that there' is no isolation. So a competition between two concurrent updates is completely coherent and the writes will be serialized. They will not scribble on the same document at the same time. In our case - in choosing a ledger approach - we're not even trying to "update" a document, we're simply adding a document to a collection. So there goes the "no transaction" issue. Now let's turn our attention to consistency. What you should know about mongo is that at any given moment, only on member of a replica set is writable. This means that the writable instance in a set of replicated instances always has "the truth". There could be a replication lag such that a reader going to one of the replicas still sees "old" state of a collection or document. But in our ledger case, things fall nicely into place: Run your validation against the writable instance. It is guaranteed to have a ledger either with (after) or without (before) the ledger entry got written. No funky states. Again, the ledger writing *adds* a document, so there's no inconsistent document state to be had either way. Next, we might worry about data loss. Here, mongo offers several write-concerns. Write-concern in Mongo is a mode that marshals how uptight you want the db engine to be about actually persisting a document write to disk before it reports to the application that it is "done". The most volatile, is to say you don't care. In that case, mongo would just accept your write command and say back "thanks" with no guarantee of persistence. If the server loses power at the wrong moment, it may have said "ok" but actually no written the data to disk. That's kind of bad. Don't do that with data you care about. It may be good for votes on a pole regarding how cute a furry animal is, but not so good for business. There are several other write-concerns varying from flushing the write to the disk of the writable instance, flushing to disk on several members of the replica set, a majority of the replica set or all of the members of a replica set. The former choice is the quickest, as no network coordination is required besides the main writable instance. The others impose extra network and time cost. Depending on your tolerance for latency and read-lag, you will face a choice of what works for you. It's really important to understand that no data loss occurs once a document is flushed to an instance. The record is on disk at that point. From that point on, backup strategies and disaster recovery are your worry, not loss of power to the writable machine. This scenario is not different from a relational database at that point. Where does this leave us? Oh, yes. Eventual consistency. By now, we ensured that the "source of truth" instance has the correct data, persisted and coherent. But because of lag, the app may have gone to the writable instance, performed the update and then gone to a replica and looked at the ledger there before the transaction replicated. Here are 2 options to deal with this. Similar to write concerns, mongo support read preferences. An app may choose to read only from the writable instance. This is not an awesome choice to make for every ready, because it just burdens the one instance, and doesn't make use of the other read-only servers. But this choice can be made on a query by query basis. So for the app that our person A is using, we can have person A issue the transfer command to B, and then if that same app is going to immediately as "are we there yet?" we'll query that same writable instance. But B and anyone else in the world can just chill and read from the read-only instance. They have no basis to expect that the ledger has just been written to. So as far as they know, the transaction hasn't happened until they see it appear later. We can further relax the demand by creating application UI that reacts to a write command with "thank you, we will post it shortly" instead of "thank you, we just did everything and here's the new balance". This is a very powerful thing. UI design for highly scalable systems can't insist that the all databases be locked just to paint an "all done" on screen. People understand. They were trained by many online businesses already that your placing of an order does not mean that your product is already outside your door waiting (yes, I know, large retailers are working on it... but were' not there yet). The second thing we can do, is add some artificial delay to a transaction's visibility on the ledger. The way that works is simply adding some logic such that the query against the ledger never nets a transaction for customers newer than say 15 minutes and who's validation flag is not set. This buys us time 2 ways: Replication can catch up to all instances by then, and validation rules can run and determine if this transaction should be "negated" with a compensating transaction. In case we do need to "roll back" the transaction, the backend system can place the timestamp of the compensating transaction at the exact same time or 1ms after the original one. Effectively, once A or B visits their ledger, both transactions would be visible and the overall balance "as of now" would reflect no change.  The 2 transactions (attempted/ reverted) would be visible , since we do actually account for the attempt. Hold on a second. There's a hole in the story: what if several transfers from A to some accounts are registered, and 2 independent validators attempt to compute the balance concurrently? Is there a chance that both would conclude non-sufficient-funds even though rolling back transaction 100 would free up enough for transaction 117 (some random later transaction)? Yes. there is that chance. But the integrity of the business rule is not compromised, since the prime rule is don't dispense money you don't have. To minimize or eliminate this scenario, we can also assign a single validation process per origin account. This may seem non-scalable, but it can easily be done as a "sharded" distribution. Say we have 11 validation threads (or processing nodes etc.). We divide the account number space such that each validator is exclusively responsible for a certain range of account numbers. Sounds cunningly similar to Mongo's sharding strategy, doesn't it? Each validator then works in isolation. More capacity needed? Chop the account space into more chunks. So where  are we now with the nagging questions? "No joins": Huh? What are those for? "No transactions": You mean no cross-collection and no cross-document transactions? Granted - but don't always need them either. "No hope for real applications": well... There are more issues and edge cases to slog through, I'm sure. But hopefully this gives you some ideas of how to solve common problems without distributed locking and relational databases. But then again, you can choose relational databases if they suit your problem.

    Read the article

< Previous Page | 48 49 50 51 52 53 54  | Next Page >