Search Results

Search found 22961 results on 919 pages for 'memory management'.


  • OWB 11gR2 - Early Arriving Facts

    - by Dawei Sun
    A common challenge when building ETL components for a data warehouse is how to handle early arriving facts. OWB 11gR2 introduced a new feature for dimensional objects, called Orphan Management, to address this. An orphan record is one that does not have a corresponding existing parent record. Orphan management automates the process of handling source rows that do not meet the requirements necessary to form a valid dimension or cube record. In this article, a simple example shows how to use Orphan Management in OWB. We first import a sample MDL file that contains all the objects we need. Then we take some time to examine all the objects. After that, we prepare the source data and deploy the target tables and the dimension/cube loading maps. Finally, we run the loading maps and check the data in the target dimension/cube tables. OK, let’s start… 1. Import MDL file and examine sample project First, download the zip file from here, which includes an MDL file and three source data files. Then we open the OWB Design Center and import orphan_management.mdl by using the menu File->Import->Warehouse Builder Metadata. Now we have several objects in the BI_DEMO project, as below: Mapping LOAD_CHANNELS_OM: The mapping for dimension loading. Mapping LOAD_SALES_OM: The mapping for cube loading. Dimension CHANNELS_OM: The dimension that contains channels data. Cube SALES_OM: The cube that contains sales data. Table CHANNELS_OM: The star implementation table of dimension CHANNELS_OM. Table SALES_OM: The star implementation table of cube SALES_OM. Table SRC_CHANNELS: The source table of channels data, which will be loaded into dimension CHANNELS_OM. Table SRC_ORDERS and SRC_ORDER_ITEMS: The source tables of sales data that will be loaded into cube SALES_OM. Sequence CLASS_OM_DIM_SEQ: The sequence used for loading dimension CHANNELS_OM. Dimension CHANNELS_OM This dimension has a hierarchy with three levels: TOTAL, CLASS and CHANNEL. 
Each level has three attributes: ID (surrogate key), NAME and SOURCE_ID (business key). It has a standard star implementation. The orphan management policy and the default parent setting are shown in the following screenshots: The orphan management policy options that you can set for loading are: Reject Orphan: The record is not inserted. Default Parent: You can specify a default parent record. This default record is used as the parent record for any record that does not have an existing parent record. If the default parent record does not exist, Warehouse Builder creates the default parent record. You specify the attribute values of the default parent record at the time of defining the dimensional object. If any ancestor of the default parent does not exist, Warehouse Builder also creates this record. No Maintenance: This is the default behavior. Warehouse Builder does not actively detect, reject, or fix orphan records. While removing data from a dimension, you can select one of the following orphan management policies: Reject Removal: Warehouse Builder does not allow you to delete the record if it has existing child records. No Maintenance: This is the default behavior. Warehouse Builder does not actively detect, reject, or fix orphan records. (More details are at http://download.oracle.com/docs/cd/E11882_01/owb.112/e10935/dim_objects.htm#insertedID1) Cube SALES_OM This cube references dimension CHANNELS_OM. It has three measures: AMOUNT, QUANTITY and COST. The orphan management policy settings are shown in the following screenshot: The orphan management policy options that you can set for loading are: No Maintenance: Warehouse Builder does not actively detect, reject, or fix orphan rows. Default Dimension Record: Warehouse Builder assigns a default dimension record for any row that has an invalid or null dimension key value. Use the Settings button to define the default parent row. 
Reject Orphan: Warehouse Builder does not insert the row if it does not have an existing dimension record. (More details are at http://download.oracle.com/docs/cd/E11882_01/owb.112/e10935/dim_objects.htm#BABEACDG) Mapping LOAD_CHANNELS_OM This mapping loads source data from table SRC_CHANNELS to dimension CHANNELS_OM. The operator CHANNELS_IN is bound to table SRC_CHANNELS; CHANNELS_OUT is bound to dimension CHANNELS_OM. The TOTALS operator is used for generating a constant value for the top level in the dimension. The CLASS_FILTER operator is used to filter out the “invalid” class name, so that we can see what happens when channel records with an “invalid” parent are loaded into the dimension. Some properties of the dimension operator in this mapping are important to orphan management. See the screenshot below: Create Default Level Records: If YES, then default level records will be created. This property must be set to YES for dimensions and cubes if one of their orphan management policies is “Default Parent” or “Default Dimension Record”. This property is set to NO by default, so the user may need to set it to YES manually. LOAD policy for INVALID keys/ LOAD policy for NULL keys: These two properties have the same meaning as in the dimension editor. The values are set to the same as the dimension values when the user drops the dimension into the mapping. The user does not need to modify these properties. Record Error Rows: If YES, error rows will be inserted into the error table when loading the dimension. REMOVE Orphan Policy: This property is used when removing data from a dimension. Since the dimension loading type is set to LOAD in this example, this property is disabled. Mapping LOAD_SALES_OM This mapping loads source data from tables SRC_ORDERS and SRC_ORDER_ITEMS to cube SALES_OM. This mapping seems a little bit complicated, but the operators in the red rectangle are only used to filter out and generate the records with “invalid” or “null” dimension keys. 
Some properties of the cube operator in a mapping are important to orphan management. See the screenshot below: Enable Source Aggregation: Should be checked in this example. If the default dimension record orphan policy is set for the cube operator, then it is recommended that source aggregation also be enabled. Otherwise, the orphan management processing may produce multiple fact rows with the same default dimension references, which will cause an “unstable rowset” execution error in the database, since the dimension refs are used as update match attributes for updating the fact table. LOAD policy for INVALID keys/ LOAD policy for NULL keys: These two properties have the same meaning as in the cube editor. The values are set to the same as in the cube editor when the user drops the cube into the mapping. The user does not need to modify these properties. Record Error Rows: If YES, error rows will be inserted into the error table when loading the cube. 2. Deploy objects and mappings We can now deploy the objects. First, make sure the location SALES_WH_LOCAL has been correctly configured. Then open Control Center Manager by using the menu Tools->Control Center Manager. Expand BI_DEMO->SALES_WH_LOCAL and click the SALES_WH node on the project tree. We can see the following objects: Deploy all the objects in the following order: Sequence CLASS_OM_DIM_SEQ Table CHANNELS_OM, SALES_OM, SRC_CHANNELS, SRC_ORDERS, SRC_ORDER_ITEMS Dimension CHANNELS_OM Cube SALES_OM Mapping LOAD_CHANNELS_OM, LOAD_SALES_OM Note that we deployed the source tables as well. Normally, we import source tables from the database instead of deploying them to the target schema. However, in this example, we designed the source tables in OWB and deployed them to the database for the purpose of this demonstration. 3. Prepare and examine source data Before running the mappings, we need to populate and examine the source data. Run SRC_CHANNELS.sql, SRC_ORDERS.sql and SRC_ORDER_ITEMS.sql as the target user. 
Then we check the data in these three tables. Table SRC_CHANNELS SQL> select rownum, id, class, name from src_channels; Records 1~5 are correct; they should be loaded into the dimension without error. Records 6, 7 and 8 have null parents; they should be loaded into the dimension with a default parent value, and should be inserted into the error table at the same time. Records 9, 10 and 11 have “invalid” parents; they should be rejected by the dimension, and inserted into the error table. Table SRC_ORDERS and SRC_ORDER_ITEMS SQL> select rownum, a.id, a.channel, b.amount, b.quantity, b.cost from src_orders a, src_order_items b where a.id = b.order_id; Record 178 has a null dimension reference; it should be loaded into the cube with a default dimension reference, and should be inserted into the error table at the same time. Record 179 has an “invalid” dimension reference; it should be rejected by the cube, and inserted into the error table. Other records should be aggregated and loaded into the cube correctly. 4. Run the mappings and examine the target data In the Control Center Manager, expand BI_DEMO-> SALES_WH_LOCAL-> SALES_WH-> Mappings, right click on the LOAD_CHANNELS_OM node, and click Start. Run mapping LOAD_SALES_OM the same way. When both have finished successfully, we can check the data in the target tables. Table CHANNELS_OM SQL> select rownum, total_id, total_name, total_source_id, class_id,class_name, class_source_id, channel_id, channel_name,channel_source_id from channels_om order by abs(dimension_key); Records 1, 2 and 3 are the default dimension records for the three levels. Records 8, 10 and 15 are the loaded records that originally had null parents. We see that their parent's name (class_name) is set to DEF_CLASS_NAME. The records whose CHANNEL_NAME is Special_4, Special_5 or Special_6 are not loaded into this table because of the invalid parent. 
Error Table CHANNELS_OM_ERR SQL> select rownum, class_source_id, channel_id, channel_name,channel_source_id, err$$$_error_reason from channels_om_err order by channel_name; We can see that all the records with a null or invalid parent are inserted into this error table. The error reason is “Default parent used for record” for the first three records, and “No parent found for record” for the last three. Table SALES_OM SQL> select a.*, b.channel_name from sales_om a, channels_om b where a.channels=b.channel_id; We can see that the order record with a null channel_name has been loaded into the target table with a default channel_name. The one with an “invalid” channel_name is not loaded. Error Table SALES_OM_ERR SQL> select a.amount, a.cost, a.quantity, a.channels, b.channel_name, a.err$$$_error_reason from sales_om_err a, channels_om b where a.channels=b.channel_id(+); We can see that the order records with a null or invalid channel_name are inserted into the error table. If the dimension reference column is null, the error reason is “Default dimension record used for fact”. If it is invalid, the error reason is “Dimension record not found for fact”. Summary In summary, this article illustrated the Orphan Management feature in OWB 11gR2. Automated orphan management policies improve ETL developer and administrator productivity by addressing an important cause of cube and dimension load failures, without requiring developers to explicitly build logic to handle these orphan rows.
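To make the dimension loading policies above more concrete, here is a minimal Python sketch of how the "Default Parent", "Reject Orphan" and "No Maintenance" options could treat rows with null or invalid parents. It is only an illustration of the behavior described in the article, not OWB's generated code: the function load_channels, the LoadResult structure, and the sample names "Direct", "Sales", "Special_1" and "Bogus" are assumptions made up for this example. The default parent name and the two error reasons are the ones shown in the article.

```python
from dataclasses import dataclass, field

# Policy names mirror the options described in the article. The Python
# structures below are hypothetical and are not part of any OWB API.
NO_MAINTENANCE = "NO_MAINTENANCE"
DEFAULT_PARENT = "DEFAULT_PARENT"
REJECT_ORPHAN = "REJECT_ORPHAN"

DEFAULT_CLASS = "DEF_CLASS_NAME"  # default parent name seen in CHANNELS_OM


@dataclass
class LoadResult:
    loaded: list = field(default_factory=list)  # (channel, class) pairs
    errors: list = field(default_factory=list)  # (channel, reason) pairs


def load_channels(rows, known_classes, null_policy, invalid_policy):
    """Load (channel, parent_class) rows, applying one policy to null
    parents and another to parents missing from known_classes."""
    result = LoadResult()
    for channel, parent in rows:
        if parent in known_classes:
            # Valid parent: load the row unchanged.
            result.loaded.append((channel, parent))
            continue
        policy = null_policy if parent is None else invalid_policy
        if policy == DEFAULT_PARENT:
            # Attach the orphan to the default parent and log it.
            result.loaded.append((channel, DEFAULT_CLASS))
            result.errors.append((channel, "Default parent used for record"))
        elif policy == REJECT_ORPHAN:
            # Do not load the row; only record it as an error.
            result.errors.append((channel, "No parent found for record"))
        else:
            # NO_MAINTENANCE: orphans are neither detected nor fixed.
            result.loaded.append((channel, parent))
    return result


# Hypothetical sample rows: one valid, one null parent, one invalid parent.
rows = [("Direct", "Sales"), ("Special_1", None), ("Special_4", "Bogus")]
result = load_channels(rows, {"Sales"}, DEFAULT_PARENT, REJECT_ORPHAN)
print(result.loaded)
print(result.errors)
```

With the policies used in this article (default parent for null keys, reject for invalid keys), the null-parent row is loaded under DEF_CLASS_NAME and also logged, while the invalid-parent row appears only in the error list.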

    Read the article

  • Into Orbit (OBIEE 11g Launch)

    - by Darryn.Hinett
    After much anticipation, it appears that OBIEE 11g is about to hit the streets. Join Charles Phillips, President, and Thomas Kurian, Executive Vice President, Product Development, for the launch of the latest release of Oracle's business intelligence software. Be the first to hear about Oracle Business Intelligence Enterprise Edition 11g, the new, industry-leading technology platform for business intelligence, which offers: a powerful end-user experience with rich visualisation, search, and actionable collaboration; advancements in analytics, OLAP, and enterprise reporting, with unmatched performance and scalability; and simplified system configuration, life-cycle management, and performance optimisation. As well as the keynote and technical general session, breakout sessions will cover the following topics: Business Intelligence: From Insight to Action In this session, you will learn about an exciting, industry-first innovation that connects business intelligence directly to your business processes. You can spot an opportunity or issue, and immediately initiate appropriate action directly from your dashboard. Oracle Business Intelligence Enterprise Edition 11g Systems Management and Deployment Learn how you can streamline the process of configuring your system, provisioning users, and monitoring and optimising query performance. Attend this session to hear how new integration with Oracle Enterprise Manager provides unique systems management, superior scalability, and high availability and security benefits, while making upgrades effortless. Extending Business Intelligence Analytics with Online Analytical Processing (OLAP) Learn how you can enhance the analytical power and business value of your BI solution with a unified environment for navigating and querying both OLAP and relational data sources. This session will focus on how Oracle Business Intelligence Enterprise Edition 11g, used with Oracle Essbase, can deliver insight at the speed of thought. 
Integrated Performance Management If your organisation is using or considering performance management applications such as Oracle's Hyperion Planning and Hyperion Financial Management, you will not want to miss this session. See how you can leverage Oracle's BI solution for accessing performance management applications and performing extended financial reporting and analysis. Visualisation and End-user Experience The latest release of Oracle Business Intelligence provides an unrivaled end user experience, including rich interactive dashboards, a vast range of animated charting options, integrated search, and more. This session will also include a close look at how you can leverage location data to visualise geo-spatial information.

    Read the article

  • Organizations & Architecture UNISA Studies – Chap 7

    - by MarkPearl
    Learning Outcomes Name different device categories; discuss the functions and structure of I/O modules; describe the principles of programmed I/O; describe the principles of interrupt-driven I/O; describe the principles of DMA; discuss the evolution characteristic of I/O channels; describe different types of I/O interface; explain the principles of point-to-point and multipoint configurations; discuss the way in which a FireWire serial bus functions; discuss the principles of InfiniBand architecture. External Devices An external device attaches to the computer by a link to an I/O module. The link is used to exchange control, status, and data between the I/O module and the external device. External devices can be classified into 3 categories… Human readable – e.g. video display; Machine readable – e.g. magnetic disk; Communications – e.g. wifi card. I/O Modules An I/O module has two major functions… Interface to the processor and memory via the system bus or central switch; interface to one or more peripheral devices by tailored data links. Module Functions The major functions or requirements for an I/O module fall into the following categories… Control and timing; processor communication; device communication; data buffering; error detection. The I/O function includes a control and timing requirement, to coordinate the flow of traffic between internal resources and external devices. Processor communication involves the following… Command decoding; data; status reporting; address recognition. The I/O module must be able to perform device communication. This communication involves commands, status information, and data. An essential task of an I/O module is data buffering, due to the relatively slow speeds of most external devices. An I/O module is often responsible for error detection and for subsequently reporting errors to the processor. I/O Module Structure An I/O module functions to allow the processor to view a wide range of devices in a simple-minded way. 
The I/O module may hide the details of timing, formats, and the electromechanics of an external device so that the processor can function in terms of simple read and write commands. An I/O channel/processor is an I/O module that takes on most of the detailed processing burden, presenting a high-level interface to the processor. Three techniques are possible for I/O operations: programmed I/O, interrupt-driven I/O, and direct memory access (DMA). Programmed I/O When a processor is executing a program and encounters an instruction relating to I/O, it executes that instruction by issuing a command to the appropriate I/O module. With programmed I/O, the I/O module will perform the requested action and then set the appropriate bits in the I/O status register. The I/O module takes no further action to alert the processor, so the processor must periodically check the status of the I/O module until the operation is complete. I/O Commands To execute an I/O related instruction, the processor issues an address, specifying the particular I/O module and external device, and an I/O command. There are four types of I/O commands that an I/O module may receive when it is addressed by a processor… Control – used to activate a peripheral and tell it what to do. Test – used to test various status conditions associated with an I/O module and its peripherals. Read – causes the I/O module to obtain an item of data from the peripheral and place it in an internal buffer. Write – causes the I/O module to take an item of data from the data bus and subsequently transmit that data item to the peripheral. The main disadvantage of programmed I/O is that it is a time-consuming process that keeps the processor busy needlessly. I/O Instructions With programmed I/O there is a close correspondence between the I/O related instructions that the processor fetches from memory and the I/O commands that the processor issues to an I/O module to execute the instructions. 
Typically there will be many I/O devices connected through I/O modules to the system. Each device is given a unique identifier or address. When the processor issues an I/O command, the command contains the address of the desired device; thus each I/O module must interpret the address lines to determine if the command is for itself. When the processor, main memory and I/O share a common bus, two modes of addressing are possible… memory-mapped I/O and isolated I/O (for a detailed explanation read page 245 of the book). The advantage of memory-mapped I/O over isolated I/O is that it has a large repertoire of instructions that can be used, allowing more efficient programming. The disadvantage of memory-mapped I/O compared with isolated I/O is that valuable memory address space is used up. Interrupt-driven I/O Interrupt-driven I/O works as follows… The processor issues an I/O command to a module and then goes on to do some other useful work. The I/O module then interrupts the processor to request service when it is ready to exchange data with the processor. The processor then executes the data transfer and resumes its former processing. Interrupt Processing The occurrence of an interrupt triggers a number of events, both in the processor hardware and in software. When an I/O device completes an I/O operation, the following sequence of hardware events occurs… The device issues an interrupt signal to the processor. The processor finishes execution of the current instruction before responding to the interrupt. The processor tests for an interrupt – determines that there is one – and sends an acknowledgement signal to the device that issued the interrupt. The acknowledgement allows the device to remove its interrupt signal. The processor now needs to prepare to transfer control to the interrupt routine. To begin, it needs to save information needed to resume the current program at the point of interrupt. 
The minimum information required is the status of the processor and the location of the next instruction to be executed. The processor now loads the program counter with the entry location of the interrupt-handling program that will respond to this interrupt. It also saves the values of the processor registers because the interrupt operation may modify these. The interrupt handler processes the interrupt – this includes examination of status information relating to the I/O operation or other event that caused an interrupt. When interrupt processing is complete, the saved register values are retrieved from the stack and restored to the registers. Finally, the PSW and program counter values from the stack are restored. Design Issues Two design issues arise in implementing interrupt I/O. Because there will be multiple I/O modules, how does the processor determine which device issued the interrupt? If multiple interrupts have occurred, how does the processor decide which one to process? Addressing device recognition, 4 general categories of techniques are in common use… Multiple interrupt lines; software poll; daisy chain; bus arbitration. For a detailed explanation of these approaches read page 250 of the textbook. Interrupt-driven I/O, while more efficient than simple programmed I/O, still requires the active intervention of the processor to transfer data between memory and an I/O module, and any data transfer must traverse a path through the processor. Thus it suffers from two inherent drawbacks… The I/O transfer rate is limited by the speed with which the processor can test and service a device. The processor is tied up in managing an I/O transfer; a number of instructions must be executed for each I/O transfer. Direct Memory Access When large volumes of data are to be moved, a more efficient technique is direct memory access (DMA). DMA Function DMA involves an additional module on the system bus. 
The DMA module is capable of mimicking the processor and taking over control of the system from the processor. It needs to do this to transfer data to and from memory over the system bus. The DMA module must use the bus only when the processor does not need it, or it must force the processor to suspend operation temporarily (most common – referred to as cycle stealing). When the processor wishes to read or write a block of data, it issues a command to the DMA module by sending to the DMA module the following information… Whether a read or write is requested, using the read or write control line between the processor and the DMA module. The address of the I/O device involved, communicated on the data lines. The starting location in memory to read from or write to, communicated on the data lines and stored by the DMA module in its address register. The number of words to be read or written, communicated via the data lines and stored in the data count register. The processor then continues with other work; it has delegated the I/O operation to the DMA module, which transfers the entire block of data, one word at a time, directly to or from memory without going through the processor. When the transfer is complete, the DMA module sends an interrupt signal to the processor; thus the processor is involved only at the beginning and end of the transfer. I/O Channels and Processors Characteristics of I/O Channels As one proceeds along the evolutionary path, more and more of the I/O function is performed without CPU involvement. The I/O channel represents an extension of the DMA concept. An I/O channel has the ability to execute I/O instructions, which gives it complete control over I/O operations. In a computer system with such devices, the CPU does not execute I/O instructions – such instructions are stored in main memory to be executed by a special purpose processor in the I/O channel itself. Two types of I/O channels are common. A selector channel controls multiple high-speed devices. 
A multiplexor channel can handle I/O with multiple devices at the same time, accepting or transmitting characters as fast as possible. The external interface: FireWire and InfiniBand Types of Interfaces One major characteristic of the interface is whether it is serial or parallel. Parallel interface – there are multiple lines connecting the I/O module and the peripheral, and multiple bits are transferred simultaneously. Serial interface – there is only one line used to transmit data, and bits must be transmitted one at a time. With new generation serial interfaces, parallel interfaces are becoming less common. In either case, the I/O module must engage in a dialogue with the peripheral. In general terms the dialogue may look as follows… The I/O module sends a control signal requesting permission to send data. The peripheral acknowledges the request. The I/O module transfers data. The peripheral acknowledges receipt of data. For a detailed explanation of FireWire and InfiniBand technology read pages 264–270 of the textbook.
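To compare the three I/O techniques from these notes, here is a small Python simulation. It is a sketch under simplifying assumptions, not a model of real hardware: the Device class, its latency parameter and the three transfer functions are hypothetical, time advances in abstract ticks, and the DMA version ignores cycle stealing. It only counts how often the processor has to get involved while a block of words is moved into memory.

```python
class Device:
    """Hypothetical device that needs `latency` ticks to produce each word."""

    def __init__(self, words, latency):
        self.words = list(words)
        self.latency = latency
        self.countdown = latency

    def tick(self):
        """Advance one time step; return True when a word is ready."""
        if not self.words:
            return False
        self.countdown -= 1
        return self.countdown <= 0

    def read(self):
        """Hand over the ready word and start producing the next one."""
        self.countdown = self.latency
        return self.words.pop(0)


def programmed_io(device):
    """Processor busy-waits, testing the status on every tick."""
    memory, status_checks = [], 0
    while device.words:
        status_checks += 1              # processor issues a Test command
        if device.tick():
            memory.append(device.read())
    return memory, status_checks


def interrupt_io(device):
    """Processor does other work and is interrupted once per word."""
    memory, interrupts, other_work = [], 0, 0
    while device.words:
        if device.tick():               # device raises an interrupt
            interrupts += 1
            memory.append(device.read())  # processor moves the word itself
        else:
            other_work += 1             # tick spent on useful work
    return memory, interrupts, other_work


def dma_io(device):
    """DMA module moves every word; one interrupt at the end of the block."""
    memory, other_work = [], 0
    while device.words:
        if device.tick():
            memory.append(device.read())  # transfer bypasses the processor
        other_work += 1                 # cycle stealing is ignored here
    return memory, 1, other_work


data = [10, 20, 30]
pio = programmed_io(Device(data, latency=4))
irq = interrupt_io(Device(data, latency=4))
dma = dma_io(Device(data, latency=4))
print("programmed:", pio)
print("interrupt: ", irq)
print("dma:       ", dma)
```

In this toy model all three techniques deliver the same data. What changes is the processor's share of the work: every tick is a status check with programmed I/O, one interrupt per word with interrupt-driven I/O, and a single interrupt per block with DMA.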

    Read the article

  • Stay Connected with Oracle Primavera

    - by Oracle OpenWorld Blog Team
    By Beata P. Rosa. Add These Four Essential Sessions to Your Portfolio. If you use Oracle’s Primavera and you're attending Oracle OpenWorld, then the Oracle Primavera sessions are for you. Oracle Primavera-specific content includes 16 sessions, as well as hands-on labs, demos, meet-the-experts opportunities, and exhibits. The sessions are designed for you to gain valuable information on how to respond to a changing business environment, stay on the leading edge, and effectively manage your entire project portfolio from prioritization to delivery. Here are four must-attend sessions: Get Proactive: Best Practices for Supporting Oracle Enterprise Performance Management Products Learn how to take full advantage of Oracle’s enterprise performance management (EPM) products with all the great tools, resources, and product updates you're entitled to through Oracle Support. (CON3048: Monday, October 1, 10:45 a.m., InterContinental, InterContinental Ballroom B) Primavera Enterprise Project Portfolio Management Vision Come to this session to hear from the leaders of Oracle’s Primavera Global Business Unit, who present the vision for the Primavera platform and provide an overview of its direction and planned capabilities. (CON8252: Monday, October 1, 3:15 p.m., Westin San Francisco, Metropolitan III) General Session: Decisions for Project Executives This project portfolio management (PPM) general session discusses the vital role of analytics in the project management arena and offers a view of the project executive role in the future. 
(GEN9606: Tuesday, October 2, 1:15 p.m., Moscone West Room 3002/3004) Oracle Primavera Hands-on Labs In practical self-paced learning sessions covering everything from Oracle’s Primavera P6 solutions to Primavera Portfolio Management, Primavera Risk Analysis, and Primavera Capital Project and Program Management Solutions, you’ll discover new ways to derive maximum benefits from your Oracle software. (Seven labs to choose from - see Focus on Oracle Primavera for more information) Download the Focus On Oracle Primavera guide, and stay connected via Twitter.com/@OracleEPPM, LinkedIn, and Facebook/OraclePrimavera.

    Read the article

  • Out-of-the-Box Integration Links Primavera Solutions with PeopleSoft Projects Applications

    - by Sylvie MacKenzie, PMP
    In a move that brings best-in-class enterprise project portfolio management to Oracle’s PeopleSoft enterprise resource planning customers, Oracle announced the integration of Oracle’s PeopleSoft projects applications and Oracle’s Primavera P6 Enterprise Project Portfolio Management. The combination of PeopleSoft financial controls and Primavera portfolio management capabilities brings greater oversight of end-to-end processes to help organizations improve the planning and execution efforts needed to deliver projects on time and within budget. “As an organization with many high-value, project-driven initiatives, we are very pleased to see Oracle’s investment in this important integration,” says Janardhanan Sankar, senior vice president for technology and quality at ITC Infotech India Ltd. Oracle’s PeopleSoft projects applications enable project-centric organizations and departments to establish core operational processes for full project lifecycle management across operations and finance. The integration with Primavera P6 Enterprise Project Portfolio Management means organizations can eliminate costly and difficult-to-maintain proprietary integrations. 
Organizations can also standardize on Oracle technologies to: align back-office budgets and costs with project operations to help ensure accurate forecasting of costs, resources, and schedules; provide an accurate single source of truth to financial managers and analysts using Oracle’s PeopleSoft projects applications, and to project managers using Primavera P6 Enterprise Project Portfolio Management; and enhance project collaboration and execution by having all users use common solutions to communicate, plan, and deliver projects. “By bringing together Oracle’s PeopleSoft projects applications and Oracle’s Primavera P6 Enterprise Project Portfolio Management, we are able to provide customers with the infrastructure they need to achieve a single source of truth on the projects they are managing,” says Paco Aubrejuan, Oracle’s group vice president and general manager, PeopleSoft. “This real-time visibility drives profitability, increases productivity, and improves operations.” For more information, view the on-demand Webcast, “Bridging Business Processes for Optimal Portfolio Performance,” or read about the new integration.

    Read the article

  • 2013 U.S. GAAP Financial Reporting Taxonomy Available for Public Review and Comment

    - by Theresa Hickman
    FASB recently released the proposed 2013 U.S. GAAP Reporting Taxonomy. Comments are due October 29, 2012; the taxonomy is to be finalized and published in early 2013. The proposed 2013 U.S. GAAP taxonomy and instructions on how to submit comments are available at the FASB’s XBRL page. In previous blog entries, I talked about how Oracle Hyperion Disclosure Management supports the latest taxonomy, enabling financial managers to easily comply with the latest filing requirements. The taxonomy is a list of computer-readable tags in XBRL that allows companies to annotate the voluminous financial data that is included in typical long-form financial statements and related footnote disclosures. The tags allow computers to automatically search for, assemble, and process data so it can be readily accessed and analyzed by investors, analysts, journalists, and regulators. You do not need Oracle Hyperion Financial Management, which is used for consolidating financial results, to generate XBRL. You just need Oracle Hyperion Disclosure Management to generate XBRL instance documents from financial applications, such as Oracle E-Business Suite, Oracle PeopleSoft, Oracle JD Edwards EnterpriseOne, and Oracle Fusion General Ledger. To generate XBRL tags and complete SEC filings using your existing financial applications with Oracle Hyperion Disclosure Management, here are the steps: Download the XBRL taxonomy from the SEC or XBRL Website into Hyperion Disclosure Management to create a company taxonomy. Publish financial statements from the general ledger to Microsoft Excel or Microsoft Word. Create the SEC filing in the Microsoft programs and perform the XBRL tag mapping in Oracle Hyperion Disclosure Management. Ensure that the SEC filing meets XBRL and SEC EDGAR Filer Manual validation requirements. Validate and submit the company taxonomy and XBRL instance document to the SEC. Get more details about Oracle Hyperion Disclosure Management.

    Read the article

  • SharePoint 2010 Video Training

    - by Sahil Malik
    Yes, the DVD is finally available. This is an exhaustive 14 hour video course that Carl and I recorded back in April. It is an end-to-end overview of SharePoint 2010. You can view more details including ordering information about the DVD here. And if you’re interested, a SharePoint 2007 video training version is also available. Carl and I worked quite hard on putting these together, so we hope you enjoy these. Detailed Table of Contents: Introduction (13:49) 30,000 Foot Overview (42:07) Application Management (43:35) User Experience (16:00) Writing Code Part 1 (1:07:49) Writing Code Part 2 (34:41) Simple Web Parts (14:01) Visual Web Parts (6:35) Pages (35:02) Putting it All Together (29:13) Client Side Technology (49:19) ADO.NET Data Services (51:29) Custom Data Services (43:30) Managing Data (29:02) Managing Data: Content Types (17:11) Managing Data: Events (19:22) Managing Data: List Scalability (35:51) Managing Data: Querying (20:07) Enterprise Content Management: DocumentIDs and Document Sets (16:44) Enterprise Content Management: Metadata Infrastructure (22:13) Enterprise Content Management: Record Management (26:27) Enterprise Content Management: Content Organizer (7:21) Enterprise Content Management: Enterprise Content Types (11:21) Business Connectivity Services (BCS) in the SharePoint Designer (26:09) BCS in Visual Studio (9:57) Workflows in the SharePoint Designer (22:07) Workflows in Visual Studio (19:01) Business Intelligence (21:14) Excel (15:25) Performance Point (24:37) Security: Claims-Based Authentication (27:13) Security: Secure Store Service (11:04) Security: The SharePoint Object Model (11:16)

    Read the article

  • Enterprise Manager Grid Control Licensing

    - by Lajos Sárecz
    I often get questions about licensing Oracle Enterprise Manager Grid Control, so below I try to summarize the most important information. This overview is not exhaustive, because a number of products (Data Masking, Real Application Testing, Real User Experience Insight, Application Testing Suite) are related to Enterprise Manager but are licensed differently. The primary source of information on Enterprise Manager licensing is the Licensing Information document. The most important points:
    - The Grid Control framework (the Agents and the console with the base functionality, see below) is free of charge in itself, and it even includes a restricted-use license for Oracle Database, provided the database is used only for the Oracle Management Repository. Note that this does not include other Oracle Database options, such as RAC! Similarly, Oracle WebLogic Server may be used free of charge only to host the Oracle Management Server, and without clustering.
    - Grid Control base functionality: Discovery, Groups, Job Scheduling, Real time availability, Performance & monitoring, Target Home Pages, Administration, Console alerts
    - Depending on the managed products, the base functionality can be extended with Management Pack, Plug-in, and Connector products. As a rule, their licensing must always follow the licensing of the monitored, managed product. For example, if we want to use the Diagnostic Pack on 2 database servers, we have to buy a CPU or NUP (Named User Plus) license for both, depending on which license the database itself has. Note that this particular Management Pack can be used only with Enterprise Edition Database.
    - Many paid features are accessible in the Grid Control interface without separate installation (the same is true of Database Control and Fusion Middleware Control). To avoid license violations, it is worth checking which Management Packs have been enabled in a given environment. The easiest way to do this is in the Management Pack Access submenu of the Grid Control Setup menu. A more detailed description can be found here. The Database Diagnostic and Tuning Packs can also be disabled at the database level so that they cannot be used from the command line either; I have written about this before.
    The USD price of each management product can be found in the price list. If anything important has been left out, I welcome questions and comments, and I will extend the above as needed.

    Read the article

  • Oracle E-Business Suite is Helping to Save Lives at the National Marrow Donor Program

    - by Di Seghposs
    To improve the management of its life-saving operations, the National Marrow Donor Program recently modernized its financial and procurement operations by upgrading to Oracle E-Business Suite 12.1.   As the global leader in bone marrow and umbilical cord blood transplants, the NMDP manages a complex ecosystem of donor, patient, hospital, and biological data. “Maintaining accurate data and having an efficient matching process is essential, particularly as our global database of bone marrow patients grows and donor lists expand,” says Bruce Schmaltz, director of finance/controller. “We rely on the Oracle E-Business Suite to ensure our procurement and financial management processes meet the highest standards, enabling our growing non-profit to work swiftly and efficiently to help improve and save lives.” As the non-profit organization and its registry grew larger, NMDP needed a modern platform to store and integrate its financial information and complicated procurement process. It selected Oracle E-Business Suite for its ability to fit seamlessly into NMDP’s enterprise architecture. NMDP initially implemented Oracle E-Business Suite release 12 by leveraging Oracle Business Accelerators, which are rapid implementation tools and templates that help reduce implementation time and costs. With Oracle Financial Management and Oracle Procurement, NMDP has streamlined back-office processes and integrated its procure-to-pay business processes by leveraging industry leading accounts payable, accounts receivable, and general ledger modules. NMDP is currently rolling out Oracle Hyperion Performance Management applications and plans to implement Oracle Order Management and Oracle Advanced Pricing by the end of 2012. Read more details about NMDP’s modernization efforts.  For more updates on Oracle Financial Management Solutions, view our November 2012 Oracle Information InDepth Financial Management newsletter. Subscribe Now. 

    Read the article

  • Oracle Open World / Public Sector / Identity Platform

    - by user12604761
    For those attending Oracle Open World (Oct. 1st - 3rd, 2012 at the Moscone Center in San Francisco), the following is recommended: OOW Focus on Public Sector. Oracle's foundational Identity and Access Management and Database Security products, which support government security ICAM solutions, are also covered extensively during the event. The focus is on Oracle's Modern Identity Management Platform: Integrated Identity Governance, Mobile Access Management, Complete Access Management, and Low Risk Upgrades. The options for attendees include 18 sessions for Identity and Access Management, 9 Identity and Access Management demonstration topics at the Identity Management Demo Grounds, and 2 hands-on labs, as well as 21 database security sessions. Oracle Public Sector Reception at OOW: Join Oracle's Public Sector team on Monday, October 1 for a night of food and sports in a casual setting at Jillian’s, adjacent to Moscone Center on Fourth Street. In addition to meeting the Public Sector team, you can enjoy Monday Night Football on several big screen TVs in a fun sports atmosphere. When: Monday, October 1, 6:30 p.m.–9:30 p.m. Where: Jillian's, 101 Fourth Street, San Francisco

    Read the article

  • More Denali Execution Plan Warning Goodies

    - by Dave Ballantyne
    In my last blog, I showed how the execution plan in Denali has been enhanced by 2 new warnings, conversion affecting cardinality and conversion affecting seek, which are shown when a data type conversion has happened either implicitly or explicitly. That is not all though, there is more. Also added are two warnings when performance has been affected due to memory issues. Memory spills to tempdb are a costly operation and happen when SQL Server is under memory pressure and needs to free some up. For a long time you have been able to see these as warnings in a profiler trace as a sort or hash warning event, but now they are included right in the execution plan. Not only that, but you can also see which operator caused the spill, not just which statement. Pretty damn handy. Another cause of performance problems relating to memory is memory grant waits. Here is an informative write up on them, but simply speaking, SQL Server has to allocate a certain amount of memory for each statement. If it is unable to, you get a “memory grant wait”. Once again there are other methods of analyzing these, but the plan now shows these too. Don't worry, that’s not real production code. There is one other new warning that is of interest to me, “Unmatched Indexes”. Once I find out the conditions under which that fires, I'll blog about it.

    Read the article

  • Enablement 2.0 Get Specialized

    - by mseika
    Oracle PartnerNetwork Specialized program is releasing new certifications on our latest products, and partners are invited to be the first candidates. Oracle Taleo Enterprise Cloud Service 2013 Specialization – Now Active! This specialization recognizes partner organizations that are proficient in positioning, selling and implementing Taleo’s Enterprise Talent Management solutions. Taleo's Talent Management Cloud helps organizations attract, develop, motivate and retain human capital to improve performance and drive growth. Oracle’s Taleo Enterprise Cloud Service 2013 Specialization encompasses the following products: Oracle Taleo Performance Management Cloud Service, Oracle Taleo Recruiting Cloud Service and Oracle Taleo Performance Management Cloud Service. Topics covered in this Specialization include: Selling and positioning Taleo’s Talent Management Cloud; Functional and Technical positioning. Implementation tracks are included for Taleo Performance Management Cloud Service, Oracle Taleo Recruiting Cloud Service and Oracle Taleo Performance Management Cloud Service. Oracle partners who achieve this Specialization are differentiated in the marketplace through proven expertise in Oracle Taleo Enterprise Cloud Service. New Certified Implementation Specialist Exam in Production! Oracle Taleo Recruiting Cloud Service 2013 Certified Implementation Specialist (1Z0-474): All Beta exam participants will receive their exam scores as of the beginning of July 2013. The successful candidates will receive their certificates starting mid-July 2013. Take the exam now at a nearby Pearson VUE testing center! Contact Us: Please direct any inquiries you may have to the Oracle Partner Enablement team at [email protected].

    Read the article

  • Series On Embedded Development (Part 1)

    - by user12612705
    This is the first in a series of entries on developing applications for the embedded environment. Most of this information is relevant to any type of embedded development (and even to desktop and server development), not just Java. This information is based on a talk Hinkmond Wong and I gave at JavaOne 2012 entitled Reducing Dynamic Memory in Java Embedded Applications. One thing to remember when developing embedded applications is that memory matters. Yes, memory matters in desktop and server environments as well, but there's just plain less of it in embedded devices. So I'm going to be talking about saving this precious resource as well as another precious resource, CPU cycles...and a bit about power too. CPU matters too, and again, in embedded devices, there's just plain less of it. What you'll find, no surprise, is that there's a trade-off between performance and memory. To get better performance, you need to use more memory, and to save more memory, you need to use more CPU cycles. I'll be discussing three Memory Reduction Categories:
    - Optionality, both build-time and runtime. Optionality is about providing options so you can get rid of the stuff you don't need and include the stuff you do need.
    - Tunability, which is about providing options so you can tune your application by trading performance for size, and vice-versa.
    - Efficiency, which is about balancing size savings with performance.

    Read the article

  • Exchange 2003 -- Mailbox Management not deleting ALL messages aged 30 days or older...

    - by tcv
    I've recently created a Mailbox Management task within Exchange 2003 that, every night, looks at the contents of the Deleted Items within a particular mailbox and deletes mail that's 30 days or older. The scheduled task ran on its own last night and I have confirmed that messages within the right mailbox and the right folder were, in fact, processed. Many mails were deleted ... but not every email older than 30 days. In fact, the choice seems kinda random. Last night 3/10/2010 was the 30 day watermark. Mails were deleted from 3/10/2010, sure enough, but not all of them. Mails older than 3/10/2010 were deleted as well, but, again, not all of them. The only criterion I have on the management task -- aside from the single mailbox and single folder scopes -- is the age criterion. The size criterion is set to Any, meaning I don't care about the size. I care about the age. It's made me wonder whether there is some sort of limit on how many mails can be processed. The schedule is set for 12am and 1am every night. Any hints appreciated.

    Read the article

  • Is it possible to open a sqlite database from within microsoft sql management studio?

    - by Brian T Hannan
    Is there a way to open a .db file (SQLite database file) from within Microsoft SQL Server Management Studio? Right now we have a process that will grab the data from a Microsoft SQL Server database and put it into a SQLite database file that will be used by an application later on. Is there a way to open the SQLite database file so that it can be compared to the data inside the SQL Server database ... using only one SQL query? Is there a plug-in for Microsoft SQL Server Management Studio? Or maybe there is another way to do this same task using only one query. Right now we have to write two scripts ... one for the SQL Server database and one for the SQLite database ... then take the output from each in the same format and put them each in their own OpenOffice spreadsheet file. Finally, we compare the two files to see if there are any differences. Perhaps there's a better way to do this. P.S. A lot of applications use SQLite internally: Well-Known Users Of SQLite
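    The manual spreadsheet comparison described above could be scripted. The following is only a sketch under stated assumptions: both sides are opened with Python's standard `sqlite3` module, so the second in-memory database merely stands in for SQL Server (a real comparison would need a SQL Server driver, which is not shown), and the `items` table and the helper names `fetch_rows`, `diff_rows`, and `make_demo_db` are hypothetical.

```python
import sqlite3
from collections import Counter


def fetch_rows(connection, query):
    """Run a query and return its rows as a multiset of tuples."""
    return Counter(connection.execute(query).fetchall())


def diff_rows(left, right):
    """Return (rows only in left, rows only in right) as sorted lists."""
    only_left = sorted((left - right).elements())
    only_right = sorted((right - left).elements())
    return only_left, only_right


def make_demo_db(rows):
    """Build an in-memory SQLite database with a hypothetical 'items' table."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE items (id INTEGER, name TEXT)")
    connection.executemany("INSERT INTO items VALUES (?, ?)", rows)
    return connection


if __name__ == "__main__":
    # The first database plays the role of the source, the second the export.
    source = make_demo_db([(1, "a"), (2, "b"), (3, "c")])
    export = make_demo_db([(1, "a"), (2, "B"), (3, "c")])
    query = "SELECT id, name FROM items"
    missing, unexpected = diff_rows(fetch_rows(source, query),
                                    fetch_rows(export, query))
    print("only in source:", missing)
    print("only in export:", unexpected)
```

    Using a multiset rather than a plain set means duplicated rows are counted, so a row that appears twice on one side and once on the other still shows up as a difference.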

    Read the article

  • ORA-4030 Troubleshooting

    - by [email protected]
    QUICKLINK: Note 399497.1 FAQ ORA-4030; Note 1088087.1: ORA-4030 Diagnostic Tools [Video]

    Have you observed an ORA-4030 error reported in your alert log? ORA-4030 errors are raised when memory or resources are requested from the Operating System and the Operating System is unable to provide the memory or resources. The arguments included with the ORA-4030 are often important to narrowing down the problem. For more specifics on the ORA-4030 error and scenarios that lead to this problem, see Note 399497.1 FAQ ORA-4030.

    Looking for the best way to diagnose? There are several available diagnostic tools (error tracing, 11g Diagnosability, OCM, Process Memory Guides, RDA, OSW, diagnostic scripts) that collectively can prove powerful for identifying the cause of the ORA-4030.

    Error Tracing

    The ORA-4030 error usually occurs on the client workstation and for this reason, a trace file and alert log entry may not have been generated on the server side. It may be necessary to add additional tracing events to get initial diagnostics on the problem. To set up tracing to trap the ORA-4030, on the server use the following in SQLPlus:

        alter system set events '4030 trace name heapdump level 536870917;name errorstack level 3';

    Once the error reoccurs with the event set, you can turn off tracing using the following command in SQLPlus:

        alter system set events '4030 trace name context off; name context off';

    NOTE: See more diagnostics information to collect in Note 399497.1

    11g Diagnosability

    Starting with Oracle Database 11g Release 1, the Diagnosability infrastructure was introduced, which places traces and core files into a location controlled by the DIAGNOSTIC_DEST initialization parameter when an incident, such as an ORA-4030, occurs. For earlier versions, the trace file will be written to either USER_DUMP_DEST (if the error was caught in a user process) or BACKGROUND_DUMP_DEST (if the error was caught in a background process like PMON or SMON). The trace file may contain vital information about what led to the error condition.

    Note 443529.1 11g Quick Steps to Package and Send Critical Error Diagnostic Information to Support [Video]

    Oracle Configuration Manager (OCM)

    Oracle Configuration Manager (OCM) works with My Oracle Support to enable proactive support capability that helps you organize, collect and manage your Oracle configurations.

    Oracle Configuration Manager Quick Start Guide
    Note 548815.1: My Oracle Support Configuration Management FAQ
    Note 250434.1: BULLETIN: Learn More About My Oracle Support Configuration Manager

    General Process Memory Guides

    An ORA-4030 indicates a limit has been reached with respect to the Oracle process private memory allocation. Each Operating System will handle memory allocations with Oracle slightly differently.

    Solaris   Note 163763.1
    Linux     Note 341782.1
    IBM AIX   Notes 166491.1 and 123754.1
    HP        Note 166490.1
    Windows   Note 225349.1, Note 373602.1, Note 231159.1, Note 269495.1, Note 762031.1
    Generic   Note 169706.1

    RDA

    The RDA report will show more detailed information about the database and Server Configuration.

    Note 414966.1 RDA Documentation Index
    Download RDA -- refer to Note 314422.1 Remote Diagnostic Agent (RDA) 4 - Getting Started

    OS Watcher (OSW)

    This tool is designed to gather Operating System side statistics to compare with the findings from the database. This is a key tool in cases where memory usage is higher than expected on the server while not experiencing ORA-4030 errors currently. Reference more details on setup and usage in Note 301137.1 OS Watcher User Guide.

    Diagnostic Scripts

    Refer to Note 1088087.1: ORA-4030 Diagnostic Tools [Video]

    Common Causes/Solutions

    The ORA-4030 can occur for a variety of reasons. Some common causes are:

    * OS memory limit reached, such as physical memory and/or swap/virtual paging. For instance, IBM AIX can experience ORA-4030 issues related to swap scenarios. See Note 740603.1 10.2.0.4 not using large pages on AIX for more on that problem. Also reference Note 188149.1 for pointers on 10g and stack size issues.
    * OS limits reached (kernel or user shell limits) that limit overall, user level or process level memory
    * OS limit on PGA memory size due to SGA attach address. Reference: Note 1028623.6 SOLARIS How to Relocate the SGA
    * Oracle internal limit on functionality like PL/SQL varrays or bulk collections. ORA-4030 errors will include arguments like "pl/sql vc2" "pmucalm coll" "pmuccst: adt/re". See Coding Pointers for pointers on application design to get around these issues
    * Application design causing limits to be reached
    * Bug - space leaks, heap leaks

    For reference to the content in this blog, refer to Note 1088267.1 Master Note for Diagnosing ORA-4030

    Read the article

  • SPARC M7 Chip - 32 cores - Mind Blowing performance

    - by Angelo-Oracle
    The M7 Chip

    Oracle just announced its Next Generation Processor at the HotChips HC26 conference. As the Tech Lead in our Systems Division's Partner group, I had a front row seat to the extraordinary price performance advantage of Oracle's current T5 and M6 based systems. Partner after partner tested these systems and was impressed with their performance. Just read some of the quotes to see what our partners have been saying about our hardware. We just announced our next generation processor, the M7. It has 32 cores (up from 16 cores in T5 and 12 cores in M6). Built with 20 nm technology, this is our most advanced processor. The processor has more cores than anything else in the industry today. After the Sun acquisition Oracle has released 5 processors in 4 years and this is the 6th.

    The S4 core

    The M7 is built using the foundation of the S4 core. This is the next generation core technology. Like its predecessor, the S4 has 8 dynamic threads. It increases the frequency while maintaining the pipeline depth. Each core has its own fine grain power estimator that keeps the core within its power envelope at 250-nanosecond granularity. Each core also includes Software in Silicon features for Application Acceleration Support. Each core includes features to improve Application Data Integrity, with almost no performance loss. The core also allows using part of the Virtual Address to store meta-data. User-Level Synchronization Instructions are also part of the S4 core. Each core has 16 KB Instruction and 16 KB Data L1 cache.

    The Core Clusters

    The cores on the M7 chip are organized in sets of 4-core clusters. The core clusters share L2 cache. All four cores in the complex share 256 KB of 4-way set associative L2 Instruction Cache, with over 1/2 TB/s of throughput. Two cores share 256 KB of 8-way set associative L2 Data Cache, with over 1/2 TB/s of throughput. With this innovative Core Cluster architecture, the M7 doubles core execution bandwidth to maximize per-thread performance.

    The Chip

    Each M7 chip has 8 sets of these core clusters. The chip has 64 MB of on-chip L3 cache. This L3 cache is shared among all the cores and is partitioned into 8 x 8 MB chunks. Each chunk is an 8-way set associative cache. The aggregate bandwidth for the L3 cache on the chip is over 1.6 TB/s. Each chip has 4 DDR4 memory controllers and can support up to 16 DDR4 DIMMs, allowing for 2 TB of RAM per chip. The chip also includes 4 internal links of PCIe Gen3 I/O controllers. Each chip has 7 coherence links, allowing for 8 of these chips to be connected together gluelessly. Also, 32 of these chips can be connected in an SMP configuration. A potential system with 32 chips will have 1024 cores, 8192 threads, and 64 TB of RAM.

    Software in Silicon

    The M7 chip has many built-in Application Accelerators in Silicon. These features will be exposed to our Software partners using the SPARC Accelerator Program. The M7 has built-in logic to decompress data at the speed of memory access. This means that applications can work directly on compressed data in memory, increasing the data access rates. The VA Masking feature allows the use of part of the virtual address to store meta-data.

    Realtime Application Data Integrity

    The Realtime Application Data Integrity feature helps applications safeguard against invalid or stale memory references and buffer overflows. The first 4 bits of the pointer can be used to store a version number, and this version number is also maintained in the memory and cache lines. When a pointer accesses memory, the hardware checks to make sure the two versions match. A SEGV signal is raised when there is a mismatch. This feature can be used by the Database, applications and the OS.

    M7 Database In-Memory Query Accelerator

    The M7 chip also includes In-Silicon Query Engines. These accelerate tasks that work on In-Memory Columnar Vectors. The Oracle In-Memory option stores data in Column Format. The M7 Query Engine can speed up In-Memory Format Conversion, Value and Range Comparisons and Set Membership lookups. This engine can work on compressed data, which means we are not only accelerating query performance but also increasing the memory bandwidth for queries.

    SPARC Accelerated Program

    At the Hotchips conference we also introduced the SPARC Accelerated Program to provide our partners and third-party developers access to all the goodness of the M7's SPARC Application Acceleration features. Please get in touch with us if you are interested in knowing more about this program.
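    The version check described under Realtime Application Data Integrity can be illustrated with a small software simulation. This is only a sketch: the 64-bit pointer width, the 64-byte granularity at which versions are tracked, and the names `tag_pointer`, `split_pointer`, `VersionedMemory`, and `VersionMismatch` are assumptions made for illustration, not details of the M7 hardware or of any Oracle API.

```python
# Software simulation of a pointer-version check (illustrative only).
# Assumptions: 64-bit pointers, version kept in the top 4 bits, 64-byte lines.

VERSION_BITS = 4
ADDRESS_BITS = 64 - VERSION_BITS
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1
LINE_SIZE = 64  # hypothetical granularity at which versions are tracked


class VersionMismatch(Exception):
    """Stands in for the SEGV signal raised on a version mismatch."""


def tag_pointer(address, version):
    """Store a 4-bit version in the top bits of a 64-bit pointer."""
    if not 0 <= version < (1 << VERSION_BITS):
        raise ValueError("version must fit in 4 bits")
    return (version << ADDRESS_BITS) | (address & ADDRESS_MASK)


def split_pointer(pointer):
    """Return (version, address) for a tagged pointer."""
    return pointer >> ADDRESS_BITS, pointer & ADDRESS_MASK


class VersionedMemory:
    """Memory whose lines each remember the version they were given."""

    def __init__(self):
        self.line_versions = {}  # line number -> version
        self.data = {}           # address -> stored value

    def allocate(self, address, size, version):
        """Version every line in [address, address + size); return a tagged pointer."""
        first = address // LINE_SIZE
        last = (address + size - 1) // LINE_SIZE
        for line in range(first, last + 1):
            self.line_versions[line] = version
        return tag_pointer(address, version)

    def _check(self, pointer):
        """Compare the pointer's version with the version of the line it touches."""
        version, address = split_pointer(pointer)
        if self.line_versions.get(address // LINE_SIZE) != version:
            raise VersionMismatch(hex(address))
        return address

    def store(self, pointer, value):
        self.data[self._check(pointer)] = value

    def load(self, pointer):
        return self.data.get(self._check(pointer))
```

    In this model, a pointer returned by `allocate(0x1000, 64, version=3)` can be used with `store` and `load`, while advancing it by 64 bytes (a buffer overrun into an unversioned line) or reusing it after the same region has been re-allocated with a different version (a stale reference) raises `VersionMismatch`.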

    Read the article

  • How John Got 15x Improvement Without Really Trying

    - by rchrd
    The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here.  How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. 
Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. 
I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. 
#    cache performance    redundant operations    loop structures    performance improvement
1    x x    15.5
2    x    2.8
3    x x    2.5
4    x    2.1
5    x x    2.0
6    x    5.0
7    x    5.8
8    x    6.3
9    2.2
10    x x    3.3

Table 1: Area of improvement and performance gains of 10 codes

The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summarizes the work and suggests a possible solution to the issues raised.

Optimizing cache performance

Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. Six of the ten codes studied here benefited from such high level optimizations.

Array Accesses

The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes (1, 2, 4, and 10) got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls.
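As an illustration of the row-order rule for C arrays, the sketch below sums the same two-dimensional array in both loop orders. It is not taken from any of the codes in the study; the array sizes and the function names sum_by_columns and sum_by_rows are hypothetical.

```c
#include <stddef.h>

enum { ROWS = 4, COLS = 6 };   /* hypothetical sizes, chosen only for illustration */

/* Inner loop walks down a column: consecutive accesses are COLS doubles
   apart in memory, so each one may touch a different cache line. */
double sum_by_columns(double a[ROWS][COLS])
{
    double sum = 0.0;
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            sum += a[i][j];
    return sum;
}

/* Loops interchanged: the inner loop walks along a row, so consecutive
   accesses are adjacent in memory (unit stride, cache order for C). */
double sum_by_rows(double a[ROWS][COLS])
{
    double sum = 0.0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            sum += a[i][j];
    return sum;
}
```

Both functions compute the same result; only the order of memory accesses differs. For a Fortran array the roles are reversed, and the first index should vary fastest.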
In code 1, the compiler failed to invert a key loop because of complex addressing:

    do I = 0, 1010, delta_x
      IM = I - delta_x
      IP = I + delta_x
      do J = 5, 995, delta_x
        JM = J - delta_x
        JP = J + delta_x
        T1 = CA1(IP, J) + CA1(I, JP)
        T2 = CA1(IM, J) + CA1(I, JM)
        S1 = T1 + T2 - 4 * CA1(I, J)
        CA(I, J) = CA1(I, J) + D * S1
      end do
    end do

In code 2, the culprit is conditionals:

    do I = 1, N
      do J = 1, N
        If (IFLAG(I,J) .EQ. 0) then
          T1 = Value(I, J-1)
          T2 = Value(I-1, J)
          T3 = Value(I, J)
          T4 = Value(I+1, J)
          T5 = Value(I, J+1)
          Value(I,J) = 0.25 * (T1 + T2 + T5 + T4)
          Delta = ABS(T3 - Value(I,J))
          If (Delta .GT. MaxDelta) MaxDelta = Delta
        endif
      enddo
    enddo

I fixed both programs by inverting the loops by hand.

Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10.

Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is

    L1: for i    L2: for i    L3: for i
        for l        for l        for j
        for k        for j        for k
        for j        for k        for l

So only L3 accesses array elements in cache order. L1 is a very complex loop, much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays.
Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops, aligning the loop with cache.

Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists.

Array Strides

When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed with stride 1, then the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes.

Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly, providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into contiguous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid.
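The compress and copy-back steps can be sketched in C as follows. This is a one-dimensional illustration under simplifying assumptions, not the code from the study; the function names compress_grid and expand_grid are hypothetical, and stride is assumed to be at least 1.

```c
#include <stddef.h>

/* Gather every stride-th element of grid[0..n) into the contiguous
   buffer packed[]. Returns the number of elements copied.
   Assumes stride >= 1 and that packed[] is large enough. */
size_t compress_grid(const double *grid, size_t n, size_t stride,
                     double *packed)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i += stride)
        packed[count++] = grid[i];
    return count;
}

/* Scatter the values in packed[] back to every stride-th element of
   grid[0..n); elements between grid points are left untouched. */
void expand_grid(double *grid, size_t n, size_t stride,
                 const double *packed)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i += stride)
        grid[i] = packed[k++];
}
```

The computation then runs over packed[] with unit stride, and expand_grid is called once after convergence. The copying pays off only when the packed data is traversed many times between the two calls.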
The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes

    do j = 1, GZ
      do i = 1, GZ
        T1 = CA1(i+0, j-1) + CA1(i-1, j+0)
        T4 = CA1(i+1, j+0) + CA1(i+0, j+1)
        S1 = T1 + T4 - 4 * CA1(i+0, j+0)
        CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1
      enddo
    enddo

where CA and CA1 are compressed arrays of size GZ.

Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection.

Data reuse

In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure: as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For microprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3).
In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4:

    do J = 1, GZ-2, 2
      do I = 1, GZ-2, 2
        T1 = CA1(i+0, j-1) + CA1(i-1, j+0)
        T2 = CA1(i+1, j-1) + CA1(i+0, j+0)
        T3 = CA1(i+0, j+0) + CA1(i-1, j+1)
        T4 = CA1(i+1, j+0) + CA1(i+0, j+1)
        T5 = CA1(i+2, j+0) + CA1(i+1, j+1)
        T6 = CA1(i+1, j+1) + CA1(i+0, j+2)
        T7 = CA1(i+2, j+1) + CA1(i+1, j+2)
        S1 = T1 + T4 - 4 * CA1(i+0, j+0)
        S2 = T2 + T5 - 4 * CA1(i+1, j+0)
        S3 = T3 + T6 - 4 * CA1(i+0, j+1)
        S4 = T4 + T7 - 4 * CA1(i+1, j+1)
        CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1
        CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2
        CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3
        CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4
      enddo
    enddo

The loop body executes 12 reads, whereas the rolled loop shown in the previous section executes 20 reads to compute the same four values.

In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before code for one of the loops unrolled 8 times:

    for (k = 0; k < NK[u]; k++) {
      sum = 0.0;
      for (y = 0; y < NY; y++) {
        sum += W[y][u][k] * delta[y];
      }
      backprop[i++]=sum;
    }

and the after code:

    for (k = 0; k < KK - 8; k+=8) {
      sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0;
      sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0;
      for (y = 0; y < NY; y++) {
        sum0 += W[y][0][k+0] * delta[y];
        sum1 += W[y][0][k+1] * delta[y];
        sum2 += W[y][0][k+2] * delta[y];
        sum3 += W[y][0][k+3] * delta[y];
        sum4 += W[y][0][k+4] * delta[y];
        sum5 += W[y][0][k+5] * delta[y];
        sum6 += W[y][0][k+6] * delta[y];
        sum7 += W[y][0][k+7] * delta[y];
      }
      backprop[k+0] = sum0;
      backprop[k+1] = sum1;
      backprop[k+2] = sum2;
      backprop[k+3] = sum3;
      backprop[k+4] = sum4;
      backprop[k+5] = sum5;
      backprop[k+6] = sum6;
      backprop[k+7] = sum7;
    }

Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial.
Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends.

Reducing instruction count

Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent.

Memory operations

The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops.
Consider the following loop from code 3:

    for (y = 0; y < NY; y++) {
      i = 0;
      for (u = 0; u < NU; u++) {
        for (k = 0; k < NK[u]; k++) {
          dW[y][u][k] += delta[y] * I1[i++];
        }
      }
    }

Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as

    for (y = 0; y < NY; y++) {
      i = 0;
      Dy = delta[y];
      for (u = 0; u < NU; u++) {
        for (k = 0; k < NK[u]; k++) {
          dW[y][u][k] += Dy * I1[i++];
        }
      }
    }

Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler cannot determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays:

    #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] \
                                        + (i)*(a)->strides[3] \
                                        + (j)*(a)->strides[2] \
                                        + (k)*(a)->strides[1])

The macro is too complex for the compiler to understand and so it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define

    a0 = MAT4D(a,q,0,j,k)

before the loop and then replace all instances of

    *MAT4D(a,q,i,j,k)

in the loop with

    a0[i]

A similar problem appears in code 6, a Fortran program. The key loop in this program is

    do n1 = 1, nh
      nx1 = (n1 - 1) / nz + 1
      nz1 = n1 - nz * (nx1 - 1)
      do n2 = 1, nh
        nx2 = (n2 - 1) / nz + 1
        nz2 = n2 - nz * (nx2 - 1)
        ndx = nx2 - nx1
        ndy = nz2 - nz1
        gxx = grn(1,ndx,ndy)
        gyy = grn(2,ndx,ndy)
        gxy = grn(3,ndx,ndy)
        balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1
        balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy) * h1
      end do
    end do

The programmer has written this loop well: there are no loop invariant operations with respect to n1 and n2.
However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays.

Data operations

Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 <= i < N, 0 <= j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling.

    for (i = 0; i < N; i+=8) {
      for (j = 0; j < M; j++) {
        sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0;
        sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0;
        for (k = 0; k < K; k++) {
          sum0 += A[i+0][k] * B[j][k];
          sum1 += A[i+1][k] * B[j][k];
          sum2 += A[i+2][k] * B[j][k];
          sum3 += A[i+3][k] * B[j][k];
          sum4 += A[i+4][k] * B[j][k];
          sum5 += A[i+5][k] * B[j][k];
          sum6 += A[i+6][k] * B[j][k];
          sum7 += A[i+7][k] * B[j][k];
        }
        C[i+0][j] = sum0;
        C[i+1][j] = sum1;
        C[i+2][j] = sum2;
        C[i+3][j] = sum3;
        C[i+4][j] = sum4;
        C[i+5][j] = sum5;
        C[i+6][j] = sum6;
        C[i+7][j] = sum7;
      }
    }

This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer.

In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time.
We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time.

The main loop in code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index, while others are a function of the outer loop index but not the inner loop index:

    for (j = 0; j < N; j++) {
      for (i = 0; i < M; i++) {
        r = i * hrmax;
        R = A[j];
        temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]);
        high = temp * kcoeff * B[j] * PRM[2] * PRM[4];
        low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0));
        kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0))
                           : low * pow(R/PRM[6], PRM[5]);
        < rest of loop omitted >
      }
    }

Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array.

    for (i = 0; i < M; i++) {
      r = i * hrmax;
      TEMP[i] = pow(r, PRM[3]);
    }

[N.B.: the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times.
The final version of the code is

    for (j = 0; j < N; j++) {
      R = rig[j] / 1000.;
      tmp1 = kcoeff * par[2] * beta[j] * par[4];
      tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]);
      tmp3 = 1.0 + (par[4] * par[4] * R * R);
      tmp4 = par[6] * par[6] / tmp2;
      tmp5 = R * R / tmp3;
      tmp6 = pow(R / par[6], par[5]);
      if ((par[3] == 0.0) && (R > par[6])) {
        for (i = 1; i <= imax1; i++)
          KAP[i] = tmp1 * tmp5;
      } else if ((par[3] == 0.0) && (R <= par[6])) {
        for (i = 1; i <= imax1; i++)
          KAP[i] = tmp1 * tmp4 * tmp6;
      } else if ((par[3] != 0.0) && (R > par[6])) {
        for (i = 1; i <= imax1; i++)
          KAP[i] = tmp1 * TEMP[i] * tmp5;
      } else if ((par[3] != 0.0) && (R <= par[6])) {
        for (i = 1; i <= imax1; i++)
          KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6;
      }
      for (i = 0; i < M; i++) {
        kap = KAP[i];
        r = i * hrmax;
        < rest of loop omitted >
      }
    }

Maybe not the prettiest piece of code, but certainly much more efficient than the original loop.

Copy operations

Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages.

Code 1 declares two arrays: one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers are not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1.
At the start of each iteration, swap old and new to reset the arrays.

The old-new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary copying did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers.

Optimizing loop structures

Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate conditionals or isolate them to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet:

    MaxDelta = 0.0
    do J = 1, N
      do I = 1, M
        < code omitted >
        Delta = abs(OldValue - NewValue)
        if (Delta > MaxDelta) MaxDelta = Delta
      enddo
    enddo
    if (MaxDelta .gt. 0.001) goto 200

Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as

    MaxDelta = .false.
    do J = 1, N
      do I = 1, M
        < code omitted >
        Delta = abs(OldValue - NewValue)
        MaxDelta = MaxDelta .or. (Delta .gt. 0.001)
      enddo
    enddo
    if (MaxDelta) goto 200

thereby eliminating the conditional expression from the inner loop.

A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffic by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefited from loop unrolling, but none benefited from loop fusion. This observation is not too surprising since it is the general tendency of programmers to write thick loops.

As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays.
I found such an occasion in code 10 where I split the loop

    do i = 1, n
      do j = 1, m
        A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i)
        B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i)
        A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i)
        B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i)
        C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i)
        D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i)
        C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i)
        D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i)
      end do
    end do

into two disjoint loops

    do i = 1, n
      do j = 1, m
        A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i)
        B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i)
        A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i)
        B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i)
      end do
    end do
    do i = 1, n
      do j = 1, m
        C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i)
        D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i)
        C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i)
        D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i)
      end do
    end do

Conclusions

Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers.

Clearly, we have a problem: personal scientific research codes are highly inefficient and not running in parallel. The developers are unaware of simple optimization techniques to make programs run faster.
They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in question have not studied the books or manuals available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization.

I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding.

About the Author

John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr.
Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel programming, parallel programming languages, and application performance.

    Read the article

  • Binary search in a sorted (memory-mapped ?) file in Java

    - by sds
    I am struggling to port a Perl program to Java, and learning Java as I go. A central component of the original program is a Perl module that does string prefix lookups in a +500 GB sorted text file using binary search (essentially, "seek" to a byte offset in the middle of the file, backtrack to nearest newline, compare line prefix with the search string, "seek" to half/double that byte offset, repeat until found...) I have experimented with several database solutions but found that nothing beats this in sheer lookup speed with data sets of this size. Do you know of any existing Java library that implements such functionality? Failing that, could you point me to some idiomatic example code that does random access reads in text files? Alternatively, I am not familiar with the new (?) Java I/O libraries but would it be an option to memory-map the 500 GB text file (I'm on a 64-bit machine with memory to spare) and do binary search on the memory-mapped byte array? I would be very interested to hear any experiences you have to share about this and similar problems.

    Read the article

  • How can I render an in-memory UIViewController's view Landscape?

    - by Aaron
    I'm trying to render an in-memory (but not in hierarchy, yet) UIViewController's view into an in-memory image buffer so I can do some interesting transition animations. However, when I render the UIViewController's view into that buffer, it is always rendering as though the controller is in Portrait orientation, no matter the orientation of the rest of the app. How do I clue this controller in? My code in RootViewController looks like this:

        MyUIViewController* controller = [[MyUIViewController alloc] init];
        int width = self.view.frame.size.width;
        int height = self.view.frame.size.height;
        int bitmapBytesPerRow = width * 4;
        unsigned char *offscreenData = calloc(bitmapBytesPerRow * height, sizeof(unsigned char));
        CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
        CGContextRef offscreenContext = CGBitmapContextCreate(offscreenData, width, height, 8, bitmapBytesPerRow, colorSpace, kCGImageAlphaPremultipliedLast);
        CGContextTranslateCTM(offscreenContext, 0.0f, height);
        CGContextScaleCTM(offscreenContext, 1.0f, -1.0f);
        [(CALayer*)[controller.view layer] renderInContext:offscreenContext];

    At that point, the offscreen memory buffer's contents are portrait-oriented, even when the window is in landscape orientation. Ideas?

    Read the article

  • How can I store large amount of data from a database to XML (memory problem)?

    - by Andrija
    First, I had a problem with getting the data from the database: it took too much memory and failed. I've set -Xmx1500M and I'm using a scrolling ResultSet, so that was taken care of. Now I need to make an XML file from the data, but I can't put it in one file. At the moment, I'm doing it like this:

        while(rs.next()){
            i++;
            xmlStringBuilder.append("\n\t<row>");
            xmlStringBuilder.append("\n\t\t<ID>" + Util.transformToHTML(rs.getInt("id")) + "</ID>");
            xmlStringBuilder.append("\n\t\t<JED_ID>" + Util.transformToHTML(rs.getInt("jed_id")) + "</JED_ID>");
            xmlStringBuilder.append("\n\t\t<IME_PJ>" + Util.transformToHTML(rs.getString("ime_pj")) + "</IME_PJ>");
            //etc.
            xmlStringBuilder.append("\n\t</row>");
            if (i%100000 == 0){
                //stores the data to a file with the name i.xml
                storeKBR(xmlStringBuilder.toString(),i);
                xmlStringBuilder= null;
                xmlStringBuilder= new StringBuilder();
            }
        }

    and it works; I get 12 100 MB files. Now, what I'd like to do is have all that data in one file (which I then compress), but if I just remove the if part, I run out of memory. I thought about trying to write to a file, closing it, then opening it again, but that wouldn't get me much since I'd have to load the file into memory when I open it. P.S. If there's a better way to release the Builder, do let me know :)

    Read the article

  • Can I use Eclipse JDT to create new 'working copies' of source files in memory only?

    - by RYates
    I'm using Eclipse JDT to build a Java refactoring platform, for exploring different refactorings in memory before choosing one and saving it. I can create collections of working copies of the source files, edit them in memory, and commit the changes to disk using the JDT framework. However, I also want to generate new 'working copy' source files in memory as part of refactorings, and only create the corresponding real source file if I commit the working copy. I have seen various hints that this is possible, e.g. http://www.jarvana.com/jarvana/view/org/eclipse/jdt/doc/isv/3.3.0-v20070613/isv-3.3.0-v20070613.jar!/guide/jdt%5Fapi%5Fmanip.htm says "Note that the compilation unit does not need to exist in the Java model in order for a working copy to be created". So far I have only been able to create a new real file, i.e. ICompilationUnit newICompilationUnit = myPackage.createCompilationUnit(newName, "package piffle; public class Baz{private int i=0;}", false, null); This is not what I want. Does anyone know how to create a new 'working copy' source file, that does not appear in my file system until I commit it? Or any other mechanism to achieve the same thing?

    Read the article
