Search Results

Search found 5233 results on 210 pages for 'a records'.

Page 158/210 | < Previous Page | 154 155 156 157 158 159 160 161 162 163 164 165  | Next Page >

  • Caluculating sum of activity

    - by Maddy
    I have a table which is with following kind of information activity cost order date other information 10 1 100 -- 20 2 100 10 1 100 30 4 100 40 4 100 20 2 100 40 4 100 20 2 100 10 1 101 10 1 101 20 1 101 My requirement is to get sum of all activities over a work order ex: for order 100 1+2+4+4=11 1(for activity 10) 2(for activity 20) 4 (for activity 30) etc. i tried with group by, its taking lot time for calculation. There are 1lakh plus records in warehouse. is there any possibility in efficient way. SELECT SUM(MIN(cost)) FROM COST_WAREHOUSE a WHERE order = 100 GROUP BY (order, ACTIVITY)

    Read the article

  • SSIS code smell – Unused columns in the dataflow

    - by jamiet
    A code smell is defined on Wikipedia as being a “symptom in the source code of a program that possibly indicates a deeper problem”. It’s a term commonly used by our code-writing brethren to describe sub-optimal code but I think the term can be applied equally well to SSIS packages too as I shall now explain One of my pet hates about SSIS development is packages that throw warnings of the form: The output column "ColumnName" (1358) on output "OLE DB Source Output" (1289) and component "OLE_SRC Name" (1279) is not subsequently used in the Data Flow task. Removing this unused output column can increase Data Flow task performance.  The warning is fairly self-explanatory – any column that appears in the data flow but doesn’t get used will throw this warning when the data flow is executed. Its not the negligible performance degradation that they cause that bothers me though, it’s the clutter that they cause in your log file/table. Take a look at the following screenshot if you don’t believe me: There are 231409 such warnings in the system that I took this screenshot from, that is 231409 log records that should not be there. The most infuriating thing about this warning is that it is so easily avoidable; eliminating such columns is a very quick and easy thing to do in the SSIS Designer. The only problem I see is that the warnings don’t occur until you execute the package – it would be preferable for the designer to have an unobtrusive way of informing you of them as well. Anyway, I digress… I consider such warnings to be a code smell because, to me, they’re symptomatic of a lack of due care and attention; a lack of developer discipline if you will. What other code smells can you think of when building SSIS packages? If I get a good list in the comments maybe I’ll compile them into a later blog post. @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • Toorcon14

    - by danx
    Toorcon 2012 Information Security Conference San Diego, CA, http://www.toorcon.org/ Dan Anderson, October 2012 It's almost Halloween, and we all know what that means—yes, of course, it's time for another Toorcon Conference! Toorcon is an annual conference for people interested in computer security. This includes the whole range of hackers, computer hobbyists, professionals, security consultants, press, law enforcement, prosecutors, FBI, etc. We're at Toorcon 14—see earlier blogs for some of the previous Toorcon's I've attended (back to 2003). This year's "con" was held at the Westin on Broadway in downtown San Diego, California. The following are not necessarily my views—I'm just the messenger—although I could have misquoted or misparaphrased the speakers. Also, I only reviewed some of the talks, below, which I attended and interested me. MalAndroid—the Crux of Android Infections, Aditya K. Sood Programming Weird Machines with ELF Metadata, Rebecca "bx" Shapiro Privacy at the Handset: New FCC Rules?, Valkyrie Hacking Measured Boot and UEFI, Dan Griffin You Can't Buy Security: Building the Open Source InfoSec Program, Boris Sverdlik What Journalists Want: The Investigative Reporters' Perspective on Hacking, Dave Maas & Jason Leopold Accessibility and Security, Anna Shubina Stop Patching, for Stronger PCI Compliance, Adam Brand McAfee Secure & Trustmarks — a Hacker's Best Friend, Jay James & Shane MacDougall MalAndroid—the Crux of Android Infections Aditya K. Sood, IOActive, Michigan State PhD candidate Aditya talked about Android smartphone malware. There's a lot of old Android software out there—over 50% Gingerbread (2.3.x)—and most have unpatched vulnerabilities. Of 9 Android vulnerabilities, 8 have known exploits (such as the old Gingerbread Global Object Table exploit). Android protection includes sandboxing, security scanner, app permissions, and screened Android app market. The Android permission checker has fine-grain resource control, policy enforcement. Android static analysis also includes a static analysis app checker (bouncer), and a vulnerablity checker. What security problems does Android have? User-centric security, which depends on the user to grant permission and make smart decisions. But users don't care or think about malware (the're not aware, not paranoid). All they want is functionality, extensibility, mobility Android had no "proper" encryption before Android 3.0 No built-in protection against social engineering and web tricks Alternative Android app markets are unsafe. Simply visiting some markets can infect Android Aditya classified Android Malware types as: Type A—Apps. These interact with the Android app framework. For example, a fake Netflix app. Or Android Gold Dream (game), which uploads user files stealthy manner to a remote location. Type K—Kernel. Exploits underlying Linux libraries or kernel Type H—Hybrid. These use multiple layers (app framework, libraries, kernel). These are most commonly used by Android botnets, which are popular with Chinese botnet authors What are the threats from Android malware? These incude leak info (contacts), banking fraud, corporate network attacks, malware advertising, malware "Hackivism" (the promotion of social causes. For example, promiting specific leaders of the Tunisian or Iranian revolutions. Android malware is frequently "masquerated". That is, repackaged inside a legit app with malware. To avoid detection, the hidden malware is not unwrapped until runtime. The malware payload can be hidden in, for example, PNG files. Less common are Android bootkits—there's not many around. What they do is hijack the Android init framework—alteering system programs and daemons, then deletes itself. For example, the DKF Bootkit (China). Android App Problems: no code signing! all self-signed native code execution permission sandbox — all or none alternate market places no robust Android malware detection at network level delayed patch process Programming Weird Machines with ELF Metadata Rebecca "bx" Shapiro, Dartmouth College, NH https://github.com/bx/elf-bf-tools @bxsays on twitter Definitions. "ELF" is an executable file format used in linking and loading executables (on UNIX/Linux-class machines). "Weird machine" uses undocumented computation sources (I think of them as unintended virtual machines). Some examples of "weird machines" are those that: return to weird location, does SQL injection, corrupts the heap. Bx then talked about using ELF metadata as (an uintended) "weird machine". Some ELF background: A compiler takes source code and generates a ELF object file (hello.o). A static linker makes an ELF executable from the object file. A runtime linker and loader takes ELF executable and loads and relocates it in memory. The ELF file has symbols to relocate functions and variables. ELF has two relocation tables—one at link time and another one at loading time: .rela.dyn (link time) and .dynsym (dynamic table). GOT: Global Offset Table of addresses for dynamically-linked functions. PLT: Procedure Linkage Tables—works with GOT. The memory layout of a process (not the ELF file) is, in order: program (+ heap), dynamic libraries, libc, ld.so, stack (which includes the dynamic table loaded into memory) For ELF, the "weird machine" is found and exploited in the loader. ELF can be crafted for executing viruses, by tricking runtime into executing interpreted "code" in the ELF symbol table. One can inject parasitic "code" without modifying the actual ELF code portions. Think of the ELF symbol table as an "assembly language" interpreter. It has these elements: instructions: Add, move, jump if not 0 (jnz) Think of symbol table entries as "registers" symbol table value is "contents" immediate values are constants direct values are addresses (e.g., 0xdeadbeef) move instruction: is a relocation table entry add instruction: relocation table "addend" entry jnz instruction: takes multiple relocation table entries The ELF weird machine exploits the loader by relocating relocation table entries. The loader will go on forever until told to stop. It stores state on stack at "end" and uses IFUNC table entries (containing function pointer address). The ELF weird machine, called "Brainfu*k" (BF) has: 8 instructions: pointer inc, dec, inc indirect, dec indirect, jump forward, jump backward, print. Three registers - 3 registers Bx showed example BF source code that implemented a Turing machine printing "hello, world". More interesting was the next demo, where bx modified ping. Ping runs suid as root, but quickly drops privilege. BF modified the loader to disable the library function call dropping privilege, so it remained as root. Then BF modified the ping -t argument to execute the -t filename as root. It's best to show what this modified ping does with an example: $ whoami bx $ ping localhost -t backdoor.sh # executes backdoor $ whoami root $ The modified code increased from 285948 bytes to 290209 bytes. A BF tool compiles "executable" by modifying the symbol table in an existing ELF executable. The tool modifies .dynsym and .rela.dyn table, but not code or data. Privacy at the Handset: New FCC Rules? "Valkyrie" (Christie Dudley, Santa Clara Law JD candidate) Valkyrie talked about mobile handset privacy. Some background: Senator Franken (also a comedian) became alarmed about CarrierIQ, where the carriers track their customers. Franken asked the FCC to find out what obligations carriers think they have to protect privacy. The carriers' response was that they are doing just fine with self-regulation—no worries! Carriers need to collect data, such as missed calls, to maintain network quality. But carriers also sell data for marketing. Verizon sells customer data and enables this with a narrow privacy policy (only 1 month to opt out, with difficulties). The data sold is not individually identifiable and is aggregated. But Verizon recommends, as an aggregation workaround to "recollate" data to other databases to identify customers indirectly. The FCC has regulated telephone privacy since 1934 and mobile network privacy since 2007. Also, the carriers say mobile phone privacy is a FTC responsibility (not FCC). FTC is trying to improve mobile app privacy, but FTC has no authority over carrier / customer relationships. As a side note, Apple iPhones are unique as carriers have extra control over iPhones they don't have with other smartphones. As a result iPhones may be more regulated. Who are the consumer advocates? Everyone knows EFF, but EPIC (Electrnic Privacy Info Center), although more obsecure, is more relevant. What to do? Carriers must be accountable. Opt-in and opt-out at any time. Carriers need incentive to grant users control for those who want it, by holding them liable and responsible for breeches on their clock. Location information should be added current CPNI privacy protection, and require "Pen/trap" judicial order to obtain (and would still be a lower standard than 4th Amendment). Politics are on a pro-privacy swing now, with many senators and the Whitehouse. There will probably be new regulation soon, and enforcement will be a problem, but consumers will still have some benefit. Hacking Measured Boot and UEFI Dan Griffin, JWSecure, Inc., Seattle, @JWSdan Dan talked about hacking measured UEFI boot. First some terms: UEFI is a boot technology that is replacing BIOS (has whitelisting and blacklisting). UEFI protects devices against rootkits. TPM - hardware security device to store hashs and hardware-protected keys "secure boot" can control at firmware level what boot images can boot "measured boot" OS feature that tracks hashes (from BIOS, boot loader, krnel, early drivers). "remote attestation" allows remote validation and control based on policy on a remote attestation server. Microsoft pushing TPM (Windows 8 required), but Google is not. Intel TianoCore is the only open source for UEFI. Dan has Measured Boot Tool at http://mbt.codeplex.com/ with a demo where you can also view TPM data. TPM support already on enterprise-class machines. UEFI Weaknesses. UEFI toolkits are evolving rapidly, but UEFI has weaknesses: assume user is an ally trust TPM implicitly, and attached to computer hibernate file is unprotected (disk encryption protects against this) protection migrating from hardware to firmware delays in patching and whitelist updates will UEFI really be adopted by the mainstream (smartphone hardware support, bank support, apathetic consumer support) You Can't Buy Security: Building the Open Source InfoSec Program Boris Sverdlik, ISDPodcast.com co-host Boris talked about problems typical with current security audits. "IT Security" is an oxymoron—IT exists to enable buiness, uptime, utilization, reporting, but don't care about security—IT has conflict of interest. There's no Magic Bullet ("blinky box"), no one-size-fits-all solution (e.g., Intrusion Detection Systems (IDSs)). Regulations don't make you secure. The cloud is not secure (because of shared data and admin access). Defense and pen testing is not sexy. Auditors are not solution (security not a checklist)—what's needed is experience and adaptability—need soft skills. Step 1: First thing is to Google and learn the company end-to-end before you start. Get to know the management team (not IT team), meet as many people as you can. Don't use arbitrary values such as CISSP scores. Quantitive risk assessment is a myth (e.g. AV*EF-SLE). Learn different Business Units, legal/regulatory obligations, learn the business and where the money is made, verify company is protected from script kiddies (easy), learn sensitive information (IP, internal use only), and start with low-hanging fruit (customer service reps and social engineering). Step 2: Policies. Keep policies short and relevant. Generic SANS "security" boilerplate policies don't make sense and are not followed. Focus on acceptable use, data usage, communications, physical security. Step 3: Implementation: keep it simple stupid. Open source, although useful, is not free (implementation cost). Access controls with authentication & authorization for local and remote access. MS Windows has it, otherwise use OpenLDAP, OpenIAM, etc. Application security Everyone tries to reinvent the wheel—use existing static analysis tools. Review high-risk apps and major revisions. Don't run different risk level apps on same system. Assume host/client compromised and use app-level security control. Network security VLAN != segregated because there's too many workarounds. Use explicit firwall rules, active and passive network monitoring (snort is free), disallow end user access to production environment, have a proxy instead of direct Internet access. Also, SSL certificates are not good two-factor auth and SSL does not mean "safe." Operational Controls Have change, patch, asset, & vulnerability management (OSSI is free). For change management, always review code before pushing to production For logging, have centralized security logging for business-critical systems, separate security logging from administrative/IT logging, and lock down log (as it has everything). Monitor with OSSIM (open source). Use intrusion detection, but not just to fulfill a checkbox: build rules from a whitelist perspective (snort). OSSEC has 95% of what you need. Vulnerability management is a QA function when done right: OpenVas and Seccubus are free. Security awareness The reality is users will always click everything. Build real awareness, not compliance driven checkbox, and have it integrated into the culture. Pen test by crowd sourcing—test with logging COSSP http://www.cossp.org/ - Comprehensive Open Source Security Project What Journalists Want: The Investigative Reporters' Perspective on Hacking Dave Maas, San Diego CityBeat Jason Leopold, Truthout.org The difference between hackers and investigative journalists: For hackers, the motivation varies, but method is same, technological specialties. For investigative journalists, it's about one thing—The Story, and they need broad info-gathering skills. J-School in 60 Seconds: Generic formula: Person or issue of pubic interest, new info, or angle. Generic criteria: proximity, prominence, timeliness, human interest, oddity, or consequence. Media awareness of hackers and trends: journalists becoming extremely aware of hackers with congressional debates (privacy, data breaches), demand for data-mining Journalists, use of coding and web development for Journalists, and Journalists busted for hacking (Murdock). Info gathering by investigative journalists include Public records laws. Federal Freedom of Information Act (FOIA) is good, but slow. California Public Records Act is a lot stronger. FOIA takes forever because of foot-dragging—it helps to be specific. Often need to sue (especially FBI). CPRA is faster, and requests can be vague. Dumps and leaks (a la Wikileaks) Journalists want: leads, protecting ourselves, our sources, and adapting tools for news gathering (Google hacking). Anonomity is important to whistleblowers. They want no digital footprint left behind (e.g., email, web log). They don't trust encryption, want to feel safe and secure. Whistleblower laws are very weak—there's no upside for whistleblowers—they have to be very passionate to do it. Accessibility and Security or: How I Learned to Stop Worrying and Love the Halting Problem Anna Shubina, Dartmouth College Anna talked about how accessibility and security are related. Accessibility of digital content (not real world accessibility). mostly refers to blind users and screenreaders, for our purpose. Accessibility is about parsing documents, as are many security issues. "Rich" executable content causes accessibility to fail, and often causes security to fail. For example MS Word has executable format—it's not a document exchange format—more dangerous than PDF or HTML. Accessibility is often the first and maybe only sanity check with parsing. They have no choice because someone may want to read what you write. Google, for example, is very particular about web browser you use and are bad at supporting other browsers. Uses JavaScript instead of links, often requiring mouseover to display content. PDF is a security nightmare. Executible format, embedded flash, JavaScript, etc. 15 million lines of code. Google Chrome doesn't handle PDF correctly, causing several security bugs. PDF has an accessibility checker and PDF tagging, to help with accessibility. But no PDF checker checks for incorrect tags, untagged content, or validates lists or tables. None check executable content at all. The "Halting Problem" is: can one decide whether a program will ever stop? The answer, in general, is no (Rice's theorem). The same holds true for accessibility checkers. Language-theoretic Security says complicated data formats are hard to parse and cannot be solved due to the Halting Problem. W3C Web Accessibility Guidelines: "Perceivable, Operable, Understandable, Robust" Not much help though, except for "Robust", but here's some gems: * all information should be parsable (paraphrasing) * if not parsable, cannot be converted to alternate formats * maximize compatibility in new document formats Executible webpages are bad for security and accessibility. They say it's for a better web experience. But is it necessary to stuff web pages with JavaScript for a better experience? A good example is The Drudge Report—it has hand-written HTML with no JavaScript, yet drives a lot of web traffic due to good content. A bad example is Google News—hidden scrollbars, guessing user input. Solutions: Accessibility and security problems come from same source Expose "better user experience" myth Keep your corner of Internet parsable Remember "Halting Problem"—recognize false solutions (checking and verifying tools) Stop Patching, for Stronger PCI Compliance Adam Brand, protiviti @adamrbrand, http://www.picfun.com/ Adam talked about PCI compliance for retail sales. Take an example: for PCI compliance, 50% of Brian's time (a IT guy), 960 hours/year was spent patching POSs in 850 restaurants. Often applying some patches make no sense (like fixing a browser vulnerability on a server). "Scanner worship" is overuse of vulnerability scanners—it gives a warm and fuzzy and it's simple (red or green results—fix reds). Scanners give a false sense of security. In reality, breeches from missing patches are uncommon—more common problems are: default passwords, cleartext authentication, misconfiguration (firewall ports open). Patching Myths: Myth 1: install within 30 days of patch release (but PCI §6.1 allows a "risk-based approach" instead). Myth 2: vendor decides what's critical (also PCI §6.1). But §6.2 requires user ranking of vulnerabilities instead. Myth 3: scan and rescan until it passes. But PCI §11.2.1b says this applies only to high-risk vulnerabilities. Adam says good recommendations come from NIST 800-40. Instead use sane patching and focus on what's really important. From NIST 800-40: Proactive: Use a proactive vulnerability management process: use change control, configuration management, monitor file integrity. Monitor: start with NVD and other vulnerability alerts, not scanner results. Evaluate: public-facing system? workstation? internal server? (risk rank) Decide:on action and timeline Test: pre-test patches (stability, functionality, rollback) for change control Install: notify, change control, tickets McAfee Secure & Trustmarks — a Hacker's Best Friend Jay James, Shane MacDougall, Tactical Intelligence Inc., Canada "McAfee Secure Trustmark" is a website seal marketed by McAfee. A website gets this badge if they pass their remote scanning. The problem is a removal of trustmarks act as flags that you're vulnerable. Easy to view status change by viewing McAfee list on website or on Google. "Secure TrustGuard" is similar to McAfee. Jay and Shane wrote Perl scripts to gather sites from McAfee and search engines. If their certification image changes to a 1x1 pixel image, then they are longer certified. Their scripts take deltas of scans to see what changed daily. The bottom line is change in TrustGuard status is a flag for hackers to attack your site. Entire idea of seals is silly—you're raising a flag saying if you're vulnerable.

    Read the article

  • NHibernate Pitfalls: Fetch and Paging

    - by Ricardo Peres
    This is part of a series of posts about NHibernate Pitfalls. See the entire collection here. NHibernate allows you to force loading additional references (many to one, one to one) or collections (one to many, many to many) in a query. You must know, however, that this is incompatible with paging. It’s easy to see why. Let’s say you want to get 5 products starting on the fifth, you can issue the following LINQ query: 1: session.Query<Product>().Take(5).Skip(5).ToList(); Will product this SQL in SQL Server: 1: SELECT 2: TOP (@p0) product1_4_, 3: name4_, 4: price4_ 5: FROM 6: (select 7: product0_.product_id as product1_4_, 8: product0_.name as name4_, 9: product0_.price as price4_, 10: ROW_NUMBER() OVER( 11: ORDER BY 12: CURRENT_TIMESTAMP) as __hibernate_sort_row 13: from 14: product product0_) as query 15: WHERE 16: query.__hibernate_sort_row > @p1 17: ORDER BY If, however, you wanted to bring as well the associated order details, you might be tempted to try this: 1: session.Query<Product>().Fetch(x => x.OrderDetails).Take(5).Skip(5).ToList(); Which, in turn, will produce this SQL: 1: SELECT 2: TOP (@p0) product1_4_0_, 3: order1_3_1_, 4: name4_0_, 5: price4_0_, 6: order2_3_1_, 7: product3_3_1_, 8: quantity3_1_, 9: product3_0__, 10: order1_0__ 11: FROM 12: (select 13: product0_.product_id as product1_4_0_, 14: orderdetai1_.order_detail_id as order1_3_1_, 15: product0_.name as name4_0_, 16: product0_.price as price4_0_, 17: orderdetai1_.order_id as order2_3_1_, 18: orderdetai1_.product_id as product3_3_1_, 19: orderdetai1_.quantity as quantity3_1_, 20: orderdetai1_.product_id as product3_0__, 21: orderdetai1_.order_detail_id as order1_0__, 22: ROW_NUMBER() OVER( 23: ORDER BY 24: CURRENT_TIMESTAMP) as __hibernate_sort_row 25: from 26: product product0_ 27: left outer join 28: order_detail orderdetai1_ 29: on product0_.product_id=orderdetai1_.product_id 30: ) as query 31: WHERE 32: query.__hibernate_sort_row > @p1 33: ORDER BY 34: query.__hibernate_sort_row; However, because of the JOIN, what happens is that, if your products have more than one order details, you will get several records – one per order detail – per product, which means that pagination will be broken. There is an workaround, which forces you to write your LINQ query in another way: 1: session.Query<OrderDetail>().Where(x => session.Query<Product>().Select(y => y.ProductId).Take(5).Skip(5).Contains(x.Product.ProductId)).Select(x => x.Product).ToList() Or, using HQL: 1: session.CreateQuery("select od.Product from OrderDetail od where od.Product.ProductId in (select p.ProductId from Product p skip 5 take 5)").List<Product>(); The generated SQL will then be: 1: select 2: product1_.product_id as product1_4_, 3: product1_.name as name4_, 4: product1_.price as price4_ 5: from 6: order_detail orderdetai0_ 7: left outer join 8: product product1_ 9: on orderdetai0_.product_id=product1_.product_id 10: where 11: orderdetai0_.product_id in ( 12: SELECT 13: TOP (@p0) product_id 14: FROM 15: (select 16: product2_.product_id, 17: ROW_NUMBER() OVER( 18: ORDER BY 19: CURRENT_TIMESTAMP) as __hibernate_sort_row 20: from 21: product product2_) as query 22: WHERE 23: query.__hibernate_sort_row > @p1 24: ORDER BY 25: query.__hibernate_sort_row); Which will get you what you want: for 5 products, all of their order details.

    Read the article

  • Announcing Oracle Enterprise Content Management Suite 11g

    - by [email protected]
    Today Oracle announced Oracle Enterprise Content Management Suite 11g. This is a major release for us, and reinforces our three key themes at Oracle: Complete New in this release - Oracle ECM Suite 11g is built on a single, unified repository. Every piece of content - documents, HTML pages, digital assets, scanned images - is stored and accessbile directly from the repository, whether you are working on websites, creating brand logos, processing accounts payable invoices, or running records and retention functions. It makes complete, end-to-end management of content possible, from the point it enters the organization, through its entire lifecycle. Also new in this release, the installation, access, monitoring and administration of Oracle ECM Suite 11g is centralized. As a complete system, organizations can lower the costs of training and usage by having a centralized source of information that is easily administered. As part of this new unified repository release, Oracle has released a benchmarking white paper that shows the extreme performance and scalability of Oracle ECM Suite. When tested on a two node UCM Server running on Sun Oracle DB Machine Half Rack Hardware with an Exadata storage server, Oracle ECM Suite 11g is able to ingest over 178 million documents per day. Open Oracle ECM Suite 11g is built on a service-oriented architecture. All functions are available through standards-based services calls in Web Services or Java. In this release Oracle unveils Open Web Content Management. Open Web Content Management is a revolutionary approach to web content management that decouples the content management process from the process of creating web applications. One piece of this approach is our one-click web content management. With one click, a web application builder can drag content services into their application, enabling their users to also edit content with just one click. Open Web Content Management is also open because it enables Web developers to add Web content management to new and existing JavaServer Pages (JSP), JavaServer Faces (JSF) and Oracle Application Development Framework (ADF) Faces applications Open content distribution - Oracle ECM Suite 11g offers flexible deployment options with a built-in smart cache so organizations can deliver Web sites or Web applications without requiring Oracle ECM Suite as part of the delivery system Integrated Oracle ECM Suite 11g also offers a series of next generation desktop integrations, providing integrations such as: New MS Office integration with menus to access managed content, insert managed links, and compare managed documents using standard MS Office reviewing tools Automatic identity tagging of documents on download - to help users understand which versions they are viewing and prevent duplicate content items in the content repository. New "smart productivity folders" to show a users workflow inbox, saved searches and checked out content directly from Windows Explorer Drag and drop metadata pop-ups Check in and check out for all file formats with any standard WebDAV server As part of Oracle's Enterprise Application Documents initiative, Oracle Content Management 11g also provides certified application integrations with solution templates You can read the press release here. You can see more assets at the launch center here. You can sign up for the announcement webinar and hear more about the new features here. You can read the benchmarking study here.

    Read the article

  • Change Tracking

    - by Ricardo Peres
    You may recall my last post on Change Data Control. This time I am going to talk about other option for tracking changes to tables on SQL Server: Change Tracking. The main differences between the two are: Change Tracking works with SQL Server 2008 Express Change Tracking does not require SQL Server Agent to be running Change Tracking does not keep the old values in case of an UPDATE or DELETE Change Data Capture uses an asynchronous process, so there is no overhead on each operation Change Data Capture requires more storage and processing Here's some code that illustrates it's usage: -- for demonstrative purposes, table Post of database Blog only contains two columns, PostId and Title -- enable change tracking for database Blog, for 2 days ALTER DATABASE Blog SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON); -- enable change tracking for table Post ALTER TABLE Post ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON); -- see current records on table Post SELECT * FROM Post SELECT * FROM sys.sysobjects WHERE name = 'Post' SELECT * FROM sys.sysdatabases WHERE name = 'Blog' -- confirm that table Post and database Blog are being change tracked SELECT * FROM sys.change_tracking_tables SELECT * FROM sys.change_tracking_databases -- see current version for table Post SELECT p.PostId, p.Title, c.SYS_CHANGE_VERSION, c.SYS_CHANGE_CONTEXT FROM Post AS p CROSS APPLY CHANGETABLE(VERSION Post, (PostId), (p.PostId)) AS c; -- update post UPDATE Post SET Title = 'First Post Title Changed' WHERE Title = 'First Post Title'; -- see current version for table Post SELECT p.PostId, p.Title, c.SYS_CHANGE_VERSION, c.SYS_CHANGE_CONTEXT FROM Post AS p CROSS APPLY CHANGETABLE(VERSION Post, (PostId), (p.PostId)) AS c; -- see changes since version 0 (initial) SELECT p.Title, c.PostId, SYS_CHANGE_VERSION, SYS_CHANGE_OPERATION, SYS_CHANGE_COLUMNS, SYS_CHANGE_CONTEXT FROM CHANGETABLE(CHANGES Post, 0) AS c LEFT OUTER JOIN Post AS p ON p.PostId = c.PostId; -- is column Title of table Post changed since version 0? SELECT CHANGE_TRACKING_IS_COLUMN_IN_MASK(COLUMNPROPERTY(OBJECT_ID('Post'), 'Title', 'ColumnId'), (SELECT SYS_CHANGE_COLUMNS FROM CHANGETABLE(CHANGES Post, 0) AS c)) -- get current version SELECT CHANGE_TRACKING_CURRENT_VERSION() -- disable change tracking for table Post ALTER TABLE Post DISABLE CHANGE_TRACKING; -- disable change tracking for database Blog ALTER DATABASE Blog SET CHANGE_TRACKING = OFF; You can read about the differences between the two options here. Choose the one that best suits your needs! SyntaxHighlighter.config.clipboardSwf = 'http://alexgorbatchev.com/pub/sh/2.0.320/scripts/clipboard.swf'; SyntaxHighlighter.brushes.CSharp.aliases = ['c#', 'c-sharp', 'csharp']; SyntaxHighlighter.brushes.Xml.aliases = ['xml']; SyntaxHighlighter.all();

    Read the article

  • Advanced TSQL Tuning: Why Internals Knowledge Matters

    - by Paul White
    There is much more to query tuning than reducing logical reads and adding covering nonclustered indexes.  Query tuning is not complete as soon as the query returns results quickly in the development or test environments.  In production, your query will compete for memory, CPU, locks, I/O and other resources on the server.  Today’s entry looks at some tuning considerations that are often overlooked, and shows how deep internals knowledge can help you write better TSQL. As always, we’ll need some example data.  In fact, we are going to use three tables today, each of which is structured like this: Each table has 50,000 rows made up of an INTEGER id column and a padding column containing 3,999 characters in every row.  The only difference between the three tables is in the type of the padding column: the first table uses CHAR(3999), the second uses VARCHAR(MAX), and the third uses the deprecated TEXT type.  A script to create a database with the three tables and load the sample data follows: USE master; GO IF DB_ID('SortTest') IS NOT NULL DROP DATABASE SortTest; GO CREATE DATABASE SortTest COLLATE LATIN1_GENERAL_BIN; GO ALTER DATABASE SortTest MODIFY FILE ( NAME = 'SortTest', SIZE = 3GB, MAXSIZE = 3GB ); GO ALTER DATABASE SortTest MODIFY FILE ( NAME = 'SortTest_log', SIZE = 256MB, MAXSIZE = 1GB, FILEGROWTH = 128MB ); GO ALTER DATABASE SortTest SET ALLOW_SNAPSHOT_ISOLATION OFF ; ALTER DATABASE SortTest SET AUTO_CLOSE OFF ; ALTER DATABASE SortTest SET AUTO_CREATE_STATISTICS ON ; ALTER DATABASE SortTest SET AUTO_SHRINK OFF ; ALTER DATABASE SortTest SET AUTO_UPDATE_STATISTICS ON ; ALTER DATABASE SortTest SET AUTO_UPDATE_STATISTICS_ASYNC ON ; ALTER DATABASE SortTest SET PARAMETERIZATION SIMPLE ; ALTER DATABASE SortTest SET READ_COMMITTED_SNAPSHOT OFF ; ALTER DATABASE SortTest SET MULTI_USER ; ALTER DATABASE SortTest SET RECOVERY SIMPLE ; USE SortTest; GO CREATE TABLE dbo.TestCHAR ( id INTEGER IDENTITY (1,1) NOT NULL, padding CHAR(3999) NOT NULL,   CONSTRAINT [PK dbo.TestCHAR (id)] PRIMARY KEY CLUSTERED (id), ) ; CREATE TABLE dbo.TestMAX ( id INTEGER IDENTITY (1,1) NOT NULL, padding VARCHAR(MAX) NOT NULL,   CONSTRAINT [PK dbo.TestMAX (id)] PRIMARY KEY CLUSTERED (id), ) ; CREATE TABLE dbo.TestTEXT ( id INTEGER IDENTITY (1,1) NOT NULL, padding TEXT NOT NULL,   CONSTRAINT [PK dbo.TestTEXT (id)] PRIMARY KEY CLUSTERED (id), ) ; -- ============= -- Load TestCHAR (about 3s) -- ============= INSERT INTO dbo.TestCHAR WITH (TABLOCKX) ( padding ) SELECT padding = REPLICATE(CHAR(65 + (Data.n % 26)), 3999) FROM ( SELECT TOP (50000) n = ROW_NUMBER() OVER (ORDER BY (SELECT 0)) - 1 FROM master.sys.columns C1, master.sys.columns C2, master.sys.columns C3 ORDER BY n ASC ) AS Data ORDER BY Data.n ASC ; -- ============ -- Load TestMAX (about 3s) -- ============ INSERT INTO dbo.TestMAX WITH (TABLOCKX) ( padding ) SELECT CONVERT(VARCHAR(MAX), padding) FROM dbo.TestCHAR ORDER BY id ; -- ============= -- Load TestTEXT (about 5s) -- ============= INSERT INTO dbo.TestTEXT WITH (TABLOCKX) ( padding ) SELECT CONVERT(TEXT, padding) FROM dbo.TestCHAR ORDER BY id ; -- ========== -- Space used -- ========== -- EXECUTE sys.sp_spaceused @objname = 'dbo.TestCHAR'; EXECUTE sys.sp_spaceused @objname = 'dbo.TestMAX'; EXECUTE sys.sp_spaceused @objname = 'dbo.TestTEXT'; ; CHECKPOINT ; That takes around 15 seconds to run, and shows the space allocated to each table in its output: To illustrate the points I want to make today, the example task we are going to set ourselves is to return a random set of 150 rows from each table.  The basic shape of the test query is the same for each of the three test tables: SELECT TOP (150) T.id, T.padding FROM dbo.Test AS T ORDER BY NEWID() OPTION (MAXDOP 1) ; Test 1 – CHAR(3999) Running the template query shown above using the TestCHAR table as the target, we find that the query takes around 5 seconds to return its results.  This seems slow, considering that the table only has 50,000 rows.  Working on the assumption that generating a GUID for each row is a CPU-intensive operation, we might try enabling parallelism to see if that speeds up the response time.  Running the query again (but without the MAXDOP 1 hint) on a machine with eight logical processors, the query now takes 10 seconds to execute – twice as long as when run serially. Rather than attempting further guesses at the cause of the slowness, let’s go back to serial execution and add some monitoring.  The script below monitors STATISTICS IO output and the amount of tempdb used by the test query.  We will also run a Profiler trace to capture any warnings generated during query execution. DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TC.id, TC.padding FROM dbo.TestCHAR AS TC ORDER BY NEWID() OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; Let’s take a closer look at the statistics and query plan generated from this: Following the flow of the data from right to left, we see the expected 50,000 rows emerging from the Clustered Index Scan, with a total estimated size of around 191MB.  The Compute Scalar adds a column containing a random GUID (generated from the NEWID() function call) for each row.  With this extra column in place, the size of the data arriving at the Sort operator is estimated to be 192MB. Sort is a blocking operator – it has to examine all of the rows on its input before it can produce its first row of output (the last row received might sort first).  This characteristic means that Sort requires a memory grant – memory allocated for the query’s use by SQL Server just before execution starts.  In this case, the Sort is the only memory-consuming operator in the plan, so it has access to the full 243MB (248,696KB) of memory reserved by SQL Server for this query execution. Notice that the memory grant is significantly larger than the expected size of the data to be sorted.  SQL Server uses a number of techniques to speed up sorting, some of which sacrifice size for comparison speed.  Sorts typically require a very large number of comparisons, so this is usually a very effective optimization.  One of the drawbacks is that it is not possible to exactly predict the sort space needed, as it depends on the data itself.  SQL Server takes an educated guess based on data types, sizes, and the number of rows expected, but the algorithm is not perfect. In spite of the large memory grant, the Profiler trace shows a Sort Warning event (indicating that the sort ran out of memory), and the tempdb usage monitor shows that 195MB of tempdb space was used – all of that for system use.  The 195MB represents physical write activity on tempdb, because SQL Server strictly enforces memory grants – a query cannot ‘cheat’ and effectively gain extra memory by spilling to tempdb pages that reside in memory.  Anyway, the key point here is that it takes a while to write 195MB to disk, and this is the main reason that the query takes 5 seconds overall. If you are wondering why using parallelism made the problem worse, consider that eight threads of execution result in eight concurrent partial sorts, each receiving one eighth of the memory grant.  The eight sorts all spilled to tempdb, resulting in inefficiencies as the spilled sorts competed for disk resources.  More importantly, there are specific problems at the point where the eight partial results are combined, but I’ll cover that in a future post. CHAR(3999) Performance Summary: 5 seconds elapsed time 243MB memory grant 195MB tempdb usage 192MB estimated sort set 25,043 logical reads Sort Warning Test 2 – VARCHAR(MAX) We’ll now run exactly the same test (with the additional monitoring) on the table using a VARCHAR(MAX) padding column: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TM.id, TM.padding FROM dbo.TestMAX AS TM ORDER BY NEWID() OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; This time the query takes around 8 seconds to complete (3 seconds longer than Test 1).  Notice that the estimated row and data sizes are very slightly larger, and the overall memory grant has also increased very slightly to 245MB.  The most marked difference is in the amount of tempdb space used – this query wrote almost 391MB of sort run data to the physical tempdb file.  Don’t draw any general conclusions about VARCHAR(MAX) versus CHAR from this – I chose the length of the data specifically to expose this edge case.  In most cases, VARCHAR(MAX) performs very similarly to CHAR – I just wanted to make test 2 a bit more exciting. MAX Performance Summary: 8 seconds elapsed time 245MB memory grant 391MB tempdb usage 193MB estimated sort set 25,043 logical reads Sort warning Test 3 – TEXT The same test again, but using the deprecated TEXT data type for the padding column: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TT.id, TT.padding FROM dbo.TestTEXT AS TT ORDER BY NEWID() OPTION (MAXDOP 1, RECOMPILE) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; This time the query runs in 500ms.  If you look at the metrics we have been checking so far, it’s not hard to understand why: TEXT Performance Summary: 0.5 seconds elapsed time 9MB memory grant 5MB tempdb usage 5MB estimated sort set 207 logical reads 596 LOB logical reads Sort warning SQL Server’s memory grant algorithm still underestimates the memory needed to perform the sorting operation, but the size of the data to sort is so much smaller (5MB versus 193MB previously) that the spilled sort doesn’t matter very much.  Why is the data size so much smaller?  The query still produces the correct results – including the large amount of data held in the padding column – so what magic is being performed here? TEXT versus MAX Storage The answer lies in how columns of the TEXT data type are stored.  By default, TEXT data is stored off-row in separate LOB pages – which explains why this is the first query we have seen that records LOB logical reads in its STATISTICS IO output.  You may recall from my last post that LOB data leaves an in-row pointer to the separate storage structure holding the LOB data. SQL Server can see that the full LOB value is not required by the query plan until results are returned, so instead of passing the full LOB value down the plan from the Clustered Index Scan, it passes the small in-row structure instead.  SQL Server estimates that each row coming from the scan will be 79 bytes long – 11 bytes for row overhead, 4 bytes for the integer id column, and 64 bytes for the LOB pointer (in fact the pointer is rather smaller – usually 16 bytes – but the details of that don’t really matter right now). OK, so this query is much more efficient because it is sorting a very much smaller data set – SQL Server delays retrieving the LOB data itself until after the Sort starts producing its 150 rows.  The question that normally arises at this point is: Why doesn’t SQL Server use the same trick when the padding column is defined as VARCHAR(MAX)? The answer is connected with the fact that if the actual size of the VARCHAR(MAX) data is 8000 bytes or less, it is usually stored in-row in exactly the same way as for a VARCHAR(8000) column – MAX data only moves off-row into LOB storage when it exceeds 8000 bytes.  The default behaviour of the TEXT type is to be stored off-row by default, unless the ‘text in row’ table option is set suitably and there is room on the page.  There is an analogous (but opposite) setting to control the storage of MAX data – the ‘large value types out of row’ table option.  By enabling this option for a table, MAX data will be stored off-row (in a LOB structure) instead of in-row.  SQL Server Books Online has good coverage of both options in the topic In Row Data. The MAXOOR Table The essential difference, then, is that MAX defaults to in-row storage, and TEXT defaults to off-row (LOB) storage.  You might be thinking that we could get the same benefits seen for the TEXT data type by storing the VARCHAR(MAX) values off row – so let’s look at that option now.  This script creates a fourth table, with the VARCHAR(MAX) data stored off-row in LOB pages: CREATE TABLE dbo.TestMAXOOR ( id INTEGER IDENTITY (1,1) NOT NULL, padding VARCHAR(MAX) NOT NULL,   CONSTRAINT [PK dbo.TestMAXOOR (id)] PRIMARY KEY CLUSTERED (id), ) ; EXECUTE sys.sp_tableoption @TableNamePattern = N'dbo.TestMAXOOR', @OptionName = 'large value types out of row', @OptionValue = 'true' ; SELECT large_value_types_out_of_row FROM sys.tables WHERE [schema_id] = SCHEMA_ID(N'dbo') AND name = N'TestMAXOOR' ; INSERT INTO dbo.TestMAXOOR WITH (TABLOCKX) ( padding ) SELECT SPACE(0) FROM dbo.TestCHAR ORDER BY id ; UPDATE TM WITH (TABLOCK) SET padding.WRITE (TC.padding, NULL, NULL) FROM dbo.TestMAXOOR AS TM JOIN dbo.TestCHAR AS TC ON TC.id = TM.id ; EXECUTE sys.sp_spaceused @objname = 'dbo.TestMAXOOR' ; CHECKPOINT ; Test 4 – MAXOOR We can now re-run our test on the MAXOOR (MAX out of row) table: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) MO.id, MO.padding FROM dbo.TestMAXOOR AS MO ORDER BY NEWID() OPTION (MAXDOP 1, RECOMPILE) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; TEXT Performance Summary: 0.3 seconds elapsed time 245MB memory grant 0MB tempdb usage 193MB estimated sort set 207 logical reads 446 LOB logical reads No sort warning The query runs very quickly – slightly faster than Test 3, and without spilling the sort to tempdb (there is no sort warning in the trace, and the monitoring query shows zero tempdb usage by this query).  SQL Server is passing the in-row pointer structure down the plan and only looking up the LOB value on the output side of the sort. The Hidden Problem There is still a huge problem with this query though – it requires a 245MB memory grant.  No wonder the sort doesn’t spill to tempdb now – 245MB is about 20 times more memory than this query actually requires to sort 50,000 records containing LOB data pointers.  Notice that the estimated row and data sizes in the plan are the same as in test 2 (where the MAX data was stored in-row). The optimizer assumes that MAX data is stored in-row, regardless of the sp_tableoption setting ‘large value types out of row’.  Why?  Because this option is dynamic – changing it does not immediately force all MAX data in the table in-row or off-row, only when data is added or actually changed.  SQL Server does not keep statistics to show how much MAX or TEXT data is currently in-row, and how much is stored in LOB pages.  This is an annoying limitation, and one which I hope will be addressed in a future version of the product. So why should we worry about this?  Excessive memory grants reduce concurrency and may result in queries waiting on the RESOURCE_SEMAPHORE wait type while they wait for memory they do not need.  245MB is an awful lot of memory, especially on 32-bit versions where memory grants cannot use AWE-mapped memory.  Even on a 64-bit server with plenty of memory, do you really want a single query to consume 0.25GB of memory unnecessarily?  That’s 32,000 8KB pages that might be put to much better use. The Solution The answer is not to use the TEXT data type for the padding column.  That solution happens to have better performance characteristics for this specific query, but it still results in a spilled sort, and it is hard to recommend the use of a data type which is scheduled for removal.  I hope it is clear to you that the fundamental problem here is that SQL Server sorts the whole set arriving at a Sort operator.  Clearly, it is not efficient to sort the whole table in memory just to return 150 rows in a random order. The TEXT example was more efficient because it dramatically reduced the size of the set that needed to be sorted.  We can do the same thing by selecting 150 unique keys from the table at random (sorting by NEWID() for example) and only then retrieving the large padding column values for just the 150 rows we need.  The following script implements that idea for all four tables: SET STATISTICS IO ON ; WITH TestTable AS ( SELECT * FROM dbo.TestCHAR ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id = ANY (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestMAX ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestTEXT ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestMAXOOR ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; All four queries now return results in much less than a second, with memory grants between 6 and 12MB, and without spilling to tempdb.  The small remaining inefficiency is in reading the id column values from the clustered primary key index.  As a clustered index, it contains all the in-row data at its leaf.  The CHAR and VARCHAR(MAX) tables store the padding column in-row, so id values are separated by a 3999-character column, plus row overhead.  The TEXT and MAXOOR tables store the padding values off-row, so id values in the clustered index leaf are separated by the much-smaller off-row pointer structure.  This difference is reflected in the number of logical page reads performed by the four queries: Table 'TestCHAR' logical reads 25511 lob logical reads 000 Table 'TestMAX'. logical reads 25511 lob logical reads 000 Table 'TestTEXT' logical reads 00412 lob logical reads 597 Table 'TestMAXOOR' logical reads 00413 lob logical reads 446 We can increase the density of the id values by creating a separate nonclustered index on the id column only.  This is the same key as the clustered index, of course, but the nonclustered index will not include the rest of the in-row column data. CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestCHAR (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestMAX (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestTEXT (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestMAXOOR (id); The four queries can now use the very dense nonclustered index to quickly scan the id values, sort them by NEWID(), select the 150 ids we want, and then look up the padding data.  The logical reads with the new indexes in place are: Table 'TestCHAR' logical reads 835 lob logical reads 0 Table 'TestMAX' logical reads 835 lob logical reads 0 Table 'TestTEXT' logical reads 686 lob logical reads 597 Table 'TestMAXOOR' logical reads 686 lob logical reads 448 With the new index, all four queries use the same query plan (click to enlarge): Performance Summary: 0.3 seconds elapsed time 6MB memory grant 0MB tempdb usage 1MB sort set 835 logical reads (CHAR, MAX) 686 logical reads (TEXT, MAXOOR) 597 LOB logical reads (TEXT) 448 LOB logical reads (MAXOOR) No sort warning I’ll leave it as an exercise for the reader to work out why trying to eliminate the Key Lookup by adding the padding column to the new nonclustered indexes would be a daft idea Conclusion This post is not about tuning queries that access columns containing big strings.  It isn’t about the internal differences between TEXT and MAX data types either.  It isn’t even about the cool use of UPDATE .WRITE used in the MAXOOR table load.  No, this post is about something else: Many developers might not have tuned our starting example query at all – 5 seconds isn’t that bad, and the original query plan looks reasonable at first glance.  Perhaps the NEWID() function would have been blamed for ‘just being slow’ – who knows.  5 seconds isn’t awful – unless your users expect sub-second responses – but using 250MB of memory and writing 200MB to tempdb certainly is!  If ten sessions ran that query at the same time in production that’s 2.5GB of memory usage and 2GB hitting tempdb.  Of course, not all queries can be rewritten to avoid large memory grants and sort spills using the key-lookup technique in this post, but that’s not the point either. The point of this post is that a basic understanding of execution plans is not enough.  Tuning for logical reads and adding covering indexes is not enough.  If you want to produce high-quality, scalable TSQL that won’t get you paged as soon as it hits production, you need a deep understanding of execution plans, and as much accurate, deep knowledge about SQL Server as you can lay your hands on.  The advanced database developer has a wide range of tools to use in writing queries that perform well in a range of circumstances. By the way, the examples in this post were written for SQL Server 2008.  They will run on 2005 and demonstrate the same principles, but you won’t get the same figures I did because 2005 had a rather nasty bug in the Top N Sort operator.  Fair warning: if you do decide to run the scripts on a 2005 instance (particularly the parallel query) do it before you head out for lunch… This post is dedicated to the people of Christchurch, New Zealand. © 2011 Paul White email: @[email protected] twitter: @SQL_Kiwi

    Read the article

  • Music before bells and whistles

    - by Tony Davis
    Why is it that Windows has so much difficulty in finding content on its file system? This is not an insurmountable technical problem; on my laptop, I have a database within which I can instantly find text or names within millions of records, within 300 milliseconds. I have a copy of Google Desktop that can find phrases within emails or documents, almost as quickly. It is an important, though mundane, part of an operating system to be able to find files. The first thing I notice within Windows is that the facility to find files or text within files is called 'search' rather than 'find'. Hmm. This doesn’t bode well. What’s this? It does a brute-force search for file names? Here we are in an age when we can breed mice that glow in the dark, and manufacture computers that fit in our shirt pockets, and we find an operating system that is still entirely innocent of managing and indexing content in hierarchical data. I can actually read the files of my PC into a database, mimic the directory/folder hierarchies and then find files in a flash; but when I do the same with Windows Vista, we are suddenly back in a 1960s time warp. Finding files based on their name is bad enough, but finding files based on the content that they contain is more or less asking for an opportunity to wait 20 minutes in order to see a "file not found" message. Sadly, with Windows 7, Microsoft seems to have fallen into the familiar trap of adding bells and whistles before finishing the song. It's certainly true that Microsoft has added new features and a certain polish to Windows Search 4.0, the latest incarnation. It works more like a web search and offers a new search syntax, called Advanced Query Syntax, which allows you to search on file author, file size, date ranges (e.g. date:=7/4/09still does not work reliably. I've experienced first-hand its stubborn refusal, despite a full index, to acknowledge the existence of a file I know exists, based on a search for a specific term within that file that I know is in there somewhere; a file that Google Desktop search, or old wingrep, finds in seconds. When users hark back to the halcyon days of Windows XP search, you know something is seriously amiss. Shouldn't applications get the functionality right before applying animated menus and Teletubby graphics, or is advancing age making me grumpy? I’d be pleased to hear your views, as always. Cheers, Tony.

    Read the article

  • Finally home - and something fully off topic

    - by Mike Dietrich
    Arrived at Munich Pasing last night at 0:50am ... finally :-) On Sunday I've left the Dylan Hotel in Dublin (thanks to the staff there as well: you were REALLY helpful!!) around 7:30pm to go to the port - and came home on Tuesday morning 1:15am. So all together 29:45hrs door-to-door - not bad for nearly 2000km just relying on public transport. And could have been faster if there were seats in ealier TGV's left. But I don't complain at all ;-) Just checked the website of Dublin Airport - it says currently: 17.00pm: Latest on flight disruptions at Dublin Airport The IAA have advised us that based on the latest Volcanic Ash Advisory Centre London Dublin Airport will remain closed for all inbound and outbound commercial flights until 20.00hours. This effectively means that no flights will land or take off at Dublin Airport until then. A further update will be posted this afternoon. When traveling I have always my iPod with me. It has gotten a bit old now (I think I've bought it 3 years ago in November 2007) but it has a 160GB hard disk in it so it fits most of my music collection (not the entire collection anymore as I'm currently re-riping everything to Apple Lossless because at least for my ears it makes a big difference - but I listen to good ol' vinyl as well ...and I don't download compressed music ;-) ). The battery of my little travel companion is still good for more than 20 hours consistent music playback - and there was a band from Texas being in my ears most of the whole journey called Midlake. I haven't heard of them before until I asked a lady at a Munich store some few weeks ago what she's playing on the speakers in the shop. She was amazed and came back with the CD cover but I hesitated to buy it as I always want to listen the tunes before - and at this day I had no time left to do so. But in Dublin I had a bit of spare time on Saturday and I always enter record stores - and the Tower Records was the sort of store I really enjoy and so I've spent there nearly two hours - leaving with 3 Midlake CDs in my bag. So if you are interested just listen those tunes which may remind some people on Fleetwood Mac: As I said in the title, fully off topic ;-)

    Read the article

  • Oracle Tutor: Are Documented Policies and Procedures Necessary?

    - by emily.chorba(at)oracle.com
    People refer to policies and procedures with a variety of expressions including business process documentation, standard operating procedures (SOPs), department operating procedures (DOPs), work instructions, specifications, and so on. For our purpose here, policies and procedures mean a set of documents that describe an organization's policies (rules) for operation and the procedures (containing tasks performed by individuals) to fulfill the policies. When an organization documents policies and procedures properly, they can be the strategic link between an organization's vision and its daily operations. Policies and procedures are often necessary because of some external requirement, such as environmental compliance or other governmental regulations. One example of an external requirement would be the American Sarbanes-Oxley Act, requiring full openness in accounting practices. Here are a few other examples of business issues that necessitate writing policies and procedures: Operational needs -- policies and procedures ensure fundamental processes are performed in a consistent way that meets the organization's needs. Risk management -- policies and procedures are identified by the Committee of Sponsoring Organizations of the Treadway Commission (COSO) as a control activity needed to manage risk. Continuous improvement -- Procedures can improve processes by building important internal communication practices. Compliance -- Well-defined and documented processes (i.e. procedures, training materials) along with records that demonstrate process capability can demonstrate an effective internal control system compliant with regulations and standards. In addition to helping with the above business issues, policies and procedures can support the basic needs of employees and management. Well documented and easy to access policies and procedures: allow employees to understand their roles and responsibilities within predefined limits and to stay on the accepted path indentified by the organization's management provide clarity to the reader when dealing with accountability issues or activities that are of critical importance allow management to guide operations without constant intervention allow managers to control events in advance and prevent employees from making costly mistakes Can you think of another way organizations can meet the above needs of management and their employees in place of documented Policies and Procedures? Probably not, but we would love your feedback on this question. And that my friends, is why documented policies and procedures are very necessary. Learn MoreFor more information about Tutor, visit Oracle.com or the Tutor Blog. Post your questions at the Tutor Forum. Emily ChorbaPrinciple Product Manager Oracle Tutor & BPM

    Read the article

  • SQL SERVER – CSVExpress and Quick Data Load

    - by pinaldave
    One of the newest ETL tools is CSVexpress.com.  This is a program that can quickly load any CSV file into ODBC compliant databases uses data integration.  For those of you familiar with databases and how they operate, the question that comes to mind might be what use this program will have in your life. I have written earlier article on this subject over here SQL SERVER – Import CSV into Database – Transferring File Content into a Database Table using CSVexpress. You might know that RDBMS have automatic support for loading CSV files into tables – but it is not quite as easy as one click of a button.  First of all, most databases have a command line interface and you need the file and configuration script in order to load up.  You also need to know enough to write the script – which for novices can be extremely daunting.  On top of all this, if you work with more than one type of RDBMS, you need to know the ins and outs of uploading and writing script for more than one program. So you might begin to see how useful CSVexpress.com might be!  There are many other tools that enable uploading files to a database.  They can be very fancy – some can generate configuration files automatically, others load the data directly.  Again, novices will be able to tell you why these aren’t the most useful programs in the world.  You see, these programs were created with SQL in mind, not for uploading data.  If you don’t have large amounts of data to upload, getting the configurations right can be a long process and you will have to check the code that is generated yourself.  Not exactly “easy to use” for novices. That makes CSVexpress.com one of the best new tools available for everyone – but especially people who don’t want to learn a lot of new material all at once.  CSVexpress has an easy to navigate graphical user interface and no scripting or coding is required.  There are built-in constraints and data validations, and you can configure transforms and reject records right there on the screen.  But the best thing of all – it’s free! That’s right, you can download CSVexpress for free from www.csvexpress.com and start easily uploading and configuring riles almost immediately.  If you’re currently happy with your method of data configuration, keep up with the good work.  For the rest of us, there’s CSVexpress.com. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Utility, T SQL, Technology

    Read the article

  • Performance problems loading XML with SSIS, an alternative way!

    - by AtulThakor
    I recently needed to load several thousand XML files into a SQL database, I created an SSIS package which was created as followed: Using a foreach container to loop through a directory and load each file path into a variable, the “Import XML” dataflow would then load each XML file into a SQL table.       Running this, it took approximately 1 second to load each file which seemed a massive amount of time to parse the XML and load the data, speaking to my colleague Martin Croft, he suggested the use of T-SQL Bulk Insert and OpenRowset, so we adjusted the package as followed:     The same foreach container was used but instead the following SQL command was executed (this is an expression):     "INSERT INTO MyTable(FileDate) SELECT   CAST(bulkcolumn AS XML)     FROM OPENROWSET(         BULK         '" + @[User::CurrentFile]  + "',         SINGLE_BLOB ) AS x"     Using this method we managed to load approximately 20 records per second, much faster…for data loading! For what we wanted to achieve this was perfect but I’ll leave you with the following points when making your own decision on which solution you decide to choose!      Openrowset Method Much faster to get the data into SQL You’ll need to parse or create a view over the XML data to allow the data to be more usable(another post on this!) Not able to apply validation/transformation against the data when loading it The SQL Server service account will need permission to the file No schema validation when loading files SSIS Slower (in our case) Schema validation Allows you to apply transformations/joins to the data Permissions should be less of a problem Data can be loaded into the final form through the package When using a schema validation errors can fail the package (I’ll do another post on this)

    Read the article

  • Vote of Disconfidence to Entity Framework

    - by Ricardo Peres
    A friend of mine has found the following problem with Entity Framework 4: Two simple classes and one association between them (one to many): One condition to filter out soft-deleted entities (WHERE Deleted = 0): 100 records in the database; A simple query: 1: var l = ctx.Person.Include("Address").Where(x => (x.Address.Name == "317 Oak Blvd." && x.Address.Number == 926) || (x.Address.Name == "891 White Milton Drive" && x.Address.Number == 497)); Will produce the following SQL: 1: SELECT 2: [Extent1].[Id] AS [Id], 3: [Extent1].[FullName] AS [FullName], 4: [Extent1].[AddressId] AS [AddressId], 5: [Extent202].[Id] AS [Id1], 6: [Extent202].[Name] AS [Name], 7: [Extent202].[Number] AS [Number] 8: FROM [dbo].[Person] AS [Extent1] 9: LEFT OUTER JOIN [dbo].[Address] AS [Extent2] ON ([Extent2].[Deleted] = 0) AND ([Extent1].[AddressId] = [Extent2].[Id]) 10: LEFT OUTER JOIN [dbo].[Address] AS [Extent3] ON ([Extent3].[Deleted] = 0) AND ([Extent1].[AddressId] = [Extent3].[Id]) 11: LEFT OUTER JOIN [dbo].[Address] AS [Extent4] ON ([Extent4].[Deleted] = 0) AND ([Extent1].[AddressId] = [Extent4].[Id]) 12: LEFT OUTER JOIN [dbo].[Address] AS [Extent5] ON ([Extent5].[Deleted] = 0) AND ([Extent1].[AddressId] = [Extent5].[Id]) 13: LEFT OUTER JOIN [dbo].[Address] AS [Extent6] ON ([Extent6].[Deleted] = 0) AND ([Extent1].[AddressId] = [Extent6].[Id]) 14: ... 15: WHERE ((N'317 Oak Blvd.' = [Extent2].[Name]) AND (926 = [Extent3].[Number])) 16: ... And will result in 680 MB of memory being taken! Now, Entity Framework has been historically known for producing less than optimal SQL, but 680 MB for 100 entities?! According to Microsoft, the problem will be addressed in the following version, there is a Connect issue open. There is even a whitepaper, Performance Considerations for Entity Framework 5, which talks about some of the changes and optimizations coming on version 5, but by reading it, I got even more concerned: “Once the cache contains a set number of entries (800), we start a timer that periodically (once-per-minute) sweeps the cache.” Say what?! The next version of Entity Framework will spawn timer threads?! When Code First came along, I thought it was a step in the right direction. Sure, it didn’t include some things that NHibernate did for quite some time – for example, different strategies for Id generation that do not rely on IDENTITY columns, which makes INSERT batching impossible, or support for enumerated types – but I thought these would come with the time. Now, enumerated types have, but so did… timer threads! I’m afraid Entity Framework is becoming a monster.

    Read the article

  • Cool Enhancements Everyone Can Enjoy

    - by Ruth
    With Release 17, we have a few visual and functional enhancements that make using CRM On Demand that much better for us all. I'll mention a few here, but to get the full outline of these upgrades, I recommend taking 10 minutes to view the Release 17 Usability Transfer of Information course. First and foremost, I find the ability to customize your theme (or skin) pretty cool, but I've said that before. Take a look at the Selecting Your Theme and the Themes - Create Your CRM Style blog articles for more information. My next favorite is the resizeable user interface (UI). CRM On Demand will dynamically fit the device and screen resolution you're using, which includes the resizing of fields, field editors and pop-ups. If you have a wide screen like me, you should appreciate that one very much. To make it easier to see that resized UI, the detail pages got a little face lift. New horizontal lines and other subtle changes make those pages easier to read. Also, those things you need to know, like error messages and inline help are highlighted with a little icon to show the message type. You may not think every change to the detail pages are particularly exciting, but I'm sure you'll enjoy the new Head Up Display, which saves you scrolling time by adding links to related information sections. I like that the head up display travels with me as I move up and down the page...it's like a little friend that takes me where I want to go as fast as possible. You may also really like the fact that the copy record feature is now available for all record types from both detail pages and lists. Your company administrator can choose which fields get copied, so you can maximize your efficiency when creating new records. Lists also got a face lift. Alternating colors in rows make it easier to see your data. Also, the Favorite Lists icon is now on the list itself, so you can save your most useful lists with one click. If you've ever tried to create a new list with 10 columns or more, you'll be happy to hear that the maximum number of columns in a list has increased from 9 to 20. This is great news, but doesn't mean you should include the kitchen sink in your list...excess columns can slow list performance. So choose your columns wisely. Again, these are just a few of my favorite things. Let us know what you think about the new usability features. What are your favorite things?

    Read the article

  • How to Quickly Add Multiple IP Addresses to Windows Servers

    - by Sysadmin Geek
    If you have ever added multiple IP addresses to a single Windows server, going through the graphical interface is an incredible pain as each IP must be added manually, each in a new dialog box. Here’s a simple solution. Needless to say, this can be incredibly monotonous and time consuming if you are adding more than a few IP addresses. Thankfully, there is a much easier way which allows you to add an entire subnet (or more) in seconds. Adding an IP Address from the Command Line Windows includes the “netsh” command which allows you to configure just about any aspect of your network connections. If you view the accepted parameters using “netsh /?” you will be presented with a list of commands each which have their own list of commands (and so on). For the purpose of adding IP addresses, we are interested in this string of parameters: netsh interface ipv4 add address Note: For Windows Server 2003/XP and earlier, “ipv4″ should be replaced with just “ip” in the netsh command. If you view the help information, you can see the full list of accepted parameters but for the most part what you will be interested in is something like this: netsh interface ipv4 add address “Local Area Connection” 192.168.1.2 255.255.255.0 The above command adds the IP Address 192.168.1.2 (with Subnet Mask 255.255.255.0) to the connection titled “Local Area Network”. Adding Multiple IP Addresses at Once When we accompany a netsh command with the FOR /L loop, we can quickly add multiple IP addresses. The syntax for the FOR /L loop looks like this: FOR /L %variable IN (start,step,end) DO command So we could easily add every IP address from an entire subnet using this command: FOR /L %A IN (0,1,255) DO netsh interface ipv4 add address “Local Area Connection” 192.168.1.%A 255.255.255.0 This command takes about 20 seconds to run, where adding the same number of IP addresses manually would take significantly longer. A Quick Demonstration Here is the initial configuration on our network adapter: ipconfig /all Now run netsh from within a FOR /L loop to add IP’s 192.168.1.10-20 to this adapter: FOR /L %A IN (10,1,20) DO netsh interface ipv4 add address “Local Area Connection” 192.168.1.%A 255.255.255.0 After the above command is run, viewing the IP Configuration of the adapter now shows: Latest Features How-To Geek ETC How To Create Your Own Custom ASCII Art from Any Image How To Process Camera Raw Without Paying for Adobe Photoshop How Do You Block Annoying Text Message (SMS) Spam? How to Use and Master the Notoriously Difficult Pen Tool in Photoshop HTG Explains: What Are the Differences Between All Those Audio Formats? How To Use Layer Masks and Vector Masks to Remove Complex Backgrounds in Photoshop Bring Summer Back to Your Desktop with the LandscapeTheme for Chrome and Iron The Prospector – Home Dash Extension Creates a Whole New Browsing Experience in Firefox KinEmote Links Kinect to Windows Why Nobody Reads Web Site Privacy Policies [Infographic] Asian Temple in the Snow Wallpaper 10 Weird Gaming Records from the Guinness Book

    Read the article

  • MySQL and Hadoop Integration - Unlocking New Insight

    - by Mat Keep
    “Big Data” offers the potential for organizations to revolutionize their operations. With the volume of business data doubling every 1.2 years, analysts and business users are discovering very real benefits when integrating and analyzing data from multiple sources, enabling deeper insight into their customers, partners, and business processes. As the world’s most popular open source database, and the most deployed database in the web and cloud, MySQL is a key component of many big data platforms, with Hadoop vendors estimating 80% of deployments are integrated with MySQL. The new Guide to MySQL and Hadoop presents the tools enabling integration between the two data platforms, supporting the data lifecycle from acquisition and organisation to analysis and visualisation / decision, as shown in the figure below The Guide details each of these stages and the technologies supporting them: Acquire: Through new NoSQL APIs, MySQL is able to ingest high volume, high velocity data, without sacrificing ACID guarantees, thereby ensuring data quality. Real-time analytics can also be run against newly acquired data, enabling immediate business insight, before data is loaded into Hadoop. In addition, sensitive data can be pre-processed, for example healthcare or financial services records can be anonymized, before transfer to Hadoop. Organize: Data is transferred from MySQL tables to Hadoop using Apache Sqoop. With the MySQL Binlog (Binary Log) API, users can also invoke real-time change data capture processes to stream updates to HDFS. Analyze: Multi-structured data ingested from multiple sources is consolidated and processed within the Hadoop platform. Decide: The results of the analysis are loaded back to MySQL via Apache Sqoop where they inform real-time operational processes or provide source data for BI analytics tools. So how are companies taking advantage of this today? As an example, on-line retailers can use big data from their web properties to better understand site visitors’ activities, such as paths through the site, pages viewed, and comments posted. This knowledge can be combined with user profiles and purchasing history to gain a better understanding of customers, and the delivery of highly targeted offers. Of course, it is not just in the web that big data can make a difference. Every business activity can benefit, with other common use cases including: - Sentiment analysis; - Marketing campaign analysis; - Customer churn modeling; - Fraud detection; - Research and Development; - Risk Modeling; - And more. As the guide discusses, Big Data is promising a significant transformation of the way organizations leverage data to run their businesses. MySQL can be seamlessly integrated within a Big Data lifecycle, enabling the unification of multi-structured data into common data platforms, taking advantage of all new data sources and yielding more insight than was ever previously imaginable. Download the guide to MySQL and Hadoop integration to learn more. I'd also be interested in hearing about how you are integrating MySQL with Hadoop today, and your requirements for the future, so please use the comments on this blog to share your insights.

    Read the article

  • How valuable are you to your organization?

    - by Lance Shaw
    I don't know about you but I find it easy to get bogged down with the daily list of tasks and deliverables.  We all have lots to do and it all seems to be due tomorrow.  If you are reading this blog, than your to-do list is almost certainly filled with tasks related to the management, processing and publishing of information.  As we get mired in the daily routine of making sure that the content management needs of the organizations are met, we can easily lose sight of the value that we bring.  After all, if information and content is the lifeblood of our organizations, then surely maintaining the healthy flow of that information has real value.  But how can you measure that value and bring it forward on your résumé or your list of achievements in time for your next performance review? The AIIM organization has spent a lot of time recently researching the value of certification for "information professionals".  When it comes to enterprise content management (ECM) there are many areas of specialization including records management, content archivist, digital asset manager, content librarian and more.  Specialization can clearly drive up your value but it can also lock you into a narrow niche area of focus.  AIIM has found that what companies also need is someone that can apply their knowledge of how information is managed within the operational scope of the business in order to drive real, measurable strategic value.  When you can showcase the value of a broader, business-wide mindset to your management, you have more opportunity to make professional progress and drive real growth where it counts, your paycheck.   We here on the Oracle WebCenter team partnered with AIIM on the research they performed around the value of an information professional certification program. In a webinar this week, Doug Miles of AIIM and I will be talking about the results of that recent survey and what it is going to mean in the future to be recognized as a "Certified Information Professional" (CIP).  Oracle sponsored this research to help individuals and companies understand the value of enterprise content management and what it means across the entire organization. I hope you will join us. If any of us were stopped in the street and were asked about it, I bet most of us would think of ourselves as an "Information Professional".  Now we have a way to actually prove it!  There's only one downside that I can see...  you will have to get your business cards updated to include the "CIP" acronym after your name.  I think you will agree that is a price worth paying!

    Read the article

  • Tips on ensuring Model Quality

    - by [email protected]
    Given enough data that represents well the domain and models that reflect exactly the decision being optimized, models usually provide good predictions that ensure lift. Nevertheless, sometimes the modeling situation is less than ideal. In this blog entry we explore the problems found in a few such situations and how to avoid them.1 - The Model does not reflect the problem you are trying to solveFor example, you may be trying to solve the problem: "What product should I recommend to this customer" but your model learns on the problem: "Given that a customer has acquired our products, what is the likelihood for each product". In this case the model you built may be too far of a proxy for the problem you are really trying to solve. What you could do in this case is try to build a model based on the result from recommendations of products to customers. If there is not enough data from actual recommendations, you could use a hybrid approach in which you would use the [bad] proxy model until the recommendation model converges.2 - Data is not predictive enoughIf the inputs are not correlated with the output then the models may be unable to provide good predictions. For example, if the input is the phase of the moon and the weather and the output is what car did the customer buy, there may be no correlations found. In this case you should see a low quality model.The solution in this case is to include more relevant inputs.3 - Not enough cases seenIf the data learned does not include enough cases, at least 200 positive examples for each output, then the quality of recommendations may be low. The obvious solution is to include more data records. If this is not possible, then it may be possible to build a model based on the characteristics of the output choices rather than the choices themselves. For example, instead of using products as output, use the product category, price and brand name, and then combine these models.4 - Output leaking into input giving the false impression of good quality modelsIf the input data in the training includes values that have changed or are available only because the output happened, then you will find some strong correlations between the input and the output, but these strong correlations do not reflect the data that you will have available at decision (prediction) time. For example, if you are building a model to predict whether a web site visitor will succeed in registering, and the input includes the variable DaysSinceRegistration, and you learn when this variable has already been set, you will probably see a big correlation between having a Zero (or one) in this variable and the fact that registration was successful.The solution is to remove these variables from the input or make sure they reflect the value as of the time of decision and not after the result is known. 

    Read the article

  • Tips On Using The Service Contracts Import Program

    - by LuciaC
    Prior to release 12.1 there was no supported way to import contracts into the EBS Service Contracts application - there were no public APIs nor contract load programs provided.  From release 12.1 onwards the 'Service Contracts Import Program' is provided to load service contracts into the application. The Service Contracts Import functionality is explained in How to Use the Service Contracts Import Program - Scope and Limitations (Doc ID 1057242.1).  This note includes an attached document which explains the program architecture, shows the Entity Relationship Diagram and details the interface table definitions. The Import program takes data from the interface tables listed below and populates the contracts schema tables:  OKS_USAGE_COUNTERS_INTERFACE OKS_SALES_CREDITS_INTERFACEOKS_NOTES_INTERFACEOKS_LINES_INTERFACEOKS_HEADERS_INTERFACEOKS_COVERED_LEVELS_INTERFACEThese interface tables must be loaded via a custom load program.The Service Contracts Import concurrent request is then submitted to create contracts from this legacy data. The parameters to run the Import program are:  Parameter Description  Mode Validate only, Import  Batch Number Batch_Id (unique id populated into the OKS_HEADERS_INTERFACE table)  Number of Workers Number of workers required (these are spawned as separate sub-requests)  Commit size Represents number of successfully processed contracts commited to database The program spawns sub-requests for the import worker(s) and the 'Service Contracts Import Report'.  The data is validated prior to import and into the Contracts tables and will report errors in the Service Contracts Import Report program output file (Import Execution Report).  Troubleshooting tips are provided in R12.1 - Common Service Contract Import Errors (Doc ID 762545.1); this document lists some, but not all, import errors.  The document will be updated over time.  Additional help is given in Debugging Tip for Service Contracts Import Errors (Doc ID 971426.1).After you successfully import contracts, you can purge the records from the interface tables by running the Service Contracts Import Purge concurrent program. Note that there is no supported way to mass delete data from the Contracts schema tables once they are populated, so data loaded by the Import program must be fully tested and verified before the program is run to load data into a Production system.A Service Contracts Import Test program has been provided which will take an existing contract in the application and load the interface tables using the data from that contract.  This can be used as an example for guidance on how to load the interface tables.  The Test program functionality is explained in How to Use the Service Contracts Test Import Program Provided in Release 12.1 (Doc ID 761209.1).  Note that the Test program has some limitations which do not apply to the full Import program and is not a supported program, it is simply a testing tool.  

    Read the article

  • Working with Reporting Services Filters – Part 3: The TOP and BOTTOM Operators

    - by smisner
    Thus far in this series, I have described using the IN operator and the LIKE operator. Today, I’ll continue the series by reviewing the TOP and BOTTOM operators. Today, I happened to be working on an example of using the TOP N operator and was not successful on my first try because the behavior is just a bit different than we find when using an “equals” comparison as I described in my first post in this series. In my example, I wanted to display a list of the top 5 resellers in the United States for AdventureWorks, but I wanted it based on a filter. I started with a hard-coded filter like this: Expression Data Type Operator Value [ResellerSalesAmount] Float Top N 5 And received the following error: A filter value in the filter for tablix 'Tablix1' specifies a data type that is not supported by the 'TopN' operator. Verify that the data type for each filter value is Integer. Well, that puzzled me. Did I really have to convert ResellerSalesAmount to an integer to use the Top N operator? Just for kicks, I switched to the Top % operator like this: Expression Data Type Operator Value [ResellerSalesAmount] Float Top % 50 This time, I got exactly the results I expected – I had a total of 10 records in my dataset results, so 50% of that should yield 5 rows in my tablix. So thinking about the problem with Top N some  more, I switched the Value to an expression, like this: Expression Data Type Operator Value [ResellerSalesAmount] Float Top N =5 And it worked! So the value for Top N or Top % must reflect a number to plug into the calculation, such as Top 5 or Top 50%, and the expression is the basis for determining what’s in that group. In other words, Reporting Services will sort the rows by the expression – ResellerSalesAmount in this case – in descending order, and then filter out everything except the topmost rows based on the operator you specify. The curious thing is that, if you’re going to hard-code the value, you must enter the value for Top N with an equal sign in front of the integer, but you can omit the equal sign when entering a hard-coded value for Top %. This experience is why working with Reporting Services filters is not always intuitive! When you use a report parameter to set the value, you won’t have this problem. Just be sure that the data type of the report parameter is set to Integer. Jessica Moss has an example of using a Top N filter in a tablix which you can view here. Working with Bottom N and Bottom % works similarly. You just provide a number for N or for the percentage and Reporting Services works from the bottom up to determine which rows are kept and which are excluded.

    Read the article

  • How to store Role Based Access rights in web application?

    - by JonH
    Currently working on a web based CRM type system that deals with various Modules such as Companies, Contacts, Projects, Sub Projects, etc. A typical CRM type system (asp.net web form, C#, SQL Server backend). We plan to implement role based security so that basically a user can have one or more roles. Roles would be broken down by first the module type such as: -Company -Contact And then by the actions for that module for instance each module would end up with a table such as this: Role1 Example: Module Create Edit Delete View Company Yes Owner Only No Yes Contact Yes Yes Yes Yes In the above case Role1 has two module types (Company, and Contact). For company, the person assigned to this role can create companies, can view companies, can only edit records he/she created and cannot delete. For this same role for the module contact this user can create contacts, edit contacts, delete contacts, and view contacts (full rights basically). I am wondering is it best upon coming into the system to session the user's role with something like a: List<Role> roles; Where the Role class would have some sort of List<Module> modules; (can contain Company, Contact, etc.).? Something to the effect of: class Role{ string name; string desc; List<Module> modules; } And the module action class would have a set of actions (Create, Edit, Delete, etc.) for each module: class ModuleActions{ List<Action> actions; } And the action has a value of whether the user can perform the right: class Action{ string right; } Just a rough idea, I know the action could be an enum and the ModuleAction can probably be eliminated with a List<x, y>. My main question is what would be the best way to store this information in this type of application: Should I store it in the User Session state (I have a session class where I manage things related to the user). I generally load this during the initial loading of the application (global.asax). I can simply tack onto this session. Or should this be loaded at the page load event of each module (page load of company etc..). I eventually need to be able to hide / unhide various buttons / divs based on the user's role and that is what got me thinking to load this via session. Any examples or points would be great.

    Read the article

  • Silverlight Cream for December 08, 2010 -- #1005

    - by Dave Campbell
    In this Issue: Peter Kuhn, David Anson, Jesse Liberty, Mike Taulty(-2-, -3-), Kunal Chowdhury, Jeremy Likness, Martin Krüger, Beth Massi(-2-, -3-)/ Above the Fold: Silverlight: "Rebuilding the PDC 2010 Silverlight Application (Part 1)" Mike Taulty WP7: "WP7: Glossy text block custom control" Martin Krüger Lightswitch: "How to Create a Screen with Multiple Search Parameters in LightSwitch" Beth Massi From SilverlightCream.com: Requirements of and pitfalls in Windows Phone 7 serialization Peter Kuhn discusses Data Contract Serializer issuses on WP7 and how to work around them. Managed implementation of CRC32 and MD5 algorithms updated; new release of ComputeFileHashes for Silverlight, WPF, and the command-line! David Anson ties up some loose ends from a prior post on hash functions, and updates his CRC32 and MD5 algorithms. Windows Phone From Scratch #9 – Visual State Jesse Liberty's latest Windows Phone from Scratch tutorial up... and is on the Visual State... he extends a Button and codes up the State Transitions. Rebuilding the PDC 2010 Silverlight Application (Part 1) Mike Taulty has taken the time to rebuild the PDC2010 Silverlight App that folks wanted the source for... and he's taking multiple posts to explain the heck out of it. This first one is mostly infrastructure. Rebuilding the PDC 2010 Silverlight Application (Part 2) In the 2nd post of the series, Mike Taulty is handling the In/Out of Browser business because he eventually is going to be going OOB. Rebuilding the PDC 2010 Silverlight Application (Part 3) Part 3 finds Mike Taulty delving into WCF Data Services and getting some data on the screen. Paginating Records in Silverlight DataGrid using PagedCollectionView Kunal Chowdhury continues with his investigation of the PagedCollectionView with this post on Pagination of your data. Old School Silverlight Effects If you haven't seen Jeremy Likness' 'Old School' Effects page yet, go just for the entertainment... you'll find yourself hanging around for the code :) WP7: Glossy text block custom control Martin Krüger's latest post is a very cool custom control for WP7 that displays Glossy text... it ain't Metro, but it looks pretty nice... some of it almost like etched text. How to Create a Screen with Multiple Search Parameters in LightSwitch Looks like Beth Massi got a few Lightswitch posts in while I wasn't looking. First up is this one on a multiple-parameter search screen. Adding Static Images and Logos to LightSwitch Applications In the 2nd post, Beth Massi shows how to add your own static images and logos to Lightswitch apps... in response to reader questions. Getting the Most out of LightSwitch Summary Properties In her latest post, Beth Massi explains what Summary Properties are in Lightswitch and how to use them to get the best results for your users. Stay in the 'Light! Twitter SilverlightNews | Twitter WynApse | WynApse.com | Tagged Posts | SilverlightCream Join me @ SilverlightCream | Phoenix Silverlight User Group Technorati Tags: Silverlight    Silverlight 3    Silverlight 4    Windows Phone MIX10

    Read the article

  • links for 2011-01-04

    - by Bob Rhubart
    Webcasts (tags: ping.fm) Five Key Trends in Enterprise 2.0 for 2011 (Oracle Enterprise 2.0 Blog) Kellsey Ruppel shares insight from Oracle's Andy MacMillan. (tags: oracle otn enterprise2.0) Victor Bax: Lost in Service Oriented Architecture? "SOA is a concept, no more, no less. SOA is not a technology, or a piece of software. It is an architecture, a model." - Victor Bax (tags: oracle soa) Jan-Leendert: Oracle 11g SOA Suite read multi record data from csv file with the file adapter (master-detail) "The file adapter is a very powerlful tool to read files with structured data. Most of the time you will read simple csv files with one record per row. But what if your csv file contains multiple records with different types?" - Jan-Leendert (tags: oracle soa soasuite) @myfear: Five ways to know how your data looked in the past. Entity Auditing. "Whatever requirements you have. I can promise you, that it will never be a simple solution. In general it's best to evaluate your purpose for auditing in detail." - Oracle ACE Director Markus Eisele (tags: oracle otn oracleace java) @fteter: Buffing Up The Crystal Ball "While I'm already tired of seeing these types of posts (I'm writing on New Year's Day), I'm also feeling guilty about not making my own set of predictions." - Oracle ACE Director Floyd Teter (tags: oracle otn oracleace ec2 cloud fusionmiddleware) @bex: ECM New Year's Resolutions "Happy new year! Most people use the first post of the year to go over their own blog statistics of popular posts... but since my blog's fiscal year ends in April, I decided to do new years resolutions instead." - Oracle ACE Director Bex Huff (tags: oracle otn oracleace ecm enterprise2.0) Izaak de Hullu: Embedded Java in a 11g BPEL process "In an earlier blog my colleague Peter Ebell explained how you can create an extension of com.collaxa.cube.engine.ext.BPELXExecLet to do your coding in a regular Java environment so you have code completion and validation..." - Izaak de Hullu (tags: oracle otn bpel java soa) @gschmutz: Cannot access EM console after installing SOA Suite 11g PS2 Oracle ACE Director Guido Schmutz encounters a problem and shares the solution. (tags: oracle otn oracleace soa soasuite)

    Read the article

  • Mixing It Up with BluesMix

    - by Oracle OpenWorld Blog Team
    By Karen Shamban At home base in London prior to making a swing on the US west coast later this month, BluesMix took a few minutes to answer some musical questions. Q: What are the top three things people should know about your music? A: We focus on original material and blend funk with blues. We're big on songwriting but also performance, groove, and feel of the music. It's music you can dance to! We're from London, England and have been labeled 'one of the UK's leading blues/funk bands'. Oh - that's four things! :) Q: Do you prefer smaller, intimate venues or larger, louder ones? A: Actually both, for different reasons. We play many intimate club shows in London at prestigious venues such as the 100 Club. There's lots of musical history with these types of clubs where the likes of the Rolling Stones used to play week-in week-out in the '60s. Usually these shows generally have a fantastic atmosphere, with a close connection to the audience, who are packed close to the stage. They often turn up surprises too…for example, we've had artists such as Amy Winehouse and Mick Abrahams in the crowd enjoying the show and then asking to come onstage and play with the band. Lots of fun! The larger venues are great too, in a different way. We've played to 3,000-person+ crowds and the atmosphere with so many people enjoying the show is a real buzz. It's also nice to play outdoor venues, especially in places with nice weather like California! Q: What's new and different in the music you are playing today, versus a year or two ago? A: Well, we released a new album earlier this year. It's called Flat Nine; it's on the Proper Records label. Whilst our music has always been a blend of blues and vintage funk, this album in particular has evolved our funk side even further. We've received some really great reviews from the music press in the UK and had generous comparisons to the likes of The Meters, Dr. John, The Average White Band, Howlin' Wolf. The album has generated lots of interest, which is fantastic. We're playing to regular sellout shows in the UK and are also opening for some legends of the funk music scene, such as The New Mastersounds. BluesMix are headlining the Oracle OpenWorld Welcome Reception in Yerba Buena Gardens on Sunday, September 30 and are playing at the Oracle OpenWorld Music Festival at Slim's on Tuesday, October 2. More on the music: Oracle OpenWorld Music Festival BluesMix  >>

    Read the article

  • Spotlight on RIVA: CRM integration for Oracle CRM on Demand and Microsoft Exchange

    - by Richard Lefebvre
    Introducing Riva from Omni - an Oracle ISV partner specializing in Enterprise Management and Integration Solutions Riva delivers advanced, server-side integration for Oracle CRM On Demand and Microsoft Exchange or even Novell GroupWise. Riva allows Oracle customers to go beyond the standard Outlook plug-in to deliver additional value for the end user as they interact between Outlook and CRM On Demand. Riva syncs CRM On Demand to ALL Exchange mail apps, not just Windows Outlook.  So, whether customers are using Outlook 2010, Outlook Web Access (web client), Outlook 2011 for Mac, Apple Mail, Outlook on Citrix  or a mobile device, Riva's got them covered. There are no plug-ins to be installed, configured, managed and maintained on users' desktops, laptops as Riva delivers Server-side synchronisation for CRMOD and Exchange. The automation of CRM and Outlook integration will remove the reliance upon users to synchronise between the two with Riva handling this process. Riva allows administrators to define sync policies and apply them to individuals or groups of users depending on their sync requirements. Administrators will be able to determine and manage the exposure of the most pertinent detail to be synchronised between Outlook and CRM On Demand. Custom and organic contact filtering for large deployments i.e. Based on ownership, groupings and contact frequency, filters can be applied on what contact records are shared with the users. Riva provides the capability to synchronise CRM and Outlook beyond Contacts, Calendar entries and Email. The synchronisation can be extended to cater for  opportunities, quotes and custom objects for example within the Outlook interface. Riva SmartConvert Folders can automate the creation of opportunities and associated contacts for example if they don't already exist. This can facilitate a reduction in manual detail entry through quick association whilst also benefiting user adoption. From a mobile perspective, Riva allows users to view and manage their CRM On Demand contacts, calendar, tasks, opportunities and cases from iPad, iPhone, Android and BlackBerry devices.  Again, there are no mobile apps or additional plugins to install, configure or manage. We sync CRM On Demand to Exchange.  Because the mobile device is connected to an Exchange mailbox, the information automatically syncs down to the native address book, calendar and mail apps on the smartphone or tablet. Riva Datasheet for CRM On Demand Riva Brochure – Oracle CRM On Demand  Technical Knowledgebase & Riva Trial  http://kb.omni-ts.com/47/ Comparison to Outlook Plug-ins Riva Diagram – Riva Comparison with Outlook Plug-ins Contact: Wolfgang Berger - [email protected]

    Read the article

< Previous Page | 154 155 156 157 158 159 160 161 162 163 164 165  | Next Page >