Search Results

Search found 71852 results on 2875 pages for 'data load'.

Page 13/2875 | < Previous Page | 9 10 11 12 13 14 15 16 17 18 19 20 | Next Page >

ADO.NET (WCF) Data Services Query Interceptor Hangs IIS

- by PreMagination

I have an ADO.NET Data Service that's supposed to provide read-only access to a somewhat complex database. Logically I have table-per-type (TPT) inheritance in my data model but the EDM doesn't implement inheritance. (Limitation of EF and navigation properties on derived types. STILL not fixed in EF4!) I can query my EDM directly (using a separate project) using a copy of the query I'm trying to run against the web service, results are returned within 10 seconds. Disabling the query interceptors I'm able to make the same query against the web service, results are returned similarly quickly. I can enable some of the query interceptors and the results are returned slowly, up to a minute or so later. Alternatively, I can enable all the query interceptors, expand less of the properties on the main object I'm querying, and results are returned in a similar period of time. (I've increased some of the timeout periods) Up til this point Sql Profiler indicates the slow-down is the database. (That's a post for a different day) But when I enable all my query interceptors and expand all the properties I'd like to have the IIS worker process pegs the CPU for 20 minutes and a query is never even made against the database. This implies to me that yes, my implementation probably sucks but regardless the Data Services "tier" is having an issue it shouldn't. WCF tracing didn't reveal anything interesting to my untrained eye. Details: Data model: Agent-Person-Student Student has a collection of referrals Students and referrals are private, queries against the web service should only return "your" students and referrals. This means Person and Agent need to be filtered too. Other entities (Agent-Organization-School) can be accessed by anyone who has authenticated. The existing security model is poorly suited to perform this type of filtering for this type of data access, the query interceptors are complicated and cause EF to generate some entertaining sql queries. Sample Interceptor [QueryInterceptor("Agents")] public Expression<Func<Agent, Boolean>> OnQueryAgents() { //Agent is a Person(1), Educator(2), Student(3), or Other Person(13); allow if scope permissions exist return ag => (ag.AgentType.AgentTypeId == 1 || ag.AgentType.AgentTypeId == 2 || ag.AgentType.AgentTypeId == 3 || ag.AgentType.AgentTypeId == 13) && ag.Person.OrganizationPersons.Count<OrganizationPerson>(op => op.Organization.ScopePermissions.Any<ScopePermission> (p => p.ApplicationRoleAccount.Account.UserName == HttpContext.Current.User.Identity.Name && p.ApplicationRoleAccount.Application.ApplicationId == 124) || op.Organization.HierarchyDescendents.Any<OrganizationsHierarchy>(oh => oh.AncestorOrganization.ScopePermissions.Any<ScopePermission> (p => p.ApplicationRoleAccount.Account.UserName == HttpContext.Current.User.Identity.Name && p.ApplicationRoleAccount.Application.ApplicationId == 124))) > 0; } The query interceptors for Person, Student, Referral are all very similar, ie they traverse multiple same/similar tables to look for ScopePermissions as above. Sample Query var referrals = (from r in service.Referrals .Expand("Organization/ParentOrganization") .Expand("Educator/Person/Agent") .Expand("Student/Person/Agent") .Expand("Student") .Expand("Grade") .Expand("ProblemBehavior") .Expand("Location") .Expand("Motivation") .Expand("AdminDecision") .Expand("OthersInvolved") where r.DateCreated >= coupledays && r.DateDeleted == null select r); Any suggestions or tips would be greatly associated, for fixing my current implementation or in developing a new one, with the caveat that the database can't be changed and that ultimately I need to expose a large portion of the database via a web service that limits data access to the data authorized for, for the purpose of data integration with multiple outside parties. THANK YOU!!!

Read the article
Oracle Insurance Gets Innovative with Insurance Business Intelligence

- by nicole.bruns(at)oracle.com

Oracle Insurance announced yesterday the availability of Oracle Insurance Insight 7.0, an insurance-specific data warehouse and business intelligence (BI) system that transforms the traditional approach to BI by involving business users in the creation and maintenance."Rapid access to business intelligence is essential to compete and thrive in today's insurance industry," said Srini Venkatasantham, vice president, Product Strategy, Oracle Insurance. "The adaptive data modeling approach of Oracle Insurance Insight 7.0, combined with the insurance-specific data model, offers global insurance companies a faster, easier way to get the intelligence they need to make better-informed business decisions." New Features in Oracle Insurance 7.0 include:"Adaptive Data Modeling" via the new warehouse palette: Gives business users the power to configure lines of business via an easy-to-use warehouse palette tool. Oracle Insurance Insight then automatically creates data warehouse elements - such as line-specific database structures and extract-transform-load (ETL) processes -speeding up time-to-value for BI initiatives. Out-of-the-box insurance models or create-from-scratch option: Includes pre-built content and interfaces for six Property and Casualty (P&C) lines. Additionally, insurers can use the warehouse palette to deploy any and all P&C or General Insurance lines of business from scratch, helping insurers support operations in any country.Leverages Oracle technologies: In addition to Oracle Business Intelligence Enterprise Edition, the solution includes Oracle Database 11g as well as Oracle Data Integrator Enterprise Edition 11g, which delivers Extract, Load and Transform (E-L-T) architecture and eliminates the need for a separate transformation server. Additionally, the expanded Oracle technology infrastructure enables support for Oracle Exadata. Martina Conlon, a Principal with Novarica's Insurance practice, and author of Business Intelligence in Insurance: Current State, Challenges, and Expectations says, "The need for continued investment by insurers in business intelligence capabilities is widely understood, and the industry is acting. Arming the business intelligence implementation with predefined insurance specific content, and flexible and configurable technology will get these projects up and running faster."Learn moreTo see a demo of the Oracle Insurance Insight system, click hereTo read the press announcement, click here

Read the article
make-like build tools for data?

- by miku

Make is a standard tools for building software. But make decides whether a target needs to be regenerated by comparing file modification times. Are there any proven, preferably small tools that handle builds not for software but for data? Something that regenerates targets not only on mod times but on certain other properties (e.g. completeness). (Or alternatively some paper that describes such a tool.) As illustration: I'd like to automate the following process: get data (e.g. a tarball) from some regularly updated source copy somewhere if it's not there (based e.g. on some filename-scheme) convert the files to different format (but only if there aren't successfully converted ones there - e.g. from a previous attempt - custom comparison routine) for each file find a certain data element and fetch some additional file from say an URL, but only if that hasn't been downloaded yet (decide on existence of file and file "freshness") finally compute something (e.g. word count for something identifiable and store it in the database, but only if the DB does not have an entry for that exact ID yet) Observations: there are different stages each stage is usually simple to compute or implement in isolation each stage may be simple, but the data volume may be large each stage may produce a few errors each stage may have different signals, on when (re)processing is needed Requirements: builds should be interruptable and idempotent (== robust) when interrupted, already processed objects should be reused to speedup the next run data paths should be easy to adjust (simple syntax, nothing new to learn, internal dsl would be ok) some form of dependency graph, that describes the process would be nice for later visualizations should leverage existing programs, if possible I've done some research on make alternatives like rake and have worked a lot with ant and maven in the past. All these tools naturally focus on code and software build, not on data builds. A system we have in place now for a task similar to the above is pretty much just shell scripts, which are compact (and are a ok glue for a variety of other programs written in other languages), so I wonder if worse is better?

Read the article
SQL SERVER – Installing Data Quality Services (DQS) on SQL Server 2012

- by pinaldave

Data Quality Services is very interesting enhancements in SQL Server 2012. My friend and SQL Server Expert Govind Kanshi have written an excellent article on this subject earlier on his blog. Yesterday I stumbled upon his blog one more time and decided to experiment myself with DQS. I have basic understanding of DQS and MDS so I knew I need to start with DQS Client. However, when I tried to find DQS Client I was not able to find it under SQL Server 2012 installation. I quickly realized that I needed to separately install the DQS client. You will find the DQS installer under SQL Server 2012 >> Data Quality Services directory. The pre-requisite of DQS is Master Data Services (MDS) and IIS. If you have not installed IIS, you can follow the simple steps and install IIS in your machine. Once the pre-requisites are installed, click on MDS installer once again and it will install DQS just fine. Be patient with the installer as it can take a bit longer time if your machine is low on configurations. Once the installation is over you will be able to expand SQL Server 2012 >> Data Quality Services directory and you will notice that it will have a new item called Data Quality Client. Click on it and it will open the client. Well, in future blog post we will go over more details about DQS and detailed practical examples. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Utility, T SQL, Technology Tagged: Data Quality Services

Read the article
Social Analytics in your current data

- by Dan McGrath

By now everyone is aware of the massive boom in social-networking (Twitter, Facebook, LinkedIn) and obviously a big part of its business model revolves around being able to mine this data to create information that can be used to make money for someone. Gartner has identified 'Social Analytics' as one of the top 10 strategic technologies for 2011. Has anyone looked at their existing data structures to determine if they could extract a social graph and then perform further data mining against this? How does it fit in with your other strategic development strategies? What information are you trying to extract from the data? Take for example, a bank. They could conceivably determine a social graph through account relationships and transactions. Obviously there would be open edges on the graph where funds enter/leave the institute, but that shouldn't detract from the usefulness of the data. I'm looking for actual examples with the answers, as well as why/how they did it. References to other sites will be greatly appreciated. Note: I'm not at all referring to mining data out of actual social networks.

Read the article
The Oldest Big Data Problem: Parsing Human Language

- by dan.mcclary

There's a new whitepaper up on Oracle Technology Network which details the use of Digital Reasoning Systems' Synthesys software on Oracle Big Data Appliance. Digital Reasoning's approach is inherently "big data friendly," as it leverages multiple components of the Hadoop ecosystem. Moreover, the paper addresses the oldest big data problem of them all: extracting knowledge from human text. You can find the paper here. From the Executive Summary: There is a wealth of information to be extracted from natural language, but that extraction is challenging. The volume of human language we generate constitutes a natural Big Data problem, while its complexity and nuance requires a particular expertise to model and mine. In this paper we illustrate the impressive combination of Oracle Big Data Appliance and Digital Reasoning Synthesys software. The combination of Synthesys and Big Data Appliance makes it possible to analyze tens of millions of documents in a matter of hours. Moreover, this powerful combination achieves four times greater throughput than conducting the equivalent analysis on a much larger cloud-deployed Hadoop cluster.

Read the article
Uses of persistent data structures in non-functional languages

- by Ray Toal

Languages that are purely functional or near-purely functional benefit from persistent data structures because they are immutable and fit well with the stateless style of functional programming. But from time to time we see libraries of persistent data structures for (state-based, OOP) languages like Java. A claim often heard in favor of persistent data structures is that because they are immutable, they are thread-safe. However, the reason that persistent data structures are thread-safe is that if one thread were to "add" an element to a persistent collection, the operation returns a new collection like the original but with the element added. Other threads therefore see the original collection. The two collections share a lot of internal state, of course -- that's why these persistent structures are efficient. But since different threads see different states of data, it would seem that persistent data structures are not in themselves sufficient to handle scenarios where one thread makes a change that is visible to other threads. For this, it seems we must use devices such as atoms, references, software transactional memory, or even classic locks and synchronization mechanisms. Why then, is the immutability of PDSs touted as something beneficial for "thread safety"? Are there any real examples where PDSs help in synchronization, or solving concurrency problems? Or are PDSs simply a way to provide a stateless interface to an object in support of a functional programming style?

Read the article
Translate report data export from RUEI into HTML for import into OpenOffice Calc Spreadsheets

- by [email protected]

A common question of users is, How to import the data from the automated data export of Real User Experience Insight (RUEI) into tools for archiving, dashboarding or combination with other sets of data.XML is well-suited for such a translation via the companion Extensible Stylesheet Language Transformations (XSLT). Basically XSLT utilizes XSL, a template on what to read from your input XML data file and where to place it into the target document. The target document can be anything you like, i.e. XHTML, CSV, or even a OpenOffice Spreadsheet, etc. as long as it is a plain text format.XML 2 OpenOffice.org SpreadsheetFor the XSLT to work as an OpenOffice.org Calc Import Filter:How to add an XML Import Filter to OpenOffice CalcStart OpenOffice.org Calc andselect Tools > XML Filter SettingsNew...Fill in the details as follows:Filter name: RUEI Import filterApplication: OpenOffice.org Calc (.ods)Name of file type: Oracle Real User Experience InsightFile extension: xmlSwitch to the transformation tab and enter/select the following leaving the rest untouchedXSLT for import: ruei_report_data_import_filter.xslPlease see at the end of this blog post for a download of the referenced file.Select RUEI Import filter from list and Test XSLTClick on Browse to selectTransform file: export.php.xmlOpenOffice.org Calc will transform and load the XML file you retrieved from RUEI in a human-readable format.You can now select File > Open... and change the filetype to open your RUEI exports directly in OpenOffice.org Calc, just like any other a native Spreadsheet format.Files of type: Oracle Real User Experience Insight (*.xml)File name: export.php.xml XML 2 XHTMLMost XML-powered browsers provides for inherent XSL Transformation capabilities, you only have to reference the XSLT Stylesheet in the head of your XML file. Then open the file in your favourite Web Browser, Firefox, Opera, Safari or Internet Explorer alike.<?xml version="1.0" encoding="ISO-8859-1"?> <?xml-stylesheet type="text/xsl" href="ruei_report_data_export_2_xhtml.xsl"?><report>You can find a patched example export from RUEI plus the above referenced XSL-Stylesheets here: export.php.xml - Example report data export from RUEI ruei_report_data_export_2_xhtml.xsl - RUEI to XHTML XSL Transformation Stylesheetruei_report_data_import_filter.xsl - OpenOffice.org XML import filter for RUEI report export data If you would like to do things like this on the command line you can use either Xalan or xsltproc.The basic command syntax for xsltproc is very simple:xsltproc -o output.file stylesheet.xslt inputfile.xmlYou can use this with the above two stylesheets to translate RUEI Data Exports into XHTML and/or OpenOffice.org Calc ODS-Format. Or you could write your own XSLT to transform into Comma separated Value lists.Please let me know what you think or do with this information in the comments below.Kind regards,Stefan ThiemeReferences used:OpenOffice XML Filter - Create XSLT filters for import and export - http://user.services.openoffice.org/en/forum/viewtopic.php?f=45&t=3490SUN OpenOffice.org XML File Format 1.0 - http://xml.openoffice.org/xml_specification.pdf

Read the article
Data Auditor by Example

- by Jinjin.Wang

OWB has a node Data Auditors under Oracle Module in Projects Navigator. What is data auditor and how to use it? I will give an introduction to data auditor and show its usage by examples. Data auditor is an important tool in ensuring that data quality levels meet business requirements. Data auditor validates data against a set of data rules to determine which records comply and which do not. It gathers statistical metrics on how well the data in a system complies with a rule by auditing and marking how many errors are occurring against the audited table. Data auditors are typically scheduled for regular execution as part of a process flow, to monitor the quality of the data in an operational environment such as a data warehouse or ERP system, either immediately after updates like data loads, or at regular intervals. How to use data auditor to monitor data quality? Only objects with data rules can be monitored, so the first step is to define data rules according to business requirements and apply them to the objects you want to monitor. The objects can be tables, views, materialized views, and external tables. Secondly create a data auditor containing the objects. You can configure the data auditor and set physical deployment parameters for it as optional, which will be used while running the data auditor. Then deploy and run the data auditor either manually or as part of the process flow. After execution, the data auditor sets several output values, and records that are identified as not complying with the defined data rules contained in the data auditor are written to error tables. Here is an example. We have two tables DEPARTMENTS and EMPLOYEES (see pic-1 and pic-2. Click here for DDL and data) imported into OWB. We want to gather statistical metrics on how well data in these two tables satisfies the following requirements: a. Values of the EMPLOYEES.EMPLOYEE_ID attribute are three-digit numbers. b. Valid values for EMPLOYEES.JOB_ID are IT_PROG, SA_REP, SH_CLERK, PU_CLERK, and ST_CLERK. c. EMPLOYEES.EMPLOYEE_ID is related to DEPARTMENTS.MANAGER_ID. Pic-1 EMPLOYEES Pic-2 DEPARTMENTS 1. To determine legal data within EMPLOYEES or legal relationships between data in different columns of the two tables, firstly we define data rules based on the three requirements and apply them to tables. a. The first requirement is about patterns that an attribute is allowed to conform to. We create a Domain Pattern List data rule EMPLOYEE_PATTERN_RULE here. The pattern is defined in the Oracle Database regular expression syntax as ^([0-9]{3})$ Apply data rule EMPLOYEE_PATTERN_RULE to table EMPLOYEES.

Read the article
Failover or load balancing configuration on Apple Xserve and Mac OS X Server Snow Leopard (10.6)

- by Alasdair Allan

I'm currently installing a number of Apple Xserve boxes running Mac OS X Server (10.6). After poking and prodding for a while I can't find any obvious way, or documentation telling me how, to set up the 2 ethernet ports in a failover or load-balancing configuration. There seems to be an obvious way to do link aggregation, but not failover or load-balancing? What's the default configuration if I enable both ports with the same (manually configured) IP address?

Read the article
Tools for load-testing HTTP servers?

- by David Wolever

I've had to load test HTTP servers/web applications a few times, and each time I've been underwhelmed by the quality of tools I've been able to find. So, when you're load testing a HTTP server, what tools do you use? And what are the things I'll most likely do wrong the next time I've got to do it?

Read the article
Tomcat - Spring Framework - MySQL, Load balancing design

- by Mat Banik

Lets say I have Spring Framework App that is running on Tomcat and uses MySQL database: What would be the best solution in this instance that would allow for sociability (price/performance/integration time)? More precisely: What would be on the Web Load balancer box, and who should be the tomcat web server clustered? What would be on the Database Load balancer box and how should be the database servers clustered? And if at all possible specific technical integration links would be of great help.

Read the article
Investigate high load on RHEL

- by Adam Matan

One of my RHEL 5 was showing high load (~4-5) on uptime. The load increased, and when it reached 6 (±), the server froze and needed restart. top-wise, the server had no significant CPU or memory issues and sar showed no increase in iowait. Therefore, the thrashing must have been related to other factors. Any ideas how to investigate this? In particular, how do I know that which processes are waiting in the queue?

Read the article
Server load average over an arbitrary period of time

- by ChrisRamakers

top and uptime get me the server load average for the last 1, 5 and 15 minutes but is there an easy way to get the average load of yesterday or any other day for that matter?

Read the article
Disk IO causing high load on Xen/CentOS guest

- by Peter Lindqvist

I'm having serious issues with a xen based server, this is on the guest partition. It's a paravirtualized CentOS 5.5. The following numbers are taken from top while copying a large file over the network. If i copy the file another time the speed decreases in relation to load average. So the second time it's half the speed of the first time. It needs some time to cool off after this. Load average slowly decreases until it's once again usable. ls / takes about 30 seconds. top - 13:26:44 up 13 days, 21:44, 2 users, load average: 7.03, 5.08, 3.15 Tasks: 134 total, 2 running, 132 sleeping, 0 stopped, 0 zombie Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 25.3%id, 74.5%wa, 0.0%hi, 0.0%si, 0.1%st Mem: 1048752k total, 1041460k used, 7292k free, 3116k buffers Swap: 2129912k total, 40k used, 2129872k free, 904740k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1506 root 10 -5 0 0 0 S 0.3 0.0 0:03.94 cifsd 1 root 15 0 2172 644 556 S 0.0 0.1 0:00.08 init Meanwhile the host is ~0.5 load avg and steady over time. ~50% wait Server hardware is dual xeon, 3gb ram, 170gb scsi 320 10k rpm, and shouldn't have any problems with copying files over the network. disk = [ "tap:aio:/vm/dev01.img,xvda,w" ] I also get these in the log INFO: task syslogd:1350 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. syslogd D 00062E4F 2208 1350 1 1353 1312 (NOTLB) c0ef0ed0 00000286 6e71a411 00062e4f c0ef0f18 00000009 c0f20000 6e738bfd 00062e4f 0001e7ec c0f2010c c181a724 c1abd200 00000000 ffffffff c0ef0ecc c041a180 00000000 c0ef0ed8 c03d6a50 00000000 00000000 c03d6a00 00000000 Call Trace: [<c041a180>] __wake_up+0x2a/0x3d [<ee06a1ea>] log_wait_commit+0x80/0xc7 [jbd] [<c043128b>] autoremove_wake_function+0x0/0x2d [<ee065661>] journal_stop+0x195/0x1ba [jbd] [<c0490a32>] __writeback_single_inode+0x1a3/0x2af [<c04568ea>] do_writepages+0x2b/0x32 [<c045239b>] __filemap_fdatawrite_range+0x66/0x72 [<c04910ce>] sync_inode+0x19/0x24 [<ee09b007>] ext3_sync_file+0xaf/0xc4 [ext3] [<c047426f>] do_fsync+0x41/0x83 [<c04742ce>] __do_fsync+0x1d/0x2b [<c0405413>] syscall_call+0x7/0xb ======================= I have tried disabling irqbalanced as suggested here but it does not seem to make any difference.

Read the article
What does transactions per seconds for a load balancer mean

- by Anurag

I was looking at the product matrix of webmux 592G(load balancer). It says maximum connections per sec = 2.8M Maximum number of transactions = 100,000 What does the above numbers mean. Does above means that load balancer can have 2.8M connections open but only 100K of them will be active per seconds. Also incase any one has used webmux 592G do you guys know in practice how many connections it can have open and what qps it can serve

Read the article
Alternatives for load balancing Tomcat webapp?

- by Mahriman

We wish to enable some form of load balancing for a Tomcat webapp we're currently using today. Unfortunately, I know almost to nothing about Tomcat load balancing, clustering and so on. Can anyone share resources that cover the different alternatives, give some handy pointers (maybe some solutions work better in certain types of environment?) or just some tips on solutions to try out? We're currently running Tomcat 5.5 if that makes any difference in features, however no critical obstacles for upgrading to 6.

Read the article
Servers behind load balancer

- by Tom

We have a CISCO hardware load balancer with two web servers behind it. We'd like to force some URLs to only be served by one of the machines. Firstly, is the job of the load balancer? or would a better approach be create a subdomain such as http://assets.example.com which would be automatically be routed to one of the servers?

Read the article
What happens when a load balancer fails?

- by CowKingDeluxe

Let's say I'm using Amazon's EC2 load balancer. I have it hooked up to two instances (excuse me if my terminology isn't correct). What happens if the load balancer fails? Do both instances fail to work now?

Read the article
Do i need a dedicated server for load balancing?

- by Ben

I'm completely new to the concept of load balancing so i hope this question isn't a "stupid question" because i've been searching around and im having a hard time understanding this. So to my understanding, in order to load balance, i need a separate machine with an ip address i can direct all traffic to. I initially thought i needed to rent 3 dedicated servers, one for load balancing and the other two as backend servers. Would a dedicated server be too much for a load balancer or do hosting companies have special types of computers for that process? Then i read somewhere else that i can install a load balance software in both of the two servers and configure it in a way that doesn't require me to rent another machine/dedicated server for load balancing. So im a bit confuse on how to actually implement a load balancer and whether or not i need a dedicated server for the sole purpose of acting as a load balancing machine. Also, i was recommended to use HAproxy so i'll be heading that direction for load balancing.

Read the article
Big Data – Buzz Words: What is Hadoop – Day 6 of 21

- by Pinal Dave

In yesterday’s blog post we learned what is NoSQL. In this article we will take a quick look at one of the four most important buzz words which goes around Big Data – Hadoop. What is Hadoop? Apache Hadoop is an open-source, free and Java based software framework offers a powerful distributed platform to store and manage Big Data. It is licensed under an Apache V2 license. It runs applications on large clusters of commodity hardware and it processes thousands of terabytes of data on thousands of the nodes. Hadoop is inspired from Google’s MapReduce and Google File System (GFS) papers. The major advantage of Hadoop framework is that it provides reliability and high availability. What are the core components of Hadoop? There are two major components of the Hadoop framework and both fo them does two of the important task for it. Hadoop MapReduce is the method to split a larger data problem into smaller chunk and distribute it to many different commodity servers. Each server have their own set of resources and they have processed them locally. Once the commodity server has processed the data they send it back collectively to main server. This is effectively a process where we process large data effectively and efficiently. (We will understand this in tomorrow’s blog post). Hadoop Distributed File System (HDFS) is a virtual file system. There is a big difference between any other file system and Hadoop. When we move a file on HDFS, it is automatically split into many small pieces. These small chunks of the file are replicated and stored on other servers (usually 3) for the fault tolerance or high availability. (We will understand this in the day after tomorrow’s blog post). Besides above two core components Hadoop project also contains following modules as well. Hadoop Common: Common utilities for the other Hadoop modules Hadoop Yarn: A framework for job scheduling and cluster resource management There are a few other projects (like Pig, Hive) related to above Hadoop as well which we will gradually explore in later blog posts. A Multi-node Hadoop Cluster Architecture Now let us quickly see the architecture of the a multi-node Hadoop cluster. A small Hadoop cluster includes a single master node and multiple worker or slave node. As discussed earlier, the entire cluster contains two layers. One of the layer of MapReduce Layer and another is of HDFC Layer. Each of these layer have its own relevant component. The master node consists of a JobTracker, TaskTracker, NameNode and DataNode. A slave or worker node consists of a DataNode and TaskTracker. It is also possible that slave node or worker node is only data or compute node. The matter of the fact that is the key feature of the Hadoop. In this introductory blog post we will stop here while describing the architecture of Hadoop. In a future blog post of this 31 day series we will explore various components of Hadoop Architecture in Detail. Why Use Hadoop? There are many advantages of using Hadoop. Let me quickly list them over here: Robust and Scalable – We can add new nodes as needed as well modify them. Affordable and Cost Effective – We do not need any special hardware for running Hadoop. We can just use commodity server. Adaptive and Flexible – Hadoop is built keeping in mind that it will handle structured and unstructured data. Highly Available and Fault Tolerant – When a node fails, the Hadoop framework automatically fails over to another node. Why Hadoop is named as Hadoop? In year 2005 Hadoop was created by Doug Cutting and Mike Cafarella while working at Yahoo. Doug Cutting named Hadoop after his son’s toy elephant. Tomorrow In tomorrow’s blog post we will discuss Buzz Word – MapReduce. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Master Data Management for Location Data - Oracle Site Hub

- by david.butler(at)oracle.com

Most MDM discussions cover key domains such as customer, supplier, product, service, and reference data. It is usually understood that these domains have complex structures and hundreds if not thousands of attributes that need governing. Location, on the other hand, strikes most people as address data. How hard can that be? But for many industries, locations are complex, and site information is critical to efficient operations and relevant analytics. Retail stores and malls, bank branches, construction sites come to mind. But one of the best industries for illustrating the power of a site mastering application is Oil & Gas. Oracle's Master Data Management solution for location data is the Oracle Site Hub. It is a location mastering solution that enables organizations to centralize site and location specific information from heterogeneous systems, creating a single view of site information that can be leveraged across all functional departments and analytical systems. Let's take a look at the location entities the Oracle Site Hub can manage for the Oil & Gas industry: organizations, property, land, buildings, roads, oilfield, service center, inventory site, real estate, facilities, refineries, storage tanks, vendor locations, businesses, assets; project site, area, well, basin, pipelines, critical infrastructure, offshore platform, compressor station, gas station, etc. Any site can be classified into multiple hierarchies, like organizational hierarchy, operational hierarchy, geographic hierarchy, divisional hierarchies and so on. Any site can also be associated to multiple clusters, i.e. collections of sites, and these can be used as a foundation for driving reporting, analysis, organize daily work, etc. Hierarchies can also be used to model entities which are structured or non-structured collections of nodes, like for example routes, pipelines and more. The User Defined Attribute Framework provides the needed infrastructure to add single row attributes groups like well base attributes (well IDs, well type, well structure and key characterizing measures, and more) and well geometry, and multi row attribute groups like well applications, permits, production data, activities, operations, logs, treatments, tests, drills, treatments, and KPIs. Site Hub can also model areas, lands, fields, basins, pools, platforms, eco-zones, and stratigraphic layers as specific sites, tracking their base attributes, aliases, descriptions, subcomponents and more. Midstream entities (pipelines, logistic sites, pump stations) and downstream entities (cylinders, tanks, inventories, meters, partner's sites, routes, facilities, gas stations, and competitor sites) can also be easily modeled, together with their specific attributes and relationships. Site Hub can store any type of unstructured data associated to a site. This could be stored directly or on an external content management solution, like Oracle Universal Content Management. Considering a well, for example, Site Hub can store any relevant associated multimedia file such as: CAD drawings of the well profile, structure and/or parts, engineering documents, contracts, applications, permits, logs, pictures, photos, videos and more. For any site entity, Site Hub can associate all the related assets and equipments at the site, as well as all relationships between sites, between a site and multiple parties, and between a site and any purchasable or sellable item, over time. Items can be equipment, instruments, facilities, services, products, production entities, production facilities (pipelines, batteries, compressor stations, gas plants, meters, separators, etc.), support facilities (rigs, roads, transmission or radio towers, airstrips, etc.), supplier products and services, catalogs, and more. Items can just be associated to sites using standard Site Hub features, or they can be fully mastered by implementing Oracle Product Hub. Site locations (addresses or geographical coordinates) are also managed with out-of-the-box address geo-coding capabilities coupled with Google Maps integration to deliver powerful mapping capabilities and spatial data analysis. Locations can be shared between different sites. Centered on the site location, any site can also have associated areas. Site Hub can master any site location specific information, like for example cadastral, ownership, jurisdictional, geological, seismic and more, and any site-centric area specific information, like for example economical, political, risk, weather, logistic, traffic information and more. Now if anyone ever asks you why locations need MDM, think about how all these Oil & Gas entities and attributes would translate into your business locations. To learn more about Oracle's full MDM solution for the digital oil field, here is a link to Roberto Negro's outstanding whitepaper: Oracle Site Master Data Management for mastering wells and other PPDM entities in a digital oilfield context

Read the article
Optimal Data Structure for our own API

- by vermiculus

I'm in the early stages of writing an Emacs major mode for the Stack Exchange network; if you use Emacs regularly, this will benefit you in the end. In order to minimize the number of calls made to Stack Exchange's API (capped at 10000 per IP per day) and to just be a generally responsible citizen, I want to cache the information I receive from the network and store it in memory, waiting to be accessed again. I'm really stuck as to what data structure to store this information in. Obviously, it is going to be a list. However, as with any data structure, the choice must be determined by what data is being stored and what how it will be accessed. What, I would like to be able to store all of this information in a single symbol such as stack-api/cache. So, without further ado, stack-api/cache is a list of conses keyed by last update: `(<csite> <csite> <csite>) where <csite> would be (1362501715 . <site>) At this point, all we've done is define a simple association list. Of course, we must go deeper. Each <site> is a list of the API parameter (unique) followed by a list questions: `("codereview" <cquestion> <cquestion> <cquestion>) Each <cquestion> is, you guessed it, a cons of questions with their last update time: `(1362501715 <question>) (1362501720 . <question>) <question> is a cons of a question structure and a list of answers (again, consed with their last update time): `(<question-structure> <canswer> <canswer> <canswer> and ` `(1362501715 . <answer-structure>) This data structure is likely most accurately described as a tree, but I don't know if there's a better way to do this considering the language, Emacs Lisp (which isn't all that different from the Lisp you know and love at all). The explicit conses are likely unnecessary, but it helps my brain wrap around it better. I'm pretty sure a <csite>, for example, would just turn into (<epoch-time> <api-param> <cquestion> <cquestion> ...) Concerns: Does storing data in a potentially huge structure like this have any performance trade-offs for the system? I would like to avoid storing extraneous data, but I've done what I could and I don't think the dataset is that large in the first place (for normal use) since it's all just human-readable text in reasonable proportion. (I'm planning on culling old data using the times at the head of the list; each inherits its last-update time from its children and so-on down the tree. To what extent this cull should take place: I'm not sure.) Does storing data like this have any performance trade-offs for that which must use it? That is, will set and retrieve operations suffer from the size of the list? Do you have any other suggestions as to what a better structure might look like?

Read the article
SQLAuthority News – SQL Server Technical Article – The Data Loading Performance Guide

- by pinaldave

The white paper describes load strategies for achieving high-speed data modifications of a Microsoft SQL Server database. “Bulk Load Methods” and “Other Minimally Logged and Metadata Operations” provide an overview of two key and interrelated concepts for high-speed data loading: bulk loading and metadata operations. After this background knowledge, white paper describe how these methods can be [...]

Read the article
How do you handle the fetchxml result data?

- by Luke Baulch

I have avoided working with fetchxml as I have been unsure the best way to handle the result data after calling crmService.Fetch(fetchXml). In a couple of situations, I have used an XDocument with LINQ to retrieve the data from this data structure, such as: XDocument resultset = XDocument.Parse(_service.Fetch(fetchXml)); if (resultset.Root == null || !resultset.Root.Elements("result").Any()) { return; } foreach (var displayItem in resultset.Root.Elements("result").Select(item => item.Element(displayAttributeName)).Distinct()) { if (displayItem!= null && displayItem.Value != null) { dropDownList.Items.Add(displayItem.Value); } } What is the best way to handle fetchxml result data, so that it can be easily used. Applications such as passing these records into an ASP.NET datagrid would be quite useful.

Read the article

Search Results

Search found 71852 results on 2875 pages for 'data load'.

Page 13/2875 | < Previous Page | 9 10 11 12 13 14 15 16 17 18 19 20 | Next Page >

- by PreMagination

- by nicole.bruns(at)oracle.com

- by miku

- by pinaldave

- by Dan McGrath

- by dan.mcclary

- by Ray Toal

- by [email protected]

- by Jinjin.Wang

- by Alasdair Allan

- by David Wolever

- by Mat Banik

- by Adam Matan

- by ChrisRamakers

- by Peter Lindqvist

- by Anurag

- by Mahriman

- by Tom

- by CowKingDeluxe

- by Ben

- by Pinal Dave

- by david.butler(at)oracle.com

- by vermiculus

- by pinaldave

- by Luke Baulch

< Previous Page | 9 10 11 12 13 14 15 16 17 18 19 20 | Next Page >