Search Results

Search found 60622 results on 2425 pages for 'linked data'.

Page 13/2425 | < Previous Page | 9 10 11 12 13 14 15 16 17 18 19 20 | Next Page >

Data Virtualization: Federated and Hybrid

- by Krishnamoorthy

Data becomes useful when it can be leveraged at the right time. Not only enterprises application stores operate on large volume, velocity and variety of data. Mobile and social computing are in the need of operating in foresaid data. Replicating and transferring large swaths of data is one challenge faced in the field of data integration. However, smaller chunks of data aggregated from a variety of sources presents and even more interesting challenge in the industry. Over the past few decades, technology trends focused on best user experience, operating systems, high performance computing, high performance web sites, analysis of warehouse data, service oriented architecture, social computing, cloud computing, and big data. Operating on the ‘dark data’ becomes mandatory in the future technology trend, although, no solution can make dark data useful data in a single day. Useful data can be quantified by the facts of contextual, personalized and on time delivery. In most cases, data from a single source may not be complete the picture. Data has to be combined and computed from various sources, where data may be captured as hybrid data, meaning the combination of structured and unstructured data. Since related data is often found across disparate sources, effectively integrating these sources determines how useful this data ultimately becomes. Technology trends in 2013 are expected to focus on big data and private cloud. Consumers are not merely interested in where data is located or how data is retrieved and computed. Consumers are interested in how quick and how the data can be leveraged. In many cases, data virtualization is the right solution, and is expected to play a foundational role for SOA, Cloud integration, and Big Data. The Oracle Data Integration portfolio includes a data virtualization product called ODSI (Oracle Data Service Integrator). Unlike other data virtualization solutions, ODSI can perform both read and write operations on federated/hybrid data (RDBMS, Webservices, delimited file and XML). The ODSI Engine is built on XQuery, hence ODSI user can perform computations on data either using XQuery or SQL. Built in data and query caching features, which reduces latency in repetitive calls. Rightly positioning ODSI, can results in a highly scalable model, reducing spend on additional hardware infrastructure.

Read the article
Linked List push()

- by JKid314159

The stack is initialized with a int MaxSize =3. Then I push one int onto the list. " Pushed:" is returned to the console. Program crashes here. I think my logic is flawed but unsure. Maybe an infinite loop or unmet condition? Thanks for your help. I'm trying to traverse the list to the last node in the second part of the full() method. I implemented this stack as array based so must implement this method full() as this method is inside of main class. while(!stacker.full()) { cout << "Enter number = "; cin >> intIn; stacker.push(intIn); cout << "Pushed: " << intIn << endl; }//while Call to LinkListStack.cpp to class LinkList full(). int LinkList::full() { if(head == NULL) { top = 0; } else { LinkNode * tmp1; LinkNode * tmp2; tmp1 = head; while(top != MaxSize) { if(tmp1->next != NULL){ tmp2 = tmp1->next; tmp1 = tmp2; ++top; }//if }//while }//else return (top + 1 == MaxSize); }

Read the article
Python what's the data structure for triple data

- by Paul

I've got a set of data that has three attributes, say A, B, and C, where A is kind of the index (i.e., A is used to look up the other two attributes.) What would be the best data structure for such data? I used two dictionaries, with A as the index of each. However, there's key errors when the query to the data doesn't match any instance of A.

Read the article
about doubly linked list

- by user329820

Hi I want to know that do "head" and "tail" store any value like the other nodes?? thanks

Read the article
C++ Bubble Sorting for Singly Linked List [closed]

- by user1119900

I have implemented a simple word frequency program in C++. Everything but the sorting is OK, but the sorting in the following script does not work. Any emergent help will be great.. #include <stdio.h> #include <string.h> #include <stdlib.h> #include <ctype.h> #include <iostream> #include <fstream> #include <cstdio> using namespace std; #include "ProcessLines.h" struct WordCounter { char *word; int word_count; struct WordCounter *pNext; // pointer to the next word counter in the list }; /* pointer to first word counter in the list */ struct WordCounter *pStart = NULL; /* pointer to a word counter */ struct WordCounter *pCounter = NULL; /* Print statistics and words */ void PrintWords() { ... pCounter = pStart; bubbleSort(pCounter); ... } //end-PrintWords void bubbleSort(struct WordCounter *ptr) { WordCounter *temp = ptr; WordCounter *curr; for (bool didSwap = true; didSwap;) { didSwap = false; for (curr = ptr; curr->pNext != NULL; curr = curr->pNext) { if (curr->word > curr->pNext->word) { temp->word = curr->word; curr->word = curr->pNext->word; curr->pNext->word = temp->word; didSwap = true; } } } }

Read the article
Sorting linked lists in Pascal

- by user3712174

I'm doing my final project for Informatics class and I can't get my sorting procedure to work. Have a look at my program, specifically the bolded part (some things are in Croatian. - if you need something translated, let me know): type pokazivac=^slog; slog=record prezime_ime:string[30]; redni_broj:string[2]; fakultet:string[50]; bodovi:integer; sljedeci:pokazivac; end; var pocetni, trenutni, prethodni:pokazivac; i:integer; procedure racunaj; var i,a,c:integer; b,d,e,f,g,h,j:real; begin write('Postotak bodova (u decimalnom zapisu) koje ucenik ostvaruje na temelju prosjeka ocjena - '); readln(e); e:=e*1000/4; write('Prosjek ocjena u prvom razredu : '); readln(f); f:=f/5*e; write('Prosjek ocjena u drugom razredu : '); readln(g); g:=g/5*e; write('Prosjek ocjena u trecem razredu : '); readln(h); h:=h/5*e; write('Prosjek ocjena u cetvrtom razredu : '); readln(j); j:=j/5*e; d:=f+g+h+j; write('Broj predmeta (ne racunajuci hrvatski jezik, strani jezik i matematiku) koju je ucenik/ca polagao na maturi - '); readln(a); write('Postotak rijesnosti ispita iz hrvatskog jezika te zatim maksimum bodova koje je ucenik/ca mogao ostvariti - '); readln(b); readln(c); d:=d+b*c; write('Postotak rijesnosti ispita iz stranog jezika te zatim maksimum bodova koje je ucenik/ca mogao ostvariti - '); readln(b); readln(c); d:=d+(b*c); write('Postotak rijesnosti ispita iz matematike te zatim maksimum bodova koje je ucenik/ca mogao ostvariti - '); readln(b); readln(c); d:=d+(b*c); for i:=1 to a do begin writeln('Postotak rijesnosti dodatnog predmeta te zatim maksimum bodova koje je ucenik/ca mogao ostvariti - '); readln(b); readln(c); d:=d+(b*c); end; d:=round(d); writeln('Vas broj bodova je: ', d:4:2); write('Za nastavak pritisnite ENTER..'); readln; end; procedure unos; begin new(trenutni); write('Redni broj ucenika - ');readln(trenutni^.redni_broj); write('Prezime i ime - ');readln(trenutni^.prezime_ime); write('Naziv fakultet - ');readln(trenutni^.fakultet); write('Bodovi - ');readln(trenutni^.bodovi); trenutni^.sljedeci:=pocetni; pocetni:=trenutni; end; procedure ispis; begin writeln(); writeln('Lista popisanih ucenika:'); writeln(); trenutni:=pocetni; while trenutni<>NIL do begin with trenutni^do begin writeln('IME: ',prezime_ime); writeln('FAKULTET: ',fakultet); writeln('BODOVI: ',bodovi); writeln(); end; trenutni:=trenutni^.sljedeci; end; writeln(); write('Za nastavak pritisnite ENTER..'); readln; end; procedure brisi; var s:string; begin trenutni:= pocetni; prethodni:=pocetni; write('Redni broj ucenika kojeg zelite izbrisati - '); readln(s); while trenutni<>NIL do begin if trenutni^.redni_broj=s then begin prethodni^.sljedeci:=trenutni^.sljedeci; dispose(trenutni); break; end; trenutni:=trenutni^.sljedeci; end; end; procedure izmjeni; var s:string; begin trenutni:=pocetni; write('Redni broj ucenika cije podatke zelite izmijeniti - '); readln(s); while trenutni<> NIL do begin if trenutni^.redni_broj=s then begin write(trenutni^.prezime_ime, ' - '); readln(trenutni^.prezime_ime); write(trenutni^.fakultet, ' - '); readln(trenutni^.fakultet); write(trenutni^.bodovi, ' - '); readln(trenutni^.bodovi); break; end; trenutni:=trenutni^.sljedeci; end; end; **procedure sortiraj; var t1,t2,t:pokazivac; begin t1:=pocetni; while t1 <> NIL do begin t2:=t1^.sljedeci; while t2<>NIL do if t2^.bodovi<t1^.bodovi then begin new(t); t^.redni_broj:=t1^.redni_broj; t^.prezime_ime:=t1^.prezime_ime; t^.fakultet:=t1^.fakultet; t^.bodovi:=t1^.bodovi; t1^.redni_broj:=t2^.redni_broj; t1^.prezime_ime:=t2^.prezime_ime; t1^.fakultet:=t2^.fakultet; t1^.bodovi:=t2^.bodovi; t2^.redni_broj:=t^.redni_broj; t2^.prezime_ime:=t^.prezime_ime; t2^.fakultet:=t^.fakultet; t2^.bodovi:=t^.bodovi; dispose(t); end; t2:=t2^.sljedeci; end; t1:=t1^.sljedeci; write('Za nastavak pritisnite ENTER..'); readln; end;** begin pocetni:=NIL; trenutni:=NIL; writeln('******************************************'); writeln('**********DOBRODOSLI U FAX-O-MAT**********'); writeln('******************************************'); repeat writeln('1 - Racunaj broj bodova'); writeln('2 - Dodaj ucenika'); writeln('3 - Brisi ucenika'); writeln('4 - Ispis liste'); writeln('5 - Izmjeni podatke'); writeln('6 - Sortiraj listu prema broju bodova'); writeln('0 - Kraj'); readln(i); case i of 1:racunaj; 2:unos; 3:brisi; 4:ispis; 5:izmjeni; 6:sortiraj; end; until i=0; end. Either it crashes with a fatal error, or when I press the number 6, nothing happens. The pointer keeps blinking and I can't enter any more numbers.

Read the article
Implementing a generic repository for WCF data services

- by cibrax

The repository implementation I am going to discuss here is not exactly what someone would call repository in terms of DDD, but it is an abstraction layer that becomes handy at the moment of unit testing the code around this repository. In other words, you can easily create a mock to replace the real repository implementation. The WCF Data Services update for .NET 3.5 introduced a nice feature to support two way data bindings, which is very helpful for developing WPF or Silverlight based application but also for implementing the repository I am going to talk about. As part of this feature, the WCF Data Services Client library introduced a new collection DataServiceCollection<T> that implements INotifyPropertyChanged to notify the data context (DataServiceContext) about any change in the association links. This means that it is not longer necessary to manually set or remove the links in the data context when an item is added or removed from a collection. Before having this new collection, you basically used the following code to add a new item to a collection. Order order = new Order { Name = "Foo" }; OrderItem item = new OrderItem { Name = "bar", UnitPrice = 10, Qty = 1 }; var context = new OrderContext(); context.AddToOrders(order); context.AddToOrderItems(item); context.SetLink(item, "Order", order); context.SaveChanges(); Now, thanks to this new collection, everything is much simpler and similar to what you have in other ORMs like Entity Framework or L2S. Order order = new Order { Name = "Foo" }; OrderItem item = new OrderItem { Name = "bar", UnitPrice = 10, Qty = 1 }; order.Items.Add(item); var context = new OrderContext(); context.AddToOrders(order); context.SaveChanges(); In order to use this new feature, you first need to enable V2 in the data service, and then use some specific arguments in the datasvcutil tool (You can find more information about this new feature and how to use it in this post). DataSvcUtil /uri:"http://localhost:3655/MyDataService.svc/" /out:Reference.cs /dataservicecollection /version:2.0 Once you use those two arguments, the generated proxy classes will use DataServiceCollection<T> rather than a simple ObjectCollection<T>, which was the default collection in V1. There are some aspects that you need to know to use this feature correctly. 1. All the entities retrieved directly from the data context with a query track the changes and report those to the data context automatically. 2. A entity created with “new” does not track any change in the properties or associations. In order to enable change tracking in this entity, you need to do the following trick. public Order CreateOrder() { var collection = new DataServiceCollection<Order>(this.context); var order = new Order(); collection.Add(order); return order; } You basically need to create a collection, and add the entity to that collection with the “Add” method to enable change tracking on that entity. 3. If you need to attach an existing entity (For example, if you created the entity with the “new” operator rather than retrieving it from the data context with a query) to a data context for tracking changes, you can use the “Load” method in the DataServiceCollection. var order = new Order { Id = 1 }; var collection = new DataServiceCollection<Order>(this.context); collection.Load(order); In this case, the order with Id = 1 must exist on the data source exposed by the Data service. Otherwise, you will get an error because the entity did not exist. These cool extensions methods discussed by Stuart Leeks in this post to replace all the magic strings in the “Expand” operation with Expression Trees represent another feature I am going to use to implement this generic repository. Thanks to these extension methods, you could replace the following query with magic strings by a piece of code that only uses expressions. Magic strings, var customers = dataContext.Customers .Expand("Orders") .Expand("Orders/Items") Expressions, var customers = dataContext.Customers .Expand(c => c.Orders.SubExpand(o => o.Items)) That query basically returns all the customers with their orders and order items. Ok, now that we have the automatic change tracking support and the expression support for explicitly loading entity associations, we are ready to create the repository. The interface for this repository looks like this,public interface IRepository { T Create<T>() where T : new(); void Update<T>(T entity); void Delete<T>(T entity); IQueryable<T> RetrieveAll<T>(params Expression<Func<T, object>>[] eagerProperties); IQueryable<T> Retrieve<T>(Expression<Func<T, bool>> predicate, params Expression<Func<T, object>>[] eagerProperties); void Attach<T>(T entity); void SaveChanges(); } The Retrieve and RetrieveAll methods are used to execute queries against the data service context. While both methods receive an array of expressions to load associations explicitly, only the Retrieve method receives a predicate representing the “where” clause. The following code represents the final implementation of this repository.public class DataServiceRepository: IRepository { ResourceRepositoryContext context; public DataServiceRepository() : this (new DataServiceContext()) { } public DataServiceRepository(DataServiceContext context) { this.context = context; } private static string ResolveEntitySet(Type type) { var entitySetAttribute = (EntitySetAttribute)type.GetCustomAttributes(typeof(EntitySetAttribute), true).FirstOrDefault(); if (entitySetAttribute != null) return entitySetAttribute.EntitySet; return null; } public T Create<T>() where T : new() { var collection = new DataServiceCollection<T>(this.context); var entity = new T(); collection.Add(entity); return entity; } public void Update<T>(T entity) { this.context.UpdateObject(entity); } public void Delete<T>(T entity) { this.context.DeleteObject(entity); } public void Attach<T>(T entity) { var collection = new DataServiceCollection<T>(this.context); collection.Load(entity); } public IQueryable<T> Retrieve<T>(Expression<Func<T, bool>> predicate, params Expression<Func<T, object>>[] eagerProperties) { var entitySet = ResolveEntitySet(typeof(T)); var query = context.CreateQuery<T>(entitySet); foreach (var e in eagerProperties) { query = query.Expand(e); } return query.Where(predicate); } public IQueryable<T> RetrieveAll<T>(params Expression<Func<T, object>>[] eagerProperties) { var entitySet = ResolveEntitySet(typeof(T)); var query = context.CreateQuery<T>(entitySet); foreach (var e in eagerProperties) { query = query.Expand(e); } return query; } public void SaveChanges() { this.context.SaveChanges(SaveChangesOptions.Batch); } } For instance, you can use the following code to retrieve customers with First name equal to “John”, and all their orders in a single call. repository.Retrieve<Customer>( c => c.FirstName == “John”, //Where c => c.Orders.SubExpand(o => o.Items)); In case, you want to have some pre-defined queries that you are going to use across several places, you can put them in an specific class. public static class CustomerQueries { public static Expression<Func<Customer, bool>> LastNameEqualsTo(string lastName) { return c => c.LastName == lastName; } } And then, use it with the repository. repository.Retrieve<Customer>( CustomerQueries.LastNameEqualsTo("foo"), c => c.Orders.SubExpand(o => o.Items));

Read the article
data structure for counting frequencies in a database table-like format

- by user373312

i was wondering if there is a data structure optimized to count frequencies against data that is stored in a database table-like format. for example, the data comes in a (comma) delimited format below. col1, col2, col3 x, a, green x, b, blue ... y, c, green now i simply want to count the frequency of col1=x or col1=x and col2=green. i have been storing the data in a database table, but in my profiling and from empirical observation, database connection is the bottle-neck. i have tried using in-memory database solutions too, and that works quite well; the only problem is memory requirements and quirky init/destroy calls. also, i work mainly with java, but have experience with .net, and was wondering if there was any api to work with "tabular" data in a linq way using java. any help is appreciated.

Read the article
Oracle Big Data Software Downloads

- by Mike.Hallett(at)Oracle-BI&EPM

Companies have been making business decisions for decades based on transactional data stored in relational databases. Beyond that critical data, is a potential treasure trove of less structured data: weblogs, social media, email, sensors, and photographs that can be mined for useful information. Oracle offers a broad integrated portfolio of products to help you acquire and organize these diverse data sources and analyze them alongside your existing data to find new insights and capitalize on hidden relationships. Oracle Big Data Connectors Downloads here, includes: Oracle SQL Connector for Hadoop Distributed File System Release 2.1.0 Oracle Loader for Hadoop Release 2.1.0 Oracle Data Integrator Companion 11g Oracle R Connector for Hadoop v 2.1 Oracle Big Data Documentation The Oracle Big Data solution offers an integrated portfolio of products to help you organize and analyze your diverse data sources alongside your existing data to find new insights and capitalize on hidden relationships. Oracle Big Data, Release 2.2.0 - E41604_01 zip (27.4 MB) Integrated Software and Big Data Connectors User's Guide HTML PDF Oracle Data Integrator (ODI) Application Adapter for Hadoop Apache Hadoop is designed to handle and process data that is typically from data sources that are non-relational and data volumes that are beyond what is handled by relational databases. Typical processing in Hadoop includes data validation and transformations that are programmed as MapReduce jobs. Designing and implementing a MapReduce job usually requires expert programming knowledge. However, when you use Oracle Data Integrator with the Application Adapter for Hadoop, you do not need to write MapReduce jobs. Oracle Data Integrator uses Hive and the Hive Query Language (HiveQL), a SQL-like language for implementing MapReduce jobs. Employing familiar and easy-to-use tools and pre-configured knowledge modules (KMs), the application adapter provides the following capabilities: Loading data into Hadoop from the local file system and HDFS Performing validation and transformation of data within Hadoop Loading processed data from Hadoop to an Oracle database for further processing and generating reports Oracle Database Loader for Hadoop Oracle Loader for Hadoop is an efficient and high-performance loader for fast movement of data from a Hadoop cluster into a table in an Oracle database. It pre-partitions the data if necessary and transforms it into a database-ready format. Oracle Loader for Hadoop is a Java MapReduce application that balances the data across reducers to help maximize performance. Oracle R Connector for Hadoop Oracle R Connector for Hadoop is a collection of R packages that provide: Interfaces to work with Hive tables, the Apache Hadoop compute infrastructure, the local R environment, and Oracle database tables Predictive analytic techniques, written in R or Java as Hadoop MapReduce jobs, that can be applied to data in HDFS files You install and load this package as you would any other R package. Using simple R functions, you can perform tasks such as: Access and transform HDFS data using a Hive-enabled transparency layer Use the R language for writing mappers and reducers Copy data between R memory, the local file system, HDFS, Hive, and Oracle databases Schedule R programs to execute as Hadoop MapReduce jobs and return the results to any of those locations Oracle SQL Connector for Hadoop Distributed File System Using Oracle SQL Connector for HDFS, you can use an Oracle Database to access and analyze data residing in Hadoop in these formats: Data Pump files in HDFS Delimited text files in HDFS Hive tables For other file formats, such as JSON files, you can stage the input in Hive tables before using Oracle SQL Connector for HDFS. Oracle SQL Connector for HDFS uses external tables to provide Oracle Database with read access to Hive tables, and to delimited text files and Data Pump files in HDFS. Related Documentation Cloudera's Distribution Including Apache Hadoop Library HTML Oracle R Enterprise HTML Oracle NoSQL Database HTML Recent Blog Posts Big Data Appliance vs. DIY Price Comparison Big Data: Architecture Overview Big Data: Achieve the Impossible in Real-Time Big Data: Vertical Behavioral Analytics Big Data: In-Memory MapReduce Flume and Hive for Log Analytics Building Workflows in Oozie

Read the article
The Ins and Outs of Effective Smart Grid Data Management

- by caroline.yu

Oracle Utilities and Accenture recently sponsored a one-hour Web cast entitled, "The Ins and Outs of Effective Smart Grid Data Management." Oracle and Accenture created this Web cast to help utilities better understand the types of data collected over smart grid networks and the issues associated with mapping out a coherent information management strategy. The Web cast also addressed important points that utilities must consider with the imminent flood of data that both present and next-generation smart grid components will generate. The three speakers, including Oracle Utilities' Brad Williams, focused on the key factors associated with taking the millions of data points captured in real time and implementing the strategies, frameworks and technologies that enable utilities to process, store, analyze, visualize, integrate, transport and transform data into the information required to deliver targeted business benefits. The Web cast replay is available here. The Web cast slides are available here.

Read the article
What Works in Data Integration?

- by dain.hansen

TDWI just recently put out this paper on "What Works in Data Integration". I invite you especially to take a look at the section on "Accelerating your Business with Real-time Data Integration" and the DIRECTV case study. The article discusses some of the technology considerations for BI/DW and how data integration plays a role to deliver timely, accessible, and high-quality data. It goes on to outline the three key requirements for how to deliver high performance, low impact, and reliability and how that can translate to faster results. The DIRECTV webinar is something you definitely want to take a look at, you'll hear how DIRECTV successfully transformed their data warehouse investments into a competitive advantage with Oracle GoldenGate.

Read the article
How to Achieve Real-Time Data Protection and Availabilty....For Real

- by JoeMeeks

There is a class of business and mission critical applications where downtime or data loss have substantial negative impact on revenue, customer service, reputation, cost, etc. Because the Oracle Database is used extensively to provide reliable performance and availability for this class of application, it also provides an integrated set of capabilities for real-time data protection and availability. Active Data Guard, depicted in the figure below, is the cornerstone for accomplishing these objectives because it provides the absolute best real-time data protection and availability for the Oracle Database. This is a bold statement, but it is supported by the facts. It isn’t so much that alternative solutions are bad, it’s just that their architectures prevent them from achieving the same levels of data protection, availability, simplicity, and asset utilization provided by Active Data Guard. Let’s explore further. Backups are the most popular method used to protect data and are an essential best practice for every database. Not surprisingly, Oracle Recovery Manager (RMAN) is one of the most commonly used features of the Oracle Database. But comparing Active Data Guard to backups is like comparing apples to motorcycles. Active Data Guard uses a hot (open read-only), synchronized copy of the production database to provide real-time data protection and HA. In contrast, a restore from backup takes time and often has many moving parts - people, processes, software and systems – that can create a level of uncertainty during an outage that critical applications can’t afford. This is why backups play a secondary role for your most critical databases by complementing real-time solutions that can provide both data protection and availability. Before Data Guard, enterprises used storage remote-mirroring for real-time data protection and availability. Remote-mirroring is a sophisticated storage technology promoted as a generic infrastructure solution that makes a simple promise – whatever is written to a primary volume will also be written to the mirrored volume at a remote site. Keeping this promise is also what causes data loss and downtime when the data written to primary volumes is corrupt – the same corruption is faithfully mirrored to the remote volume making both copies unusable. This happens because remote-mirroring is a generic process. It has no intrinsic knowledge of Oracle data structures to enable advanced protection, nor can it perform independent Oracle validation BEFORE changes are applied to the remote copy. There is also nothing to prevent human error (e.g. a storage admin accidentally deleting critical files) from also impacting the remote mirrored copy. Remote-mirroring tricks users by creating a false impression that there are two separate copies of the Oracle Database. In truth; while remote-mirroring maintains two copies of the data on different volumes, both are part of a single closely coupled system. Not only will remote-mirroring propagate corruptions and administrative errors, but the changes applied to the mirrored volume are a result of the same Oracle code path that applied the change to the source volume. There is no isolation, either from a storage mirroring perspective or from an Oracle software perspective. Bottom line, storage remote-mirroring lacks both the smarts and isolation level necessary to provide true data protection. Active Data Guard offers much more than storage remote-mirroring when your objective is protecting your enterprise from downtime and data loss. Like remote-mirroring, an Active Data Guard replica is an exact block for block copy of the primary. Unlike remote-mirroring, an Active Data Guard replica is NOT a tightly coupled copy of the source volumes - it is a completely independent Oracle Database. Active Data Guard’s inherent knowledge of Oracle data block and redo structures enables a separate Oracle Database using a different Oracle code path than the primary to use the full complement of Oracle data validation methods before changes are applied to the synchronized copy. These include: physical check sum, logical intra-block checking, lost write validation, and automatic block repair. The figure below illustrates the stark difference between the knowledge that remote-mirroring can discern from an Oracle data block and what Active Data Guard can discern. An Active Data Guard standby also provides a range of additional services enabled by the fact that it is a running Oracle Database - not just a mirrored copy of data files. An Active Data Guard standby database can be open read-only while it is synchronizing with the primary. This enables read-only workloads to be offloaded from the primary system and run on the active standby - boosting performance by utilizing all assets. An Active Data Guard standby can also be used to implement many types of system and database maintenance in rolling fashion. Maintenance and upgrades are first implemented on the standby while production runs unaffected at the primary. After the primary and standby are synchronized and all changes have been validated, the production workload is quickly switched to the standby. The only downtime is the time required for user connections to transfer from one system to the next. These capabilities further expand the expectations of availability offered by a data protection solution beyond what is possible to do using storage remote-mirroring. So don’t be fooled by appearances. Storage remote-mirroring and Active Data Guard replication may look similar on the surface - but the devil is in the details. Only Active Data Guard has the smarts, the isolation, and the simplicity, to provide the best data protection and availability for the Oracle Database. Stay tuned for future blog posts that dive into the many differences between storage remote-mirroring and Active Data Guard along the dimensions of data protection, data availability, cost, asset utilization and return on investment. For additional information on Active Data Guard, see: Active Data Guard Technical White Paper Active Data Guard vs Storage Remote-Mirroring Active Data Guard Home Page on the Oracle Technology Network

Read the article
Are there sources of email marketing data available?

- by Gortron

Are sources of email marketing data available to the public? I would like to see email marketing data to see what kind of content a business sends out, the frequency of sending, the number of people emailed, especially the resulting open rates and click through rates. Are businesses willing to share data on their previous email marketing campaigns without divulging their contact list? I would like to use this data to create an application to help businesses create better newsletters by using this data as a benchmark, basically sharing what works and what doesn't for each industry.

Read the article
Extending SSIS with custom Data Flow components (Presentation)

Download the slides and sample code from my Extending SSIS with custom Data Flow components presentation, first presented at the SQLBits II (The SQL) Community Conference. Abstract Get some real-world insights into developing data flow components for SSIS. This starts with an introduction to the data flow pipeline engine, and explains the real differences between adapters and the three sub-types of transformation. Understanding how the different types of component behave and manage data is key to writing components of your own, and probably should but be required knowledge for anyone building packages at all. Using sample code throughout, I will show you how to write components, as well as highlighting best practice and lessons learned. The sample code includes fully working example projects for source, destination and transformation components. Presentation & Samples (358KB) Extending SSIS with custom Data Flow components.zip

Read the article
How to use OO for data analysis? [closed]

- by Konsta

In which ways could object-orientation (OO) make my data analysis more efficient and let me reuse more of my code? The data analysis can be broken up into get data (from db or csv or similar) transform data (filter, group/pivot, ...) display/plot (graph timeseries, create tables, etc.) I mostly use Python and its Pandas and Matplotlib packages for this besides some DB connectivity (SQL). Almost all of my code is a functional/procedural mix. While I have started to create a data object for a certain collection of time series, I wonder if there are OO design patterns/approaches for other parts of the process that might increase efficiency?

Read the article
Markup format or script for data files?

- by Aaron

The game I'm designing will be mainly written in a high level scripting language (leaning towards either Lua or Squirrel) with a C++ core. In addition to scripts I'm also going to need different data files. Many data files will be for static information such as graphical assets and monster types. I'd also want to create and update data files at runtime for user information like option settings and game saves. Can I get away with using plain script files (i.e. .lua or .nut files) for my data files, or is it better to use dedicated markup formats like XML or YAML? If I use script files, loaded separately from my true scripts, then I wouldn't need an extra library to read those files. Scripting languages like Lua also have table syntax that lend themselves towards data definition. On the other hand I'd have to write my own schema check code. These languages also don't seem to support serialization "out of the box" like the markup format libraries do.

Read the article
Winner of the 2012 Government Big Data Solutions Award

- by Jean-Pierre Dijcks

Hot off the press: The winner of the 2012 Government Big Data Solutions Aware is the National Cancer Institute!! Read all the details on CTOLabs.com. A short excerpt to wet your appetite: "... This solution, based on the Oracle Big Data Appliance with the Cloudera Distribution of Apache Hadoop (CDH), leverages capabilities available from the Big Data community today in pioneering ways that can serve a broad range of researchers. The promising approach of this solution is repeatable across many other Big Data challenges for bioinfomatics, making this approach worthy of its selection as the 2012 Government Big Data Solution Award." Read the entire post. Congrats to the entire team!!

Read the article
SQL Server and the XML Data Type : Data Manipulation

The introduction of the xml data type, with its own set of methods for processing xml data, made it possible for SQL Server developers to create columns and variables of the type xml. Deanna Dicken examines the modify() method, which provides for data manipulation of the XML data stored in the xml data type via XML DML statements. Too many SQL Servers to keep up with?Download a free trial of SQL Response to monitor your SQL Servers in just one intuitive interface."The monitoringin SQL Response is excellent." Mike Towery.

Read the article
Getting data from a webpage in a stable and efficient way

- by Mike Heremans

Recently I've learned that using a regex to parse the HTML of a website to get the data you need isn't the best course of action. So my question is simple: What then, is the best / most efficient and a generally stable way to get this data? I should note that: There are no API's There is no other source where I can get the data from (no databases, feeds and such) There is no access to the source files. (Data from public websites) Let's say the data is normal text, displayed in a table in a html page I'm currently using python for my project but a language independent solution/tips would be nice. As a side question: How would you go about it when the webpage is constructed by Ajax calls?

Read the article
Ubuntu and Windows 8 shared partition gets corrupted

- by Bruno-P

I have a dual boot (Ubuntu 12.04 and Windows 8) system. Both systems have access to an NTFS "DATA" partition which contains all my images, documents, music and some application data like Chrome and Thunderbird Profiles which used by both OS. Everything was working fine in my Dual boot Ubuntu/Windows 7, but after updating to Windows 8 I am having a lot of troubles. First, sometimes, I add some files from Ubuntu into my DATA partition but they don't show up in Windows. Sometimes, I can't even use the DATA partition from Windows. When I try to save a file it gives an error "The directory or file is corrupted or unreadable". I need to run checkdisk to fix it but after some time, same error appears. Before upgrading to Windows 8 I also installed a new hard drive and copied the old data using clonezilla (full disk clone). Here is the log of my last chkdisk: Chkdsk was executed in read/write mode. Checking file system on D: Volume dismounted. All opened handles to this volume are now invalid. Volume label is DATA. CHKDSK is verifying files (stage 1 of 3)... Deleted corrupt attribute list entry with type code 128 in file 67963. Unable to find child frs 0x12a3f with sequence number 0x15. The attribute of type 0x80 and instance tag 0x2 in file 0x1097b has allocated length of 0x560000 instead of 0x427000. Deleted corrupt attribute list entry with type code 128 in file 67963. Unable to locate attribute with instance tag 0x2 and segment reference 0x1e00000001097b. The expected attribute type is 0x80. Deleting corrupt attribute record (128, "") from file record segment 67963. Attribute record of type 0x80 and instance tag 0x3 is cross linked starting at 0x2431b2 for possibly 0x20 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x3 in file 0x1791e is already in use. Deleting corrupt attribute record (128, "") from file record segment 96542. Attribute record of type 0x80 and instance tag 0x4 is cross linked starting at 0x6bc7 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x4 in file 0x17e83 is already in use. Deleting corrupt attribute record (128, "") from file record segment 97923. Attribute record of type 0x80 and instance tag 0x4 is cross linked starting at 0x1f7cec for possibly 0x5 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x4 in file 0x17eaf is already in use. Deleting corrupt attribute record (128, "") from file record segment 97967. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x441bd7f for possibly 0x9 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x32085 is already in use. Deleting corrupt attribute record (128, "") from file record segment 204933. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4457850 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x320be is already in use. Deleting corrupt attribute record (128, "") from file record segment 204990. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4859249 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x3726b is already in use. Deleting corrupt attribute record (128, "") from file record segment 225899. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x485d309 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x3726c is already in use. Deleting corrupt attribute record (128, "") from file record segment 225900. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x48a47de for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x37286 is already in use. Deleting corrupt attribute record (128, "") from file record segment 225926. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x48ac80b for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x37287 is already in use. Deleting corrupt attribute record (128, "") from file record segment 225927. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x48ae7ef for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x37288 is already in use. Deleting corrupt attribute record (128, "") from file record segment 225928. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x48af7f8 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x3728a is already in use. Deleting corrupt attribute record (128, "") from file record segment 225930. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x48c39b6 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x37292 is already in use. Deleting corrupt attribute record (128, "") from file record segment 225938. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x495d37a for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x372d7 is already in use. Deleting corrupt attribute record (128, "") from file record segment 226007. Attribute record of type 0xa0 and instance tag 0x5 is cross linked starting at 0x4d0bd38 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0xa0 and instance tag 0x5 in file 0x372dc is already in use. Deleting corrupt attribute record (160, $I30) from file record segment 226012. Attribute record of type 0xa0 and instance tag 0x5 is cross linked starting at 0x4c2d9bc for possibly 0x1 clusters. Some clusters occupied by attribute of type 0xa0 and instance tag 0x5 in file 0x372ed is already in use. Deleting corrupt attribute record (160, $I30) from file record segment 226029. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4a4c1c3 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x37354 is already in use. Deleting corrupt attribute record (128, "") from file record segment 226132. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4a8e639 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x37376 is already in use. Deleting corrupt attribute record (128, "") from file record segment 226166. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4a8f6eb for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x37379 is already in use. Deleting corrupt attribute record (128, "") from file record segment 226169. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4ae1aa8 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x37391 is already in use. Deleting corrupt attribute record (128, "") from file record segment 226193. Attribute record of type 0xa0 and instance tag 0x5 is cross linked starting at 0x4b00d45 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0xa0 and instance tag 0x5 in file 0x37396 is already in use. Deleting corrupt attribute record (160, $I30) from file record segment 226198. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4b02d50 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x3739c is already in use. Deleting corrupt attribute record (128, "") from file record segment 226204. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4b3407a for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x373a8 is already in use. Deleting corrupt attribute record (128, "") from file record segment 226216. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4bd8a1b for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x373db is already in use. Deleting corrupt attribute record (128, "") from file record segment 226267. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4bd9a28 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x373dd is already in use. Deleting corrupt attribute record (128, "") from file record segment 226269. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4c2fb24 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x373f3 is already in use. Deleting corrupt attribute record (128, "") from file record segment 226291. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4cb67e9 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x37424 is already in use. Deleting corrupt attribute record (128, "") from file record segment 226340. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4cba829 for possibly 0x2 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x37425 is already in use. Deleting corrupt attribute record (128, "") from file record segment 226341. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4cbe868 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x37427 is already in use. Deleting corrupt attribute record (128, "") from file record segment 226343. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4cbf878 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x37428 is already in use. Deleting corrupt attribute record (128, "") from file record segment 226344. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4cc58d8 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x3742a is already in use. Deleting corrupt attribute record (128, "") from file record segment 226346. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4ccc943 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x3742b is already in use. Deleting corrupt attribute record (128, "") from file record segment 226347. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4cd199b for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x3742d is already in use. Deleting corrupt attribute record (128, "") from file record segment 226349. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4cd29a8 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x3742f is already in use. Deleting corrupt attribute record (128, "") from file record segment 226351. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4cd39b8 for possibly 0x2 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x37430 is already in use. Deleting corrupt attribute record (128, "") from file record segment 226352. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4cd49c8 for possibly 0x2 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x37432 is already in use. Deleting corrupt attribute record (128, "") from file record segment 226354. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4cd9a16 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x37435 is already in use. Deleting corrupt attribute record (128, "") from file record segment 226357. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4cdca46 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x37436 is already in use. Deleting corrupt attribute record (128, "") from file record segment 226358. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4ce0a78 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x37437 is already in use. Deleting corrupt attribute record (128, "") from file record segment 226359. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4ce6ad9 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x3743a is already in use. Deleting corrupt attribute record (128, "") from file record segment 226362. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4cebb28 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x3743b is already in use. Deleting corrupt attribute record (128, "") from file record segment 226363. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4ceeb67 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x3743d is already in use. Deleting corrupt attribute record (128, "") from file record segment 226365. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4cf4bc6 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x3743e is already in use. Deleting corrupt attribute record (128, "") from file record segment 226366. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4cfbc3a for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x37440 is already in use. Deleting corrupt attribute record (128, "") from file record segment 226368. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4cfcc48 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x37442 is already in use. Deleting corrupt attribute record (128, "") from file record segment 226370. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4d02ca9 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x37443 is already in use. Deleting corrupt attribute record (128, "") from file record segment 226371. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4d06ce8 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x37444 is already in use. Deleting corrupt attribute record (128, "") from file record segment 226372. Attribute record of type 0xa0 and instance tag 0x5 is cross linked starting at 0x4d9a608 for possibly 0x2 clusters. Some clusters occupied by attribute of type 0xa0 and instance tag 0x5 in file 0x37449 is already in use. Deleting corrupt attribute record (160, $I30) from file record segment 226377. Attribute record of type 0xa0 and instance tag 0x5 is cross linked starting at 0x4d844ab for possibly 0x1 clusters. Some clusters occupied by attribute of type 0xa0 and instance tag 0x5 in file 0x3744b is already in use. Deleting corrupt attribute record (160, $I30) from file record segment 226379. Attribute record of type 0xa0 and instance tag 0x5 is cross linked starting at 0x4d6c32b for possibly 0x1 clusters. Some clusters occupied by attribute of type 0xa0 and instance tag 0x5 in file 0x3744c is already in use. Deleting corrupt attribute record (160, $I30) from file record segment 226380. Attribute record of type 0xa0 and instance tag 0x5 is cross linked starting at 0x4d2af25 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0xa0 and instance tag 0x5 in file 0x3744e is already in use. Deleting corrupt attribute record (160, $I30) from file record segment 226382. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4d0fd78 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x37451 is already in use. Deleting corrupt attribute record (128, "") from file record segment 226385. Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x4d16ef8 for possibly 0x1 clusters. Some clusters occupied by attribute of type 0x8 Can anyone help? Thank you

Read the article
What data structure to use / data persistence

- by Dave

I have an app where I need one table of information with the following fields: field 1 - int or char field 2 - string (max 10 char) field 3 - string (max 20 char) field 4 - float I need the program to filter on field 1 based upon a segmented control and select a field 2 from a picker. From this data I need to look up field 4 to use in a calculation. Total records will be about 200. I never see it go above 400 - 500. I am going to use a singleton which I am able to do, I just need help with the structure for this with data persistence. What type of data structure should I use for this and should I use NSNumber, NSString, etc. or old data types like float, Char, etc. I thought about a struct put into an array but there is probably a better way. This is new to me so any help or reference to examples would be great. I also thought about a plist or dictionary but it looks like it is just a lookup and a field which obviously won't work. Core data looked like overkill to me. Also, with any recommendation how should I get initial data into it? I want the user to be able to edit and add to the database. Sorry for the old terms, you can see what generation I am from... Thanks in advance!!!!

Read the article
Using a "white list" for extracting terms for Text Mining

- by [email protected]

In Part 1 of my post on "Generating cluster names from a document clustering model" (part 1, part 2, part 3), I showed how to build a clustering model from text documents using Oracle Data Miner, which automates preparing data for text mining. In this process we specified a custom stoplist and lexer and relied on Oracle Text to identify important terms. However, there is an alternative approach, the white list, which uses a thesaurus object with the Oracle Text CTXRULE index to allow you to specify the important terms. INTRODUCTIONA stoplist is used to exclude, i.e., black list, specific words in your documents from being indexed. For example, words like a, if, and, or, and but normally add no value when text mining. Other words can also be excluded if they do not help to differentiate documents, e.g., the word Oracle is ubiquitous in the Oracle product literature. One problem with stoplists is determining which words to specify. This usually requires inspecting the terms that are extracted, manually identifying which ones you don't want, and then re-indexing the documents to determine if you missed any. Since a corpus of documents could contain thousands of words, this could be a tedious exercise. Moreover, since every word is considered as an individual token, a term excluded in one context may be needed to help identify a term in another context. For example, in our Oracle product literature example, the words "Oracle Data Mining" taken individually are not particular helpful. The term "Oracle" may be found in nearly all documents, as with the term "Data." The term "Mining" is more unique, but could also refer to the Mining industry. If we exclude "Oracle" and "Data" by specifying them in the stoplist, we lose valuable information. But it we include them, they may introduce too much noise. Still, when you have a broad vocabulary or don't have a list of specific terms of interest, you rely on the text engine to identify important terms, often by computing the term frequency - inverse document frequency metric. (This is effectively a weight associated with each term indicating its relative importance in a document within a collection of documents. We'll revisit this later.) The results using this technique is often quite valuable. As noted above, an alternative to the subtractive nature of the stoplist is to specify a white list, or a list of terms--perhaps multi-word--that we want to extract and use for data mining. The obvious downside to this approach is the need to specify the set of terms of interest. However, this may not be as daunting a task as it seems. For example, in a given domain (Oracle product literature), there is often a recognized glossary, or a list of keywords and phrases (Oracle product names, industry names, product categories, etc.). Being able to identify multi-word terms, e.g., "Oracle Data Mining" or "Customer Relationship Management" as a single token can greatly increase the quality of the data mining results. The remainder of this post and subsequent posts will focus on how to produce a dataset that contains white list terms, suitable for mining. CREATING A WHITE LIST We'll leverage the thesaurus capability of Oracle Text. Using a thesaurus, we create a set of rules that are in effect our mapping from single and multi-word terms to the tokens used to represent those terms. For example, "Oracle Data Mining" becomes "ORACLEDATAMINING." First, we'll create and populate a mapping table called my_term_token_map. All text has been converted to upper case and values in the TERM column are intended to be mapped to the token in the TOKEN column. TERM TOKEN DATA MINING DATAMINING ORACLE DATA MINING ORACLEDATAMINING 11G ORACLE11G JAVA JAVA CRM CRM CUSTOMER RELATIONSHIP MANAGEMENT CRM ... Next, we'll create a thesaurus object my_thesaurus and a rules table my_thesaurus_rules: CTX_THES.CREATE_THESAURUS('my_thesaurus', FALSE); CREATE TABLE my_thesaurus_rules (main_term VARCHAR2(100), query_string VARCHAR2(400)); We next populate the thesaurus object and rules table using the term token map. A cursor is defined over my_term_token_map. As we iterate over the rows, we insert a synonym relationship 'SYN' into the thesaurus. We also insert into the table my_thesaurus_rules the main term, and the corresponding query string, which specifies synonyms for the token in the thesaurus. DECLARE cursor c2 is select token, term from my_term_token_map; BEGIN for r_c2 in c2 loop CTX_THES.CREATE_RELATION('my_thesaurus',r_c2.token,'SYN',r_c2.term); EXECUTE IMMEDIATE 'insert into my_thesaurus_rules values (:1,''SYN(' || r_c2.token || ', my_thesaurus)'')' using r_c2.token; end loop; END; We are effectively inserting the token to return and the corresponding query that will look up synonyms in our thesaurus into the my_thesaurus_rules table, for example: 'ORACLEDATAMINING' SYN ('ORACLEDATAMINING', my_thesaurus)At this point, we create a CTXRULE index on the my_thesaurus_rules table: create index my_thesaurus_rules_idx on my_thesaurus_rules(query_string) indextype is ctxsys.ctxrule; In my next post, this index will be used to extract the tokens that match each of the rules specified. We'll then compute the tf-idf weights for each of the terms and create a nested table suitable for mining.

Read the article
ADO.NET (WCF) Data Services Query Interceptor Hangs IIS

- by PreMagination

I have an ADO.NET Data Service that's supposed to provide read-only access to a somewhat complex database. Logically I have table-per-type (TPT) inheritance in my data model but the EDM doesn't implement inheritance. (Limitation of EF and navigation properties on derived types. STILL not fixed in EF4!) I can query my EDM directly (using a separate project) using a copy of the query I'm trying to run against the web service, results are returned within 10 seconds. Disabling the query interceptors I'm able to make the same query against the web service, results are returned similarly quickly. I can enable some of the query interceptors and the results are returned slowly, up to a minute or so later. Alternatively, I can enable all the query interceptors, expand less of the properties on the main object I'm querying, and results are returned in a similar period of time. (I've increased some of the timeout periods) Up til this point Sql Profiler indicates the slow-down is the database. (That's a post for a different day) But when I enable all my query interceptors and expand all the properties I'd like to have the IIS worker process pegs the CPU for 20 minutes and a query is never even made against the database. This implies to me that yes, my implementation probably sucks but regardless the Data Services "tier" is having an issue it shouldn't. WCF tracing didn't reveal anything interesting to my untrained eye. Details: Data model: Agent-Person-Student Student has a collection of referrals Students and referrals are private, queries against the web service should only return "your" students and referrals. This means Person and Agent need to be filtered too. Other entities (Agent-Organization-School) can be accessed by anyone who has authenticated. The existing security model is poorly suited to perform this type of filtering for this type of data access, the query interceptors are complicated and cause EF to generate some entertaining sql queries. Sample Interceptor [QueryInterceptor("Agents")] public Expression<Func<Agent, Boolean>> OnQueryAgents() { //Agent is a Person(1), Educator(2), Student(3), or Other Person(13); allow if scope permissions exist return ag => (ag.AgentType.AgentTypeId == 1 || ag.AgentType.AgentTypeId == 2 || ag.AgentType.AgentTypeId == 3 || ag.AgentType.AgentTypeId == 13) && ag.Person.OrganizationPersons.Count<OrganizationPerson>(op => op.Organization.ScopePermissions.Any<ScopePermission> (p => p.ApplicationRoleAccount.Account.UserName == HttpContext.Current.User.Identity.Name && p.ApplicationRoleAccount.Application.ApplicationId == 124) || op.Organization.HierarchyDescendents.Any<OrganizationsHierarchy>(oh => oh.AncestorOrganization.ScopePermissions.Any<ScopePermission> (p => p.ApplicationRoleAccount.Account.UserName == HttpContext.Current.User.Identity.Name && p.ApplicationRoleAccount.Application.ApplicationId == 124))) > 0; } The query interceptors for Person, Student, Referral are all very similar, ie they traverse multiple same/similar tables to look for ScopePermissions as above. Sample Query var referrals = (from r in service.Referrals .Expand("Organization/ParentOrganization") .Expand("Educator/Person/Agent") .Expand("Student/Person/Agent") .Expand("Student") .Expand("Grade") .Expand("ProblemBehavior") .Expand("Location") .Expand("Motivation") .Expand("AdminDecision") .Expand("OthersInvolved") where r.DateCreated >= coupledays && r.DateDeleted == null select r); Any suggestions or tips would be greatly associated, for fixing my current implementation or in developing a new one, with the caveat that the database can't be changed and that ultimately I need to expose a large portion of the database via a web service that limits data access to the data authorized for, for the purpose of data integration with multiple outside parties. THANK YOU!!!

Read the article
make-like build tools for data?

- by miku

Make is a standard tools for building software. But make decides whether a target needs to be regenerated by comparing file modification times. Are there any proven, preferably small tools that handle builds not for software but for data? Something that regenerates targets not only on mod times but on certain other properties (e.g. completeness). (Or alternatively some paper that describes such a tool.) As illustration: I'd like to automate the following process: get data (e.g. a tarball) from some regularly updated source copy somewhere if it's not there (based e.g. on some filename-scheme) convert the files to different format (but only if there aren't successfully converted ones there - e.g. from a previous attempt - custom comparison routine) for each file find a certain data element and fetch some additional file from say an URL, but only if that hasn't been downloaded yet (decide on existence of file and file "freshness") finally compute something (e.g. word count for something identifiable and store it in the database, but only if the DB does not have an entry for that exact ID yet) Observations: there are different stages each stage is usually simple to compute or implement in isolation each stage may be simple, but the data volume may be large each stage may produce a few errors each stage may have different signals, on when (re)processing is needed Requirements: builds should be interruptable and idempotent (== robust) when interrupted, already processed objects should be reused to speedup the next run data paths should be easy to adjust (simple syntax, nothing new to learn, internal dsl would be ok) some form of dependency graph, that describes the process would be nice for later visualizations should leverage existing programs, if possible I've done some research on make alternatives like rake and have worked a lot with ant and maven in the past. All these tools naturally focus on code and software build, not on data builds. A system we have in place now for a task similar to the above is pretty much just shell scripts, which are compact (and are a ok glue for a variety of other programs written in other languages), so I wonder if worse is better?

Read the article
SQL SERVER – Installing Data Quality Services (DQS) on SQL Server 2012

- by pinaldave

Data Quality Services is very interesting enhancements in SQL Server 2012. My friend and SQL Server Expert Govind Kanshi have written an excellent article on this subject earlier on his blog. Yesterday I stumbled upon his blog one more time and decided to experiment myself with DQS. I have basic understanding of DQS and MDS so I knew I need to start with DQS Client. However, when I tried to find DQS Client I was not able to find it under SQL Server 2012 installation. I quickly realized that I needed to separately install the DQS client. You will find the DQS installer under SQL Server 2012 >> Data Quality Services directory. The pre-requisite of DQS is Master Data Services (MDS) and IIS. If you have not installed IIS, you can follow the simple steps and install IIS in your machine. Once the pre-requisites are installed, click on MDS installer once again and it will install DQS just fine. Be patient with the installer as it can take a bit longer time if your machine is low on configurations. Once the installation is over you will be able to expand SQL Server 2012 >> Data Quality Services directory and you will notice that it will have a new item called Data Quality Client. Click on it and it will open the client. Well, in future blog post we will go over more details about DQS and detailed practical examples. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Utility, T SQL, Technology Tagged: Data Quality Services

Read the article

Search Results

Search found 60622 results on 2425 pages for 'linked data'.

Page 13/2425 | < Previous Page | 9 10 11 12 13 14 15 16 17 18 19 20 | Next Page >

- by Krishnamoorthy

- by JKid314159

- by Paul

- by user329820

- by user1119900

- by user3712174

- by cibrax

- by user373312

- by Mike.Hallett(at)Oracle-BI&EPM

- by caroline.yu

- by dain.hansen

- by JoeMeeks

- by Gortron

- by Konsta

- by Aaron

- by Jean-Pierre Dijcks

- by Mike Heremans

- by Bruno-P

- by Dave

- by [email protected]

- by PreMagination

- by miku

- by pinaldave

< Previous Page | 9 10 11 12 13 14 15 16 17 18 19 20 | Next Page >