data storage - Page 15 - Developer IT

Are there sources of email marketing data available?

- by Gortron

Are sources of email marketing data available to the public? I would like to see email marketing data to see what kind of content a business sends out, the frequency of sending, the number of people emailed, especially the resulting open rates and click through rates. Are businesses willing to share data on their previous email marketing campaigns without divulging their contact list? I would like to use this data to create an application to help businesses create better newsletters by using this data as a benchmark, basically sharing what works and what doesn't for each industry.

Read the article

Storing Attendance Data in database

- by Ali Abbas

So i have to store daily attendance of employees of my organisation from my application . The part where I need some help is, the efficient way to store attendance data. After some research and brain storming I came up with some approaches . Could you point me out which one is the best and any unobvious ill effects of the mentioned approaches. The approaches are as follows Create a single table for whole organisation and store empid,date,presentstatus as a row for every employee everyday. Create a single table for whole organisation and store a single row for each day with a comma delimited string of empids which are absent. I will generate the string on my application. Create different tables for each department and follow the 1 method. Please share your views and do mention any other good methods

Read the article

Extending SSIS with custom Data Flow components (Presentation)

Download the slides and sample code from my Extending SSIS with custom Data Flow components presentation, first presented at the SQLBits II (The SQL) Community Conference. Abstract Get some real-world insights into developing data flow components for SSIS. This starts with an introduction to the data flow pipeline engine, and explains the real differences between adapters and the three sub-types of transformation. Understanding how the different types of component behave and manage data is key to writing components of your own, and probably should but be required knowledge for anyone building packages at all. Using sample code throughout, I will show you how to write components, as well as highlighting best practice and lessons learned. The sample code includes fully working example projects for source, destination and transformation components. Presentation & Samples (358KB) Extending SSIS with custom Data Flow components.zip

Read the article

How to use OO for data analysis? [closed]

- by Konsta

In which ways could object-orientation (OO) make my data analysis more efficient and let me reuse more of my code? The data analysis can be broken up into get data (from db or csv or similar) transform data (filter, group/pivot, ...) display/plot (graph timeseries, create tables, etc.) I mostly use Python and its Pandas and Matplotlib packages for this besides some DB connectivity (SQL). Almost all of my code is a functional/procedural mix. While I have started to create a data object for a certain collection of time series, I wonder if there are OO design patterns/approaches for other parts of the process that might increase efficiency?

Read the article

Markup format or script for data files?

- by Aaron

The game I'm designing will be mainly written in a high level scripting language (leaning towards either Lua or Squirrel) with a C++ core. In addition to scripts I'm also going to need different data files. Many data files will be for static information such as graphical assets and monster types. I'd also want to create and update data files at runtime for user information like option settings and game saves. Can I get away with using plain script files (i.e. .lua or .nut files) for my data files, or is it better to use dedicated markup formats like XML or YAML? If I use script files, loaded separately from my true scripts, then I wouldn't need an extra library to read those files. Scripting languages like Lua also have table syntax that lend themselves towards data definition. On the other hand I'd have to write my own schema check code. These languages also don't seem to support serialization "out of the box" like the markup format libraries do.

Read the article

What''s easy extensible technique to store game data?

- by Miro

I'm looking for library/technique for storing my game resources - levels, object (effects,world info), items(price,effects,...), NPC(visual info, behavior), everything except graphics/audio stuff. I've seen lua used for Awesome WM configuration. protobuf looks good, but it seems to be designed for network communication. I've tried to write my own parser, but as the project grows it's more and more harder to manage it and catch all the bugs. My requiremets: stability easy extension of data without need to convert older versions to newer good(don't have to be the best) performance of loading not much coding not XML!

Read the article

Winner of the 2012 Government Big Data Solutions Award

- by Jean-Pierre Dijcks

Hot off the press: The winner of the 2012 Government Big Data Solutions Aware is the National Cancer Institute!! Read all the details on CTOLabs.com. A short excerpt to wet your appetite: "... This solution, based on the Oracle Big Data Appliance with the Cloudera Distribution of Apache Hadoop (CDH), leverages capabilities available from the Big Data community today in pioneering ways that can serve a broad range of researchers. The promising approach of this solution is repeatable across many other Big Data challenges for bioinfomatics, making this approach worthy of its selection as the 2012 Government Big Data Solution Award." Read the entire post. Congrats to the entire team!!

Read the article

SQL Server and the XML Data Type : Data Manipulation

The introduction of the xml data type, with its own set of methods for processing xml data, made it possible for SQL Server developers to create columns and variables of the type xml. Deanna Dicken examines the modify() method, which provides for data manipulation of the XML data stored in the xml data type via XML DML statements. Too many SQL Servers to keep up with?Download a free trial of SQL Response to monitor your SQL Servers in just one intuitive interface."The monitoringin SQL Response is excellent." Mike Towery.

Read the article

Getting data from a webpage in a stable and efficient way

- by Mike Heremans

Recently I've learned that using a regex to parse the HTML of a website to get the data you need isn't the best course of action. So my question is simple: What then, is the best / most efficient and a generally stable way to get this data? I should note that: There are no API's There is no other source where I can get the data from (no databases, feeds and such) There is no access to the source files. (Data from public websites) Let's say the data is normal text, displayed in a table in a html page I'm currently using python for my project but a language independent solution/tips would be nice. As a side question: How would you go about it when the webpage is constructed by Ajax calls?

Read the article

What data structure to use / data persistence

- by Dave

I have an app where I need one table of information with the following fields: field 1 - int or char field 2 - string (max 10 char) field 3 - string (max 20 char) field 4 - float I need the program to filter on field 1 based upon a segmented control and select a field 2 from a picker. From this data I need to look up field 4 to use in a calculation. Total records will be about 200. I never see it go above 400 - 500. I am going to use a singleton which I am able to do, I just need help with the structure for this with data persistence. What type of data structure should I use for this and should I use NSNumber, NSString, etc. or old data types like float, Char, etc. I thought about a struct put into an array but there is probably a better way. This is new to me so any help or reference to examples would be great. I also thought about a plist or dictionary but it looks like it is just a lookup and a field which obviously won't work. Core data looked like overkill to me. Also, with any recommendation how should I get initial data into it? I want the user to be able to edit and add to the database. Sorry for the old terms, you can see what generation I am from... Thanks in advance!!!!

Read the article

Node.js Adventure - Storage Services and Service Runtime

- by Shaun

When I described on how to host a Node.js application on Windows Azure, one of questions might be raised about how to consume the vary Windows Azure services, such as the storage, service bus, access control, etc.. Interact with windows azure services is available in Node.js through the Windows Azure Node.js SDK, which is a module available in NPM. In this post I would like to describe on how to use Windows Azure Storage (a.k.a. WAS) as well as the service runtime. Consume Windows Azure Storage Let’s firstly have a look on how to consume WAS through Node.js. As we know in the previous post we can host Node.js application on Windows Azure Web Site (a.k.a. WAWS) as well as Windows Azure Cloud Service (a.k.a. WACS). In theory, WAWS is also built on top of WACS worker roles with some more features. Hence in this post I will only demonstrate for hosting in WACS worker role. The Node.js code can be used when consuming WAS when hosted on WAWS. But since there’s no roles in WAWS, the code for consuming service runtime mentioned in the next section cannot be used for WAWS node application. We can use the solution that I created in my last post. Alternatively we can create a new windows azure project in Visual Studio with a worker role, add the “node.exe” and “index.js” and install “express” and “node-sqlserver” modules, make all files as “Copy always”. In order to use windows azure services we need to have Windows Azure Node.js SDK, as knows as a module named “azure” which can be installed through NPM. Once we downloaded and installed, we need to include them in our worker role project and make them as “Copy always”. You can use my “Copy all always” tool mentioned in my last post to update the currently worker role project file. You can also find the source code of this tool here. The source code of Windows Azure SDK for Node.js can be found in its GitHub page. It contains two parts. One is a CLI tool which provides a cross platform command line package for Mac and Linux to manage WAWS and Windows Azure Virtual Machines (a.k.a. WAVM). The other is a library for managing and consuming vary windows azure services includes tables, blobs, queues, service bus and the service runtime. I will not cover all of them but will only demonstrate on how to use tables and service runtime information in this post. You can find the full document of this SDK here. Back to Visual Studio and open the “index.js”, let’s continue our application from the last post, which was working against Windows Azure SQL Database (a.k.a. WASD). The code should looks like this. 1: var express = require("express"); 2: var sql = require("node-sqlserver"); 3: 4: var connectionString = "Driver={SQL Server Native Client 10.0};Server=tcp:ac6271ya9e.database.windows.net,1433;Database=synctile;Uid=shaunxu@ac6271ya9e;Pwd={PASSWORD};Encrypt=yes;Connection Timeout=30;"; 5: var port = 80; 6: 7: var app = express(); 8: 9: app.configure(function () { 10: app.use(express.bodyParser()); 11: }); 12: 13: app.get("/", function (req, res) { 14: sql.open(connectionString, function (err, conn) { 15: if (err) { 16: console.log(err); 17: res.send(500, "Cannot open connection."); 18: } 19: else { 20: conn.queryRaw("SELECT * FROM [Resource]", function (err, results) { 21: if (err) { 22: console.log(err); 23: res.send(500, "Cannot retrieve records."); 24: } 25: else { 26: res.json(results); 27: } 28: }); 29: } 30: }); 31: }); 32: 33: app.get("/text/:key/:culture", function (req, res) { 34: sql.open(connectionString, function (err, conn) { 35: if (err) { 36: console.log(err); 37: res.send(500, "Cannot open connection."); 38: } 39: else { 40: var key = req.params.key; 41: var culture = req.params.culture; 42: var command = "SELECT * FROM [Resource] WHERE [Key] = '" + key + "' AND [Culture] = '" + culture + "'"; 43: conn.queryRaw(command, function (err, results) { 44: if (err) { 45: console.log(err); 46: res.send(500, "Cannot retrieve records."); 47: } 48: else { 49: res.json(results); 50: } 51: }); 52: } 53: }); 54: }); 55: 56: app.get("/sproc/:key/:culture", function (req, res) { 57: sql.open(connectionString, function (err, conn) { 58: if (err) { 59: console.log(err); 60: res.send(500, "Cannot open connection."); 61: } 62: else { 63: var key = req.params.key; 64: var culture = req.params.culture; 65: var command = "EXEC GetItem '" + key + "', '" + culture + "'"; 66: conn.queryRaw(command, function (err, results) { 67: if (err) { 68: console.log(err); 69: res.send(500, "Cannot retrieve records."); 70: } 71: else { 72: res.json(results); 73: } 74: }); 75: } 76: }); 77: }); 78: 79: app.post("/new", function (req, res) { 80: var key = req.body.key; 81: var culture = req.body.culture; 82: var val = req.body.val; 83: 84: sql.open(connectionString, function (err, conn) { 85: if (err) { 86: console.log(err); 87: res.send(500, "Cannot open connection."); 88: } 89: else { 90: var command = "INSERT INTO [Resource] VALUES ('" + key + "', '" + culture + "', N'" + val + "')"; 91: conn.queryRaw(command, function (err, results) { 92: if (err) { 93: console.log(err); 94: res.send(500, "Cannot retrieve records."); 95: } 96: else { 97: res.send(200, "Inserted Successful"); 98: } 99: }); 100: } 101: }); 102: }); 103: 104: app.listen(port); Now let’s create a new function, copy the records from WASD to table service. 1. Delete the table named “resource”. 2. Create a new table named “resource”. These 2 steps ensures that we have an empty table. 3. Load all records from the “resource” table in WASD. 4. For each records loaded from WASD, insert them into the table one by one. 5. Prompt to user when finished. In order to use table service we need the storage account and key, which can be found from the developer portal. Just select the storage account and click the Manage Keys button. Then create two local variants in our Node.js application for the storage account name and key. Since we need to use WAS we need to import the azure module. Also I created another variant stored the table name. In order to work with table service I need to create the storage client for table service. This is very similar as the Windows Azure SDK for .NET. As the code below I created a new variant named “client” and use “createTableService”, specified my storage account name and key. 1: var azure = require("azure"); 2: var storageAccountName = "synctile"; 3: var storageAccountKey = "/cOy9L7xysXOgPYU9FjDvjrRAhaMX/5tnOpcjqloPNDJYucbgTy7MOrAW7CbUg6PjaDdmyl+6pkwUnKETsPVNw=="; 4: var tableName = "resource"; 5: var client = azure.createTableService(storageAccountName, storageAccountKey); Now create a new function for URL “/was/init” so that we can trigger it through browser. Then in this function we will firstly load all records from WASD. 1: app.get("/was/init", function (req, res) { 2: // load all records from windows azure sql database 3: sql.open(connectionString, function (err, conn) { 4: if (err) { 5: console.log(err); 6: res.send(500, "Cannot open connection."); 7: } 8: else { 9: conn.queryRaw("SELECT * FROM [Resource]", function (err, results) { 10: if (err) { 11: console.log(err); 12: res.send(500, "Cannot retrieve records."); 13: } 14: else { 15: if (results.rows.length > 0) { 16: // begin to transform the records into table service 17: } 18: } 19: }); 20: } 21: }); 22: }); When we succeed loaded all records we can start to transform them into table service. First I need to recreate the table in table service. This can be done by deleting and creating the table through table client I had just created previously. 1: app.get("/was/init", function (req, res) { 2: // load all records from windows azure sql database 3: sql.open(connectionString, function (err, conn) { 4: if (err) { 5: console.log(err); 6: res.send(500, "Cannot open connection."); 7: } 8: else { 9: conn.queryRaw("SELECT * FROM [Resource]", function (err, results) { 10: if (err) { 11: console.log(err); 12: res.send(500, "Cannot retrieve records."); 13: } 14: else { 15: if (results.rows.length > 0) { 16: // begin to transform the records into table service 17: // recreate the table named 'resource' 18: client.deleteTable(tableName, function (error) { 19: client.createTableIfNotExists(tableName, function (error) { 20: if (error) { 21: error["target"] = "createTableIfNotExists"; 22: res.send(500, error); 23: } 24: else { 25: // transform the records 26: } 27: }); 28: }); 29: } 30: } 31: }); 32: } 33: }); 34: }); As you can see, the azure SDK provide its methods in callback pattern. In fact, almost all modules in Node.js use the callback pattern. For example, when I deleted a table I invoked “deleteTable” method, provided the name of the table and a callback function which will be performed when the table had been deleted or failed. Underlying, the azure module will perform the table deletion operation in POSIX async threads pool asynchronously. And once it’s done the callback function will be performed. This is the reason we need to nest the table creation code inside the deletion function. If we perform the table creation code after the deletion code then they will be invoked in parallel. Next, for each records in WASD I created an entity and then insert into the table service. Finally I send the response to the browser. Can you find a bug in the code below? I will describe it later in this post. 1: app.get("/was/init", function (req, res) { 2: // load all records from windows azure sql database 3: sql.open(connectionString, function (err, conn) { 4: if (err) { 5: console.log(err); 6: res.send(500, "Cannot open connection."); 7: } 8: else { 9: conn.queryRaw("SELECT * FROM [Resource]", function (err, results) { 10: if (err) { 11: console.log(err); 12: res.send(500, "Cannot retrieve records."); 13: } 14: else { 15: if (results.rows.length > 0) { 16: // begin to transform the records into table service 17: // recreate the table named 'resource' 18: client.deleteTable(tableName, function (error) { 19: client.createTableIfNotExists(tableName, function (error) { 20: if (error) { 21: error["target"] = "createTableIfNotExists"; 22: res.send(500, error); 23: } 24: else { 25: // transform the records 26: for (var i = 0; i < results.rows.length; i++) { 27: var entity = { 28: "PartitionKey": results.rows[i][1], 29: "RowKey": results.rows[i][0], 30: "Value": results.rows[i][2] 31: }; 32: client.insertEntity(tableName, entity, function (error) { 33: if (error) { 34: error["target"] = "insertEntity"; 35: res.send(500, error); 36: } 37: else { 38: console.log("entity inserted"); 39: } 40: }); 41: } 42: // send the 43: console.log("all done"); 44: res.send(200, "All done!"); 45: } 46: }); 47: }); 48: } 49: } 50: }); 51: } 52: }); 53: }); Now we can publish it to the cloud and have a try. But normally we’d better test it at the local emulator first. In Node.js SDK there are three build-in properties which provides the account name, key and host address for local storage emulator. We can use them to initialize our table service client. We also need to change the SQL connection string to let it use my local database. The code will be changed as below. 1: // windows azure sql database 2: //var connectionString = "Driver={SQL Server Native Client 10.0};Server=tcp:ac6271ya9e.database.windows.net,1433;Database=synctile;Uid=shaunxu@ac6271ya9e;Pwd=eszqu94XZY;Encrypt=yes;Connection Timeout=30;"; 3: // sql server 4: var connectionString = "Driver={SQL Server Native Client 11.0};Server={.};Database={Caspar};Trusted_Connection={Yes};"; 5: 6: var azure = require("azure"); 7: var storageAccountName = "synctile"; 8: var storageAccountKey = "/cOy9L7xysXOgPYU9FjDvjrRAhaMX/5tnOpcjqloPNDJYucbgTy7MOrAW7CbUg6PjaDdmyl+6pkwUnKETsPVNw=="; 9: var tableName = "resource"; 10: // windows azure storage 11: //var client = azure.createTableService(storageAccountName, storageAccountKey); 12: // local storage emulator 13: var client = azure.createTableService(azure.ServiceClient.DEVSTORE_STORAGE_ACCOUNT, azure.ServiceClient.DEVSTORE_STORAGE_ACCESS_KEY, azure.ServiceClient.DEVSTORE_TABLE_HOST); Now let’s run the application and navigate to “localhost:12345/was/init” as I hosted it on port 12345. We can find it transformed the data from my local database to local table service. Everything looks fine. But there is a bug in my code. If we have a look on the Node.js command window we will find that it sent response before all records had been inserted, which is not what I expected. The reason is that, as I mentioned before, Node.js perform all IO operations in non-blocking model. When we inserted the records we executed the table service insert method in parallel, and the operation of sending response was also executed in parallel, even though I wrote it at the end of my logic. The correct logic should be, when all entities had been copied to table service with no error, then I will send response to the browser, otherwise I should send error message to the browser. To do so I need to import another module named “async”, which helps us to coordinate our asynchronous code. Install the module and import it at the beginning of the code. Then we can use its “forEach” method for the asynchronous code of inserting table entities. The first argument of “forEach” is the array that will be performed. The second argument is the operation for each items in the array. And the third argument will be invoked then all items had been performed or any errors occurred. Here we can send our response to browser. 1: app.get("/was/init", function (req, res) { 2: // load all records from windows azure sql database 3: sql.open(connectionString, function (err, conn) { 4: if (err) { 5: console.log(err); 6: res.send(500, "Cannot open connection."); 7: } 8: else { 9: conn.queryRaw("SELECT * FROM [Resource]", function (err, results) { 10: if (err) { 11: console.log(err); 12: res.send(500, "Cannot retrieve records."); 13: } 14: else { 15: if (results.rows.length > 0) { 16: // begin to transform the records into table service 17: // recreate the table named 'resource' 18: client.deleteTable(tableName, function (error) { 19: client.createTableIfNotExists(tableName, function (error) { 20: if (error) { 21: error["target"] = "createTableIfNotExists"; 22: res.send(500, error); 23: } 24: else { 25: async.forEach(results.rows, 26: // transform the records 27: function (row, callback) { 28: var entity = { 29: "PartitionKey": row[1], 30: "RowKey": row[0], 31: "Value": row[2] 32: }; 33: client.insertEntity(tableName, entity, function (error) { 34: if (error) { 35: callback(error); 36: } 37: else { 38: console.log("entity inserted."); 39: callback(null); 40: } 41: }); 42: }, 43: // send reponse 44: function (error) { 45: if (error) { 46: error["target"] = "insertEntity"; 47: res.send(500, error); 48: } 49: else { 50: console.log("all done"); 51: res.send(200, "All done!"); 52: } 53: } 54: ); 55: } 56: }); 57: }); 58: } 59: } 60: }); 61: } 62: }); 63: }); Run it locally and now we can find the response was sent after all entities had been inserted. Query entities against table service is simple as well. Just use the “queryEntity” method from the table service client and providing the partition key and row key. We can also provide a complex query criteria as well, for example the code here. In the code below I queried an entity by the partition key and row key, and return the proper localization value in response. 1: app.get("/was/:key/:culture", function (req, res) { 2: var key = req.params.key; 3: var culture = req.params.culture; 4: client.queryEntity(tableName, culture, key, function (error, entity) { 5: if (error) { 6: res.send(500, error); 7: } 8: else { 9: res.json(entity); 10: } 11: }); 12: }); And then tested it on local emulator. Finally if we want to publish this application to the cloud we should change the database connection string and storage account. For more information about how to consume blob and queue service, as well as the service bus please refer to the MSDN page. Consume Service Runtime As I mentioned above, before we published our application to the cloud we need to change the connection string and account information in our code. But if you had played with WACS you should have known that the service runtime provides the ability to retrieve configuration settings, endpoints and local resource information at runtime. Which means we can have these values defined in CSCFG and CSDEF files and then the runtime should be able to retrieve the proper values. For example we can add some role settings though the property window of the role, specify the connection string and storage account for cloud and local. And the can also use the endpoint which defined in role environment to our Node.js application. In Node.js SDK we can get an object from “azure.RoleEnvironment”, which provides the functionalities to retrieve the configuration settings and endpoints, etc.. In the code below I defined the connection string variants and then use the SDK to retrieve and initialize the table client. 1: var connectionString = ""; 2: var storageAccountName = ""; 3: var storageAccountKey = ""; 4: var tableName = ""; 5: var client; 6: 7: azure.RoleEnvironment.getConfigurationSettings(function (error, settings) { 8: if (error) { 9: console.log("ERROR: getConfigurationSettings"); 10: console.log(JSON.stringify(error)); 11: } 12: else { 13: console.log(JSON.stringify(settings)); 14: connectionString = settings["SqlConnectionString"]; 15: storageAccountName = settings["StorageAccountName"]; 16: storageAccountKey = settings["StorageAccountKey"]; 17: tableName = settings["TableName"]; 18: 19: console.log("connectionString = %s", connectionString); 20: console.log("storageAccountName = %s", storageAccountName); 21: console.log("storageAccountKey = %s", storageAccountKey); 22: console.log("tableName = %s", tableName); 23: 24: client = azure.createTableService(storageAccountName, storageAccountKey); 25: } 26: }); In this way we don’t need to amend the code for the configurations between local and cloud environment since the service runtime will take care of it. At the end of the code we will listen the application on the port retrieved from SDK as well. 1: azure.RoleEnvironment.getCurrentRoleInstance(function (error, instance) { 2: if (error) { 3: console.log("ERROR: getCurrentRoleInstance"); 4: console.log(JSON.stringify(error)); 5: } 6: else { 7: console.log(JSON.stringify(instance)); 8: if (instance["endpoints"] && instance["endpoints"]["nodejs"]) { 9: var endpoint = instance["endpoints"]["nodejs"]; 10: app.listen(endpoint["port"]); 11: } 12: else { 13: app.listen(8080); 14: } 15: } 16: }); But if we tested the application right now we will find that it cannot retrieve any values from service runtime. This is because by default, the entry point of this role was defined to the worker role class. In windows azure environment the service runtime will open a named pipeline to the entry point instance, so that it can connect to the runtime and retrieve values. But in this case, since the entry point was worker role and the Node.js was opened inside the role, the named pipeline was established between our worker role class and service runtime, so our Node.js application cannot use it. To fix this problem we need to open the CSDEF file under the azure project, add a new element named Runtime. Then add an element named EntryPoint which specify the Node.js command line. So that the Node.js application will have the connection to service runtime, then it’s able to read the configurations. Start the Node.js at local emulator we can find it retrieved the connections, storage account for local. And if we publish our application to azure then it works with WASD and storage service through the configurations for cloud. Summary In this post I demonstrated how to use Windows Azure SDK for Node.js to interact with storage service, especially the table service. I also demonstrated on how to use WACS service runtime, how to retrieve the configuration settings and the endpoint information. And in order to make the service runtime available to my Node.js application I need to create an entry point element in CSDEF file and set “node.exe” as the entry point. I used five posts to introduce and demonstrate on how to run a Node.js application on Windows platform, how to use Windows Azure Web Site and Windows Azure Cloud Service worker role to host our Node.js application. I also described how to work with other services provided by Windows Azure platform through Windows Azure SDK for Node.js. Node.js is a very new and young network application platform. But since it’s very simple and easy to learn and deploy, as well as, it utilizes single thread non-blocking IO model, Node.js became more and more popular on web application and web service development especially for those IO sensitive projects. And as Node.js is very good at scaling-out, it’s more useful on cloud computing platform. Use Node.js on Windows platform is new, too. The modules for SQL database and Windows Azure SDK are still under development and enhancement. It doesn’t support SQL parameter in “node-sqlserver”. It does support using storage connection string to create the storage client in “azure”. But Microsoft is working on make them easier to use, working on add more features and functionalities. PS, you can download the source code here. You can download the source code of my “Copy all always” tool here. Hope this helps, Shaun All documents and related graphics, codes are provided "AS IS" without warranty of any kind. Copyright © Shaun Ziyan Xu. This work is licensed under the Creative Commons License.

Read the article

Using a "white list" for extracting terms for Text Mining

- by [email protected]

In Part 1 of my post on "Generating cluster names from a document clustering model" (part 1, part 2, part 3), I showed how to build a clustering model from text documents using Oracle Data Miner, which automates preparing data for text mining. In this process we specified a custom stoplist and lexer and relied on Oracle Text to identify important terms. However, there is an alternative approach, the white list, which uses a thesaurus object with the Oracle Text CTXRULE index to allow you to specify the important terms. INTRODUCTIONA stoplist is used to exclude, i.e., black list, specific words in your documents from being indexed. For example, words like a, if, and, or, and but normally add no value when text mining. Other words can also be excluded if they do not help to differentiate documents, e.g., the word Oracle is ubiquitous in the Oracle product literature. One problem with stoplists is determining which words to specify. This usually requires inspecting the terms that are extracted, manually identifying which ones you don't want, and then re-indexing the documents to determine if you missed any. Since a corpus of documents could contain thousands of words, this could be a tedious exercise. Moreover, since every word is considered as an individual token, a term excluded in one context may be needed to help identify a term in another context. For example, in our Oracle product literature example, the words "Oracle Data Mining" taken individually are not particular helpful. The term "Oracle" may be found in nearly all documents, as with the term "Data." The term "Mining" is more unique, but could also refer to the Mining industry. If we exclude "Oracle" and "Data" by specifying them in the stoplist, we lose valuable information. But it we include them, they may introduce too much noise. Still, when you have a broad vocabulary or don't have a list of specific terms of interest, you rely on the text engine to identify important terms, often by computing the term frequency - inverse document frequency metric. (This is effectively a weight associated with each term indicating its relative importance in a document within a collection of documents. We'll revisit this later.) The results using this technique is often quite valuable. As noted above, an alternative to the subtractive nature of the stoplist is to specify a white list, or a list of terms--perhaps multi-word--that we want to extract and use for data mining. The obvious downside to this approach is the need to specify the set of terms of interest. However, this may not be as daunting a task as it seems. For example, in a given domain (Oracle product literature), there is often a recognized glossary, or a list of keywords and phrases (Oracle product names, industry names, product categories, etc.). Being able to identify multi-word terms, e.g., "Oracle Data Mining" or "Customer Relationship Management" as a single token can greatly increase the quality of the data mining results. The remainder of this post and subsequent posts will focus on how to produce a dataset that contains white list terms, suitable for mining. CREATING A WHITE LIST We'll leverage the thesaurus capability of Oracle Text. Using a thesaurus, we create a set of rules that are in effect our mapping from single and multi-word terms to the tokens used to represent those terms. For example, "Oracle Data Mining" becomes "ORACLEDATAMINING." First, we'll create and populate a mapping table called my_term_token_map. All text has been converted to upper case and values in the TERM column are intended to be mapped to the token in the TOKEN column. TERM TOKEN DATA MINING DATAMINING ORACLE DATA MINING ORACLEDATAMINING 11G ORACLE11G JAVA JAVA CRM CRM CUSTOMER RELATIONSHIP MANAGEMENT CRM ... Next, we'll create a thesaurus object my_thesaurus and a rules table my_thesaurus_rules: CTX_THES.CREATE_THESAURUS('my_thesaurus', FALSE); CREATE TABLE my_thesaurus_rules (main_term VARCHAR2(100), query_string VARCHAR2(400)); We next populate the thesaurus object and rules table using the term token map. A cursor is defined over my_term_token_map. As we iterate over the rows, we insert a synonym relationship 'SYN' into the thesaurus. We also insert into the table my_thesaurus_rules the main term, and the corresponding query string, which specifies synonyms for the token in the thesaurus. DECLARE cursor c2 is select token, term from my_term_token_map; BEGIN for r_c2 in c2 loop CTX_THES.CREATE_RELATION('my_thesaurus',r_c2.token,'SYN',r_c2.term); EXECUTE IMMEDIATE 'insert into my_thesaurus_rules values (:1,''SYN(' || r_c2.token || ', my_thesaurus)'')' using r_c2.token; end loop; END; We are effectively inserting the token to return and the corresponding query that will look up synonyms in our thesaurus into the my_thesaurus_rules table, for example: 'ORACLEDATAMINING' SYN ('ORACLEDATAMINING', my_thesaurus)At this point, we create a CTXRULE index on the my_thesaurus_rules table: create index my_thesaurus_rules_idx on my_thesaurus_rules(query_string) indextype is ctxsys.ctxrule; In my next post, this index will be used to extract the tokens that match each of the rules specified. We'll then compute the tf-idf weights for each of the terms and create a nested table suitable for mining.

Read the article

ADO.NET (WCF) Data Services Query Interceptor Hangs IIS

- by PreMagination

I have an ADO.NET Data Service that's supposed to provide read-only access to a somewhat complex database. Logically I have table-per-type (TPT) inheritance in my data model but the EDM doesn't implement inheritance. (Limitation of EF and navigation properties on derived types. STILL not fixed in EF4!) I can query my EDM directly (using a separate project) using a copy of the query I'm trying to run against the web service, results are returned within 10 seconds. Disabling the query interceptors I'm able to make the same query against the web service, results are returned similarly quickly. I can enable some of the query interceptors and the results are returned slowly, up to a minute or so later. Alternatively, I can enable all the query interceptors, expand less of the properties on the main object I'm querying, and results are returned in a similar period of time. (I've increased some of the timeout periods) Up til this point Sql Profiler indicates the slow-down is the database. (That's a post for a different day) But when I enable all my query interceptors and expand all the properties I'd like to have the IIS worker process pegs the CPU for 20 minutes and a query is never even made against the database. This implies to me that yes, my implementation probably sucks but regardless the Data Services "tier" is having an issue it shouldn't. WCF tracing didn't reveal anything interesting to my untrained eye. Details: Data model: Agent-Person-Student Student has a collection of referrals Students and referrals are private, queries against the web service should only return "your" students and referrals. This means Person and Agent need to be filtered too. Other entities (Agent-Organization-School) can be accessed by anyone who has authenticated. The existing security model is poorly suited to perform this type of filtering for this type of data access, the query interceptors are complicated and cause EF to generate some entertaining sql queries. Sample Interceptor [QueryInterceptor("Agents")] public Expression<Func<Agent, Boolean>> OnQueryAgents() { //Agent is a Person(1), Educator(2), Student(3), or Other Person(13); allow if scope permissions exist return ag => (ag.AgentType.AgentTypeId == 1 || ag.AgentType.AgentTypeId == 2 || ag.AgentType.AgentTypeId == 3 || ag.AgentType.AgentTypeId == 13) && ag.Person.OrganizationPersons.Count<OrganizationPerson>(op => op.Organization.ScopePermissions.Any<ScopePermission> (p => p.ApplicationRoleAccount.Account.UserName == HttpContext.Current.User.Identity.Name && p.ApplicationRoleAccount.Application.ApplicationId == 124) || op.Organization.HierarchyDescendents.Any<OrganizationsHierarchy>(oh => oh.AncestorOrganization.ScopePermissions.Any<ScopePermission> (p => p.ApplicationRoleAccount.Account.UserName == HttpContext.Current.User.Identity.Name && p.ApplicationRoleAccount.Application.ApplicationId == 124))) > 0; } The query interceptors for Person, Student, Referral are all very similar, ie they traverse multiple same/similar tables to look for ScopePermissions as above. Sample Query var referrals = (from r in service.Referrals .Expand("Organization/ParentOrganization") .Expand("Educator/Person/Agent") .Expand("Student/Person/Agent") .Expand("Student") .Expand("Grade") .Expand("ProblemBehavior") .Expand("Location") .Expand("Motivation") .Expand("AdminDecision") .Expand("OthersInvolved") where r.DateCreated >= coupledays && r.DateDeleted == null select r); Any suggestions or tips would be greatly associated, for fixing my current implementation or in developing a new one, with the caveat that the database can't be changed and that ultimately I need to expose a large portion of the database via a web service that limits data access to the data authorized for, for the purpose of data integration with multiple outside parties. THANK YOU!!!

Read the article

make-like build tools for data?

- by miku

Make is a standard tools for building software. But make decides whether a target needs to be regenerated by comparing file modification times. Are there any proven, preferably small tools that handle builds not for software but for data? Something that regenerates targets not only on mod times but on certain other properties (e.g. completeness). (Or alternatively some paper that describes such a tool.) As illustration: I'd like to automate the following process: get data (e.g. a tarball) from some regularly updated source copy somewhere if it's not there (based e.g. on some filename-scheme) convert the files to different format (but only if there aren't successfully converted ones there - e.g. from a previous attempt - custom comparison routine) for each file find a certain data element and fetch some additional file from say an URL, but only if that hasn't been downloaded yet (decide on existence of file and file "freshness") finally compute something (e.g. word count for something identifiable and store it in the database, but only if the DB does not have an entry for that exact ID yet) Observations: there are different stages each stage is usually simple to compute or implement in isolation each stage may be simple, but the data volume may be large each stage may produce a few errors each stage may have different signals, on when (re)processing is needed Requirements: builds should be interruptable and idempotent (== robust) when interrupted, already processed objects should be reused to speedup the next run data paths should be easy to adjust (simple syntax, nothing new to learn, internal dsl would be ok) some form of dependency graph, that describes the process would be nice for later visualizations should leverage existing programs, if possible I've done some research on make alternatives like rake and have worked a lot with ant and maven in the past. All these tools naturally focus on code and software build, not on data builds. A system we have in place now for a task similar to the above is pretty much just shell scripts, which are compact (and are a ok glue for a variety of other programs written in other languages), so I wonder if worse is better?

Read the article

SQL SERVER – Installing Data Quality Services (DQS) on SQL Server 2012

- by pinaldave

Data Quality Services is very interesting enhancements in SQL Server 2012. My friend and SQL Server Expert Govind Kanshi have written an excellent article on this subject earlier on his blog. Yesterday I stumbled upon his blog one more time and decided to experiment myself with DQS. I have basic understanding of DQS and MDS so I knew I need to start with DQS Client. However, when I tried to find DQS Client I was not able to find it under SQL Server 2012 installation. I quickly realized that I needed to separately install the DQS client. You will find the DQS installer under SQL Server 2012 >> Data Quality Services directory. The pre-requisite of DQS is Master Data Services (MDS) and IIS. If you have not installed IIS, you can follow the simple steps and install IIS in your machine. Once the pre-requisites are installed, click on MDS installer once again and it will install DQS just fine. Be patient with the installer as it can take a bit longer time if your machine is low on configurations. Once the installation is over you will be able to expand SQL Server 2012 >> Data Quality Services directory and you will notice that it will have a new item called Data Quality Client. Click on it and it will open the client. Well, in future blog post we will go over more details about DQS and detailed practical examples. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Utility, T SQL, Technology Tagged: Data Quality Services

Read the article

Oracle Insurance Gets Innovative with Insurance Business Intelligence

- by nicole.bruns(at)oracle.com

Oracle Insurance announced yesterday the availability of Oracle Insurance Insight 7.0, an insurance-specific data warehouse and business intelligence (BI) system that transforms the traditional approach to BI by involving business users in the creation and maintenance."Rapid access to business intelligence is essential to compete and thrive in today's insurance industry," said Srini Venkatasantham, vice president, Product Strategy, Oracle Insurance. "The adaptive data modeling approach of Oracle Insurance Insight 7.0, combined with the insurance-specific data model, offers global insurance companies a faster, easier way to get the intelligence they need to make better-informed business decisions." New Features in Oracle Insurance 7.0 include:"Adaptive Data Modeling" via the new warehouse palette: Gives business users the power to configure lines of business via an easy-to-use warehouse palette tool. Oracle Insurance Insight then automatically creates data warehouse elements - such as line-specific database structures and extract-transform-load (ETL) processes -speeding up time-to-value for BI initiatives. Out-of-the-box insurance models or create-from-scratch option: Includes pre-built content and interfaces for six Property and Casualty (P&C) lines. Additionally, insurers can use the warehouse palette to deploy any and all P&C or General Insurance lines of business from scratch, helping insurers support operations in any country.Leverages Oracle technologies: In addition to Oracle Business Intelligence Enterprise Edition, the solution includes Oracle Database 11g as well as Oracle Data Integrator Enterprise Edition 11g, which delivers Extract, Load and Transform (E-L-T) architecture and eliminates the need for a separate transformation server. Additionally, the expanded Oracle technology infrastructure enables support for Oracle Exadata. Martina Conlon, a Principal with Novarica's Insurance practice, and author of Business Intelligence in Insurance: Current State, Challenges, and Expectations says, "The need for continued investment by insurers in business intelligence capabilities is widely understood, and the industry is acting. Arming the business intelligence implementation with predefined insurance specific content, and flexible and configurable technology will get these projects up and running faster."Learn moreTo see a demo of the Oracle Insurance Insight system, click hereTo read the press announcement, click here

Read the article

Social Analytics in your current data

- by Dan McGrath

By now everyone is aware of the massive boom in social-networking (Twitter, Facebook, LinkedIn) and obviously a big part of its business model revolves around being able to mine this data to create information that can be used to make money for someone. Gartner has identified 'Social Analytics' as one of the top 10 strategic technologies for 2011. Has anyone looked at their existing data structures to determine if they could extract a social graph and then perform further data mining against this? How does it fit in with your other strategic development strategies? What information are you trying to extract from the data? Take for example, a bank. They could conceivably determine a social graph through account relationships and transactions. Obviously there would be open edges on the graph where funds enter/leave the institute, but that shouldn't detract from the usefulness of the data. I'm looking for actual examples with the answers, as well as why/how they did it. References to other sites will be greatly appreciated. Note: I'm not at all referring to mining data out of actual social networks.

Read the article

USB drives not recognized all of a sudden

- by Siddharth

I have tried most of the advice on askubuntu and other sites, usb_storage enable to fdisk -l. But I am unable to find steps to get it working again. lsusb results Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub Bus 004 Device 002: ID 413c:3012 Dell Computer Corp. Optical Wheel Mouse Bus 005 Device 002: ID 413c:2105 Dell Computer Corp. Model L100 Keyboard Bus 001 Device 005: ID 8564:1000 dmseg | tail reports [ 69.567948] usb 1-4: USB disconnect, device number 4 [ 74.084041] usb 1-6: new high-speed USB device number 5 using ehci_hcd [ 74.240484] Initializing USB Mass Storage driver... [ 74.256033] scsi5 : usb-storage 1-6:1.0 [ 74.256145] usbcore: registered new interface driver usb-storage [ 74.256147] USB Mass Storage support registered. [ 74.257290] usbcore: deregistering interface driver usb-storage fdisk -l reports Device Boot Start End Blocks Id System /dev/sda1 * 2048 972656639 486327296 83 Linux /dev/sda2 972658686 976771071 2056193 5 Extended /dev/sda5 972658688 976771071 2056192 82 Linux swap / Solaris I think I need steps to install and get usb_storage module working. Edit : I tried sudo modprobe -v usb-storage reports sudo modprobe -v usb-storage insmod /lib/modules/3.2.0-48-generic-pae/kernel/drivers/usb/storage/usb-storage.ko Still no usb driver mounted. Nor does a device show up in /dev. Any step by step process to debug and fix this will be really helpful. Thanks.

Read the article

The Oldest Big Data Problem: Parsing Human Language

- by dan.mcclary

There's a new whitepaper up on Oracle Technology Network which details the use of Digital Reasoning Systems' Synthesys software on Oracle Big Data Appliance. Digital Reasoning's approach is inherently "big data friendly," as it leverages multiple components of the Hadoop ecosystem. Moreover, the paper addresses the oldest big data problem of them all: extracting knowledge from human text. You can find the paper here. From the Executive Summary: There is a wealth of information to be extracted from natural language, but that extraction is challenging. The volume of human language we generate constitutes a natural Big Data problem, while its complexity and nuance requires a particular expertise to model and mine. In this paper we illustrate the impressive combination of Oracle Big Data Appliance and Digital Reasoning Synthesys software. The combination of Synthesys and Big Data Appliance makes it possible to analyze tens of millions of documents in a matter of hours. Moreover, this powerful combination achieves four times greater throughput than conducting the equivalent analysis on a much larger cloud-deployed Hadoop cluster.

Read the article

Uses of persistent data structures in non-functional languages

- by Ray Toal

Languages that are purely functional or near-purely functional benefit from persistent data structures because they are immutable and fit well with the stateless style of functional programming. But from time to time we see libraries of persistent data structures for (state-based, OOP) languages like Java. A claim often heard in favor of persistent data structures is that because they are immutable, they are thread-safe. However, the reason that persistent data structures are thread-safe is that if one thread were to "add" an element to a persistent collection, the operation returns a new collection like the original but with the element added. Other threads therefore see the original collection. The two collections share a lot of internal state, of course -- that's why these persistent structures are efficient. But since different threads see different states of data, it would seem that persistent data structures are not in themselves sufficient to handle scenarios where one thread makes a change that is visible to other threads. For this, it seems we must use devices such as atoms, references, software transactional memory, or even classic locks and synchronization mechanisms. Why then, is the immutability of PDSs touted as something beneficial for "thread safety"? Are there any real examples where PDSs help in synchronization, or solving concurrency problems? Or are PDSs simply a way to provide a stateless interface to an object in support of a functional programming style?

Read the article

Translate report data export from RUEI into HTML for import into OpenOffice Calc Spreadsheets

- by [email protected]

A common question of users is, How to import the data from the automated data export of Real User Experience Insight (RUEI) into tools for archiving, dashboarding or combination with other sets of data.XML is well-suited for such a translation via the companion Extensible Stylesheet Language Transformations (XSLT). Basically XSLT utilizes XSL, a template on what to read from your input XML data file and where to place it into the target document. The target document can be anything you like, i.e. XHTML, CSV, or even a OpenOffice Spreadsheet, etc. as long as it is a plain text format.XML 2 OpenOffice.org SpreadsheetFor the XSLT to work as an OpenOffice.org Calc Import Filter:How to add an XML Import Filter to OpenOffice CalcStart OpenOffice.org Calc andselect Tools > XML Filter SettingsNew...Fill in the details as follows:Filter name: RUEI Import filterApplication: OpenOffice.org Calc (.ods)Name of file type: Oracle Real User Experience InsightFile extension: xmlSwitch to the transformation tab and enter/select the following leaving the rest untouchedXSLT for import: ruei_report_data_import_filter.xslPlease see at the end of this blog post for a download of the referenced file.Select RUEI Import filter from list and Test XSLTClick on Browse to selectTransform file: export.php.xmlOpenOffice.org Calc will transform and load the XML file you retrieved from RUEI in a human-readable format.You can now select File > Open... and change the filetype to open your RUEI exports directly in OpenOffice.org Calc, just like any other a native Spreadsheet format.Files of type: Oracle Real User Experience Insight (*.xml)File name: export.php.xml XML 2 XHTMLMost XML-powered browsers provides for inherent XSL Transformation capabilities, you only have to reference the XSLT Stylesheet in the head of your XML file. Then open the file in your favourite Web Browser, Firefox, Opera, Safari or Internet Explorer alike.<?xml version="1.0" encoding="ISO-8859-1"?> <?xml-stylesheet type="text/xsl" href="ruei_report_data_export_2_xhtml.xsl"?><report>You can find a patched example export from RUEI plus the above referenced XSL-Stylesheets here: export.php.xml - Example report data export from RUEI ruei_report_data_export_2_xhtml.xsl - RUEI to XHTML XSL Transformation Stylesheetruei_report_data_import_filter.xsl - OpenOffice.org XML import filter for RUEI report export data If you would like to do things like this on the command line you can use either Xalan or xsltproc.The basic command syntax for xsltproc is very simple:xsltproc -o output.file stylesheet.xslt inputfile.xmlYou can use this with the above two stylesheets to translate RUEI Data Exports into XHTML and/or OpenOffice.org Calc ODS-Format. Or you could write your own XSLT to transform into Comma separated Value lists.Please let me know what you think or do with this information in the comments below.Kind regards,Stefan ThiemeReferences used:OpenOffice XML Filter - Create XSLT filters for import and export - http://user.services.openoffice.org/en/forum/viewtopic.php?f=45&t=3490SUN OpenOffice.org XML File Format 1.0 - http://xml.openoffice.org/xml_specification.pdf

Read the article

Intel Rapid Storage Technology (pre-OS) driver installation

- by Nero theZero

My desktop machine is built on Gigabyte GA-Z87-UD3H and Gigabyte provides the latest driver for Intel Rapid Storage Technology (IRST), which I installed after installing the OS. Same goes for my Lenovo Thinkpad-T420. And for both machine, checking the controller device under the IDE ATA/ATAPI Controllers section in Device Manager I see the driver has been updated to the latest version. I set the SATA controller to AHCI from BIOS On the desktop machine I have one WD 2TB BLACK & one WD 3TB Green I don’t use RAID, & no chance of using in near future, but according to Intel IRST improves performance in single disk scenario too. Now I have the following questions – What is the actual purpose of IRST (pre-OS install) driver that doesn’t get served with a post-OS driver that I installed? There must be some difference, otherwise there wouldn’t be a pre-OS version of the driver. Right? In the pre-OS procedure (loading the drivers at OS-installation time) after successfully completing the OS installation, do I need that post-OS driver? Because after installing from that one I got a quick launch icon that runs the IRST configuration application. Where do get that after installing the pre-OS driver? As it is “pre-OS”, when I load it at OS-installation time, does it updates anything at BIOS level or anywhere other than HDD? That’s because I’m going to dual boot Windows 7 with Windows 8.1, and after installing Windows 7 when I install Windows 8.1 & load the IRST driver for that, is there any chance of any “overwriting” or OS-incompatibility? In short, is there anything specific to follow while installing the second OS?

Read the article

Backup data rate on Raspberry Pi maxing out at 5 Mb/s. Why?

- by bastibe

I set up my Raspberry Pi as a Time Machine, as documented here. At the moment, the Raspberry Pi is connected to my MacBook Pro using a direct Ethernet cable. Also, an external hard drive (laptop drive) is connected to the Raspberry Pi using the USB port. However, backups are pretty slow. Activity Monitor claims that the Network is transferring a very steady 5 Mb/s, where my Time Capsule is transferring up to 8 Mb/s with a lot of fluctuation. The Raspberry Pi self-reports (top) that its CPU is only half-used, with about equal parts afpd, usb-storage and jbd2/sda1-8. Thus, I think that the processing power of the Raspberry Pi does not seem to be the problem here. To me, this looks like there is some kind of bottleneck that maxes out at 5 Mb/s thus potentially having my backups run at less than their potential speed. To the best of my knowledge, this might be the afp-daemon, the usb-bus or the external hard drive. So, my question is, how could I identify the true culprit and what can I do about it?

Read the article

Is basing storage requirements based on IOPS sufficient?

- by Boden

The current system in question is running SBS 2003, and is going to be migrated on new hardware to SBS 2008. Currently I'm seeing on average 200-300 disk transfers per second total across all the arrays in the system. The array seeing the bulk of activity is a 6 disk 7200RPM RAID 6 and it struggles to keep up during high traffic times (idle time often only 10-20%; response times peaking 20-50+ ms). Based on some rough calculations this makes sense (avg ~245 IOPS on this array at 70/30 read to write ratio). I'm considering using a much simpler disk configuration using a single RAID 10 array of 10K disks. Using the same parameters for my calculations above, I'm getting 583 average random IOPS / sec. Granted SBS 2008 is not the same beast as 2003, but I'd like to make the assumption that it'll be similar in terms of disk performance, if not better (Exchange 2007 is easier on the disk and there's no ISA server). Am I correct in believing that the proposed system will be sufficient in terms of performance, or am I missing something? I've read so much about recommended disk configurations for various products like Exchange, and they often mention things like dedicating spindles to logs, etc. I understand the reasoning behind this, but if I've got more than enough random I/O overhead, does it really matter? I've always at the very least had separate spindles for the OS, but I could really reduce cost and complexity if I just had a single, good performing array. So as not to make you guys do my job for me, the generic version of this question is: if I have a projected IOPS figure for a new system, is it sufficient to use this value alone to spec the storage, ignoring "best practice" configurations? (given similar technology, not going from DAS to SAN or anything)

Read the article

Data Auditor by Example

- by Jinjin.Wang

OWB has a node Data Auditors under Oracle Module in Projects Navigator. What is data auditor and how to use it? I will give an introduction to data auditor and show its usage by examples. Data auditor is an important tool in ensuring that data quality levels meet business requirements. Data auditor validates data against a set of data rules to determine which records comply and which do not. It gathers statistical metrics on how well the data in a system complies with a rule by auditing and marking how many errors are occurring against the audited table. Data auditors are typically scheduled for regular execution as part of a process flow, to monitor the quality of the data in an operational environment such as a data warehouse or ERP system, either immediately after updates like data loads, or at regular intervals. How to use data auditor to monitor data quality? Only objects with data rules can be monitored, so the first step is to define data rules according to business requirements and apply them to the objects you want to monitor. The objects can be tables, views, materialized views, and external tables. Secondly create a data auditor containing the objects. You can configure the data auditor and set physical deployment parameters for it as optional, which will be used while running the data auditor. Then deploy and run the data auditor either manually or as part of the process flow. After execution, the data auditor sets several output values, and records that are identified as not complying with the defined data rules contained in the data auditor are written to error tables. Here is an example. We have two tables DEPARTMENTS and EMPLOYEES (see pic-1 and pic-2. Click here for DDL and data) imported into OWB. We want to gather statistical metrics on how well data in these two tables satisfies the following requirements: a. Values of the EMPLOYEES.EMPLOYEE_ID attribute are three-digit numbers. b. Valid values for EMPLOYEES.JOB_ID are IT_PROG, SA_REP, SH_CLERK, PU_CLERK, and ST_CLERK. c. EMPLOYEES.EMPLOYEE_ID is related to DEPARTMENTS.MANAGER_ID. Pic-1 EMPLOYEES Pic-2 DEPARTMENTS 1. To determine legal data within EMPLOYEES or legal relationships between data in different columns of the two tables, firstly we define data rules based on the three requirements and apply them to tables. a. The first requirement is about patterns that an attribute is allowed to conform to. We create a Domain Pattern List data rule EMPLOYEE_PATTERN_RULE here. The pattern is defined in the Oracle Database regular expression syntax as ^([0-9]{3})$ Apply data rule EMPLOYEE_PATTERN_RULE to table EMPLOYEES.

Search Results

Search found 61623 results on 2465 pages for 'data storage'.

Page 15/2465 | < Previous Page | 11 12 13 14 15 16 17 18 19 20 21 22 | Next Page >

- by Gortron

- by Ali Abbas

- by Konsta

- by Aaron

- by Miro

- by Jean-Pierre Dijcks

- by Mike Heremans

- by Dave

- by Shaun

- by [email protected]

- by PreMagination

- by miku

- by pinaldave

- by nicole.bruns(at)oracle.com

- by Dan McGrath

- by Siddharth

- by dan.mcclary

- by Ray Toal

- by [email protected]

- by Nero theZero

- by bastibe

- by Boden

- by Jinjin.Wang

< Previous Page | 11 12 13 14 15 16 17 18 19 20 21 22 | Next Page >