Search Results

Search found 58783 results on 2352 pages for 'data modeling'.

Page 19/2352 | < Previous Page | 15 16 17 18 19 20 21 22 23 24 25 26  | Next Page >

  • Isolating test data in acceptance tests

    - by Matt Phillips
    I'm looking for guidance on how to keep my acceptance tests isolated. Right now the issue I'm having with being able to run the tests in parallel is the database records that are manipulated in the tests. I've written helpers that take care of doing inserts and deletes before tests are executed, to make sure the state is correct. But now I can't run them in parallel against the same database without uniquely generating the test data fields for each test. For example. Testing creating a row i'll delete everything where column A = foo and column B = bar Then I'll navigate through the UI in the test and create a record with column A = foo and column B = bar. Testing that a duplicate row is not allowed to be created. I'll insert a row with column A = foo and column B = bar and then use the UI to try and do the exact same thing. This will display an error message in the UI as expected. These tests work perfectly when ran separately and serially. But I can't run them at the same time for fear that one will create or delete a record the other is expecting. Any tips on how to structure them better so they can be run in parallel?

    Read the article

  • Space-efficient data structures for broad-phase collision detection

    - by Marian Ivanov
    As far as I know, these are three types of data structures that can be used for collision detection broadphase: Unsorted arrays: Check every object againist every object - O(n^2) time; O(log n) space. It's so slow, it's useless if n isn't really small. for (i=1;i<objects;i++){ for(j=0;j<i;j++) narrowPhase(i,j); }; Sorted arrays: Sort the objects, so that you get O(n^(2-1/k)) for k dimensions O(n^1.5) for 2d and O(n^1.67) for 3d and O(n) space. Assuming the space is 2D and sortedArray is sorted so that if the object begins in sortedArray[i] and another object ends at sortedArray[i-1]; they don't collide Heaps of stacks: Divide the objects between a heap of stacks, so that you only have to check the bucket, its children and its parents - O(n log n) time, but O(n^2) space. This is probably the most frequently used approach. Is there a way of having O(n log n) time with less space? When is it more efficient to use sorted arrays over heaps and vice versa?

    Read the article

  • Data Structure for Small Number of Agents in a Relatively Big 2D World

    - by Seçkin Savasçi
    I'm working on a project where we will implement a kind of world simulation where there is a square 2D world. Agents live on this world and make decisions like moving or replicating themselves based on their neighbor cells(world=grid) and some extra parameters(which are not based on the state of the world). I'm looking for a data structure to implement such a project. My concerns are : I will implement this 3 times: sequential, using OpenMP, using MPI. So if I can use the same structure that will be quite good. The first thing comes up is keeping a 2D array for the world and storing agent references in it. And simulate the world for each time slice by checking every cell in each iteration and further processing if an agents is found in the cell. The downside is what if I have 1000x1000 world and only 5 agents in it. It will be an overkill for both sequential and parallel versions to check each cell and look for possible agents in them. I can use quadtree and store agents in it, but then how can I get the information about neighbor cells then? Please let me know if I should elaborate more.

    Read the article

  • Ubuntu Tools for recovering data from damaged USB Flash Drive ~ 10 Gb

    - by PREDA LUCIAN
    I have technical issues with my USB Flash Drive - JetFlash®V15 (TS16GJFV15) It's very critical situation because I can not see the data from it and I should get a way to recover them ASAP. So, in general, I have connected Non-stop that USB Flash Disk at my laptop. Was appear Power surges and when I was coming back, I saw that problem with it. Details regarding JetFlash®V15 (in present): - when I connect it on USP slot, the led is working intermittent and later on remain with constant light. - if I inspect the computer drivers, I found "Generic USB Flash Disk" (when the stick it's connected). - if I inspect "Properties", I can see next details: --- Type: unknown (application/octet-stream) --- Size: unknown --- Volume: unknown --- Accessed: unknown --- Modified: unknown I inspected that stick on 2 different computers (as well in different different USB Ports) and was the same problem, I can not see the content. I was checking with Windows 7 and Ubuntu 10.04 OS, but without success. With both OS was working before this issue. I'll appreciate an answer which will solve the problem, not an answer which will certify the problem. What I have to do, to recover the information form it (nearly 10 Gb)? I'm looking forward to be guided from a technical expert.

    Read the article

  • How much information can you mine out of a name?

    - by Finglas Fjorn
    While not directly related to programming, I figured that the programmers on here would be just as curious as I was about this question. Feel free to close the question if it does not meet with the guidelines. A name: first, possibly a middle, and surname. I'm curious about how much information you can mine out of a name, using publicly available datasets. I know that you can get the following with anywhere between a low-high probability (depending on the input) using US census data: 1) Gender. 2) Race. Facebook for instance, used exactly that to find out, with a decent level of accuracy, the racial distribution of users of their site (https://www.facebook.com/note.php?note_id=205925658858). What else can be mined? I'm not looking for anything specific, this is a very open-ended question to assuage my curiousity. My examples are US specific, so we'll assume that the name is the name of someone located in the US; but, if someone knows of publicly available datasets for other countries, I'm more than open to them too. I hope this is an interesting question!

    Read the article

  • Data structure for grid with negative indeces

    - by The Secret Imbecile
    Sorry if this is an insultingly obvious concept, but it's something I haven't done before and I've been unable to find any material discussing the best way to approach it. I'm wondering what's the best data structure for holding a 2D grid of unknown size. The grid has integer coordinates (x,y), and will have negative indices in both directions. So, what is the best way to hold this grid? I'm programming in c# currently, so I can't have negative array indices. My initial thought was to have class with 4 separate arrays for (+x,+y),(+x,-y),(-x,+y), and (-x,-y). This seems to be a valid way to implement the grid, but it does seem like I'm over-engineering the solution, and array resizing will be a headache. Another idea was to keep track of the center-point of the array and set that as the topological (0,0), however I would have the issue of having to do a shift to every element of the grid when repeatedly adding to the top-left of the grid, which would be similar to grid resizing though in all likelihood more frequent. Thoughts?

    Read the article

  • Data model for unlockable card collection

    - by Karlos Zafra
    My game has a collection of cards (about 64) that are unlocked (one by one) provided that 4 items are collected/gained in the game. Right now the deck of cards is modeled with a plist file, like this: <plist> <dict> <key>card1</key> <dict> <key>available</key> <true/> <key>series</key> <integer>1</integer> <key>model</key> <integer>1</integer> <key>color</key> <integer>1</integer> </dict> <key>card2</key> <dict> <key>available</key> <false/> <key>series</key> <integer>1</integer> <key>model</key> <integer>1</integer> <key>color</key> <integer>2</integer> </dict> The value for the key named "available" marks if the item is locked or not. Right now I have to include the data for the items collected by the user, but including that inside the plist looks like too much hassle. What kind of structure should be more suitable for this? How do you manage a collection of unlockable items like this?

    Read the article

  • Big Data – Interacting with Hadoop – What is PIG? – What is PIG Latin? – Day 16 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the HIVE in Big Data Story. In this article we will understand what is PIG and PIG Latin in Big Data Story. Yahoo started working on Pig for their application deployment on Hadoop. The goal of Yahoo to manage their unstructured data. What is Pig and What is Pig Latin? Pig is a high level platform for creating MapReduce programs used with Hadoop and the language we use for this platform is called PIG Latin. The pig was designed to make Hadoop more user-friendly and approachable by power-users and nondevelopers. PIG is an interactive execution environment supporting Pig Latin language. The language Pig Latin has supported loading and processing of input data with series of transforming to produce desired results. PIG has two different execution environments 1) Local Mode – In this case all the scripts run on a single machine. 2) Hadoop – In this case all the scripts run on Hadoop Cluster. Pig Latin vs SQL Pig essentially creates set of map and reduce jobs under the hoods. Due to same users does not have to now write, compile and build solution for Big Data. The pig is very similar to SQL in many ways. The Ping Latin language provide an abstraction layer over the data. It focuses on the data and not the structure under the hood. Pig Latin is a very powerful language and it can do various operations like loading and storing data, streaming data, filtering data as well various data operations related to strings. The major difference between SQL and Pig Latin is that PIG is procedural and SQL is declarative. In simpler words, Pig Latin is very similar to SQ Lexecution plan and that makes it much easier for programmers to build various processes. Whereas SQL handles trees naturally, Pig Latin follows directed acyclic graph (DAG). DAGs is used to model several different kinds of structures in mathematics and computer science. DAG Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Zookeeper. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Raid 5 mdadm Problem - Help Please

    - by user66260
    My Raid 5 array (4 1tb Disks WD10EARS) had was showing as degraded. I looked and one of the disks wasnt installed, so i re-added it with the mdadm add command. the array is now showing as (null)Array , but cant be mounted if i run: root@warren-P5K-E:/home/warren# sudo mdadm --misc --detail /dev/md0 I get: mdadm: cannot open /dev/md0: No such file or directory and running: root@warren-P5K-E:/home/warren# cat /proc/mdstat gives me: Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] unused devices: < none > The data is very important root@warren-P5K-E:/home/warren# mdadm --examine /dev/sda /dev/sda: Magic : a92b4efc Version : 0.90.00 UUID : 00000000:00000000:00000000:00000000 Creation Time : Sat May 26 12:08:14 2012 Raid Level : -unknown- Raid Devices : 0 Total Devices : 4 Preferred Minor : 0 Update Time : Sat May 26 12:08:40 2012 State : active Active Devices : 0 Working Devices : 4 Failed Devices : 0 Spare Devices : 4 Checksum : 82d5b792 - correct Events : 1 Number Major Minor RaidDevice State this 1 8 0 1 spare /dev/sda 0 0 8 16 0 spare /dev/sdb 1 1 8 0 1 spare /dev/sda 2 2 8 32 2 spare /dev/sdc 3 3 8 48 3 spare /dev/sdd root@warren-P5K-E:/home/warren# mdadm --examine /dev/sdb /dev/sdb: Magic : a92b4efc Version : 0.90.00 UUID : 00000000:00000000:00000000:00000000 Creation Time : Sat May 26 12:08:14 2012 Raid Level : -unknown- Raid Devices : 0 Total Devices : 4 Preferred Minor : 0 Update Time : Sat May 26 12:08:40 2012 State : active Active Devices : 0 Working Devices : 4 Failed Devices : 0 Spare Devices : 4 Checksum : 82d5b7a0 - correct Events : 1 Number Major Minor RaidDevice State this 0 8 16 0 spare /dev/sdb 0 0 8 16 0 spare /dev/sdb 1 1 8 0 1 spare /dev/sda 2 2 8 32 2 spare /dev/sdc 3 3 8 48 3 spare /dev/sdd root@warren-P5K-E:/home/warren# oot@warren-P5K-E:/home/warren# mdadm --examine /dev/sdc /dev/sdc: Magic : a92b4efc Version : 0.90.00 UUID : 00000000:00000000:00000000:00000000 Creation Time : Sat May 26 12:08:14 2012 Raid Level : -unknown- Raid Devices : 0 Total Devices : 4 Preferred Minor : 0 Update Time : Sat May 26 12:08:40 2012 State : active Active Devices : 0 Working Devices : 4 Failed Devices : 0 Spare Devices : 4 Checksum : 82d5b7b4 - correct Events : 1 Number Major Minor RaidDevice State this 2 8 32 2 spare /dev/sdc 0 0 8 16 0 spare /dev/sdb 1 1 8 0 1 spare /dev/sda 2 2 8 32 2 spare /dev/sdc 3 3 8 48 3 spare /dev/sdd root@warren-P5K-E:/home/warren# mdadm --examine /dev/sdd /dev/sdd: Magic : a92b4efc Version : 0.90.00 UUID : 00000000:00000000:00000000:00000000 Creation Time : Sat May 26 12:08:14 2012 Raid Level : -unknown- Raid Devices : 0 Total Devices : 4 Preferred Minor : 0 Update Time : Sat May 26 12:08:40 2012 State : active Active Devices : 0 Working Devices : 4 Failed Devices : 0 Spare Devices : 4 Checksum : 82d5b7c6 - correct Events : 1 Number Major Minor RaidDevice State this 3 8 48 3 spare /dev/sdd 0 0 8 16 0 spare /dev/sdb 1 1 8 0 1 spare /dev/sda 2 2 8 32 2 spare /dev/sdc 3 3 8 48 3 spare /dev/sdd That on the 4 drives.

    Read the article

  • Any simple approaches for managing customer data change requests for global reference files?

    - by Kelly Duke
    For the first time, I am developing in an environment in which there is a central repository for a number of different industry standard reference data tables and many different customers who need to select records from these industry standard reference data tables to fill in foreign key information for their customer specific records. Because these industry standard reference files are utilized by all customers, I want to reserve Create/Update/Delete access to these records for global product administrators. However, I would like to implement a (semi-)automated interface by which specific customers could request record additions, deletions or modifications to any of the industry standard reference files that are shared among all customers. I know I need something like a "data change request" table specifying: user id, user request datetime, request type (insert, modify, delete), a user entered text explanation of the change request, the user request's current status (pending, declined, completed), admin resolution datetime, admin id, an admin entered text description of the resolution, etc. What I can't figure out is how to elegantly handle the fact that these data change requests could apply to dozens of different tables with differing table column definitions. I would like to give the customer users making these data change requests a convenient way to enter their proposed record additions/modifications directly into CRUD screens that look very much like the reference table CRUD screens they don't have write/delete permissions for (with an additional text explanation and perhaps request priority field). I would also like to give the global admins a tool that allows them to view all the outstanding data change requests for the users they oversee sorted by date requested or user/date requested. Upon selecting a data change request record off the list, the admin would be directed to another CRUD screen that would be populated with the fields the customer users requested for the new/modified industry standard reference table record along with customer's text explanation, the request status and the text resolution explanation field. At this point the admin could accept/edit/reject the requested change and if accepted the affected industry standard reference file would be automatically updated with the appropriate fields and the data change request record's status, text resolution explanation and resolution datetime would all also be appropriately updated. However, I want to keep the actual production reference tables as simple as possible and free from these extraneous and typically null customer change request fields. I'd also like the data change request file to aggregate all data change requests across all the reference tables yet somehow "point to" the specific reference table and primary key in question for modification & deletion requests or the specific reference table and associated customer user entered field values in question for record creation requests. Does anybody have any ideas of how to design something like this effectively? Is there a cleaner, simpler way I am missing? Thank you so much for reading.

    Read the article

  • Specification, modeling and programming are principially the same, right?

    - by Gabriel Šcerbák
    In formal specifications based on abstract algebraic types and equational theory you use formulas of equational theory to specify theory. System which will satisfy those constraints is called in formal logic a model. Modeling is process of creating a model, which abstracts of some aspects, which are unnecessary details for a specific case. So concrete system has to adhere to created model in observed aspects. Programming is a process of creating a program which will have specific behaviour - will perform specific algorithms - and programming languages through different paradigms enable us to think in a certain specific way, which abstracts of some details, usually machine specific ones. So could we be doing all those things at the same time, because they are principially the same? Is declarative programming the nearest attempt to do that? Could we use some sort f programming languages which will be good for programming as well as for modeling and specification?

    Read the article

  • Deploying BAM Data Control Application to WLS server

    - by [email protected]
    var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www."); document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E")); try { var pageTracker = _gat._getTracker("UA-15829414-1"); pageTracker._trackPageview(); } catch(err) {} Typically we would test our ADF pages that use BAM Data control using integrated wls server (ADRS). If we have to deploy this same application to a standalone WLS we have to make sure we have the BAM server connection created in WLS.unless we do that we may face runtime errors.In Development mode of WLS(Reference) For development-mode WebLogic Server, you can set the mode to OVERWRITE to test user names and passwords. You can set the mode by running setDomainEnv.cmd or setDomainEnv.sh with the following option added to the command. Add the following to the JAVA_PROPERTIES entry in the <FMW_HOME>/user_projects/domains/<yourdomain>/bin/setDomainEnv.sh file: -Djps.app.credential.overwrite.allowed=true In Production mode of WLS Enable MDS Create and/or Register your MDS repository. For more details refer this Edit adf-config.xml from your application and add the following tag <adf-mds-config xmlns="http://xmlns.oracle.com/adf/mds/config">     <mds-config version="11.1.1.000">     <persistence-config>   <metadata-store-usages>     <metadata-store-usage default-cust-store="true" deploy-target="true" id="myRepos">     </metadata-store-usage>   </metadata-store-usages>   </persistence-config>           </mds-config>  </adf-mds-config>Deploy the application to WLS server after picking the appropriate repository during deployment from the MDS Repository dialog that pops up Enterprise Manager (Use these steps if using a version prior to 11gR1 PS1 release of JDeveloper) Go to EM (http://<host>:<port>/EMIn the left pane, deployments select Application1(your application)In the right pane, top dropdown select "System Mbean Browser->oracle.adf.share.connections->Server: AdminServer->Server: AdminServer->Application:<Appname>->ADFConnections"Right pane click "Operations->CreateConnection"Enter Connection Type as "BAMConnection"Enter the connection name same as the one defined in JdevClick "Invoke"Click "Return"Click on Operation->SaveNow in the ADFConnections in the navigator, select the connection just created and enter all the configuration details.Save and run the page. Enterprise Manager (Use these steps or the steps above if using 11gR1 PS1 or newer) Go to EM (http://<host>:<port>/EMIn the left pane, deployments select Application1(your application)In the right pane, click on "Application Deployment" to invoke to dropdown. In that select "ADF -> Configure ADF Connections"Select Connection Type as "BAM" from the drop downEnter Connection Type as to be the same as the one defined in JDevClick on "Create Connection". This should add a new row below under "BAM Connections"Select the new connection and click on the "Edit" icon. This should bring up a dialogSpecific appropriate values for all connection parameters - Username, password, BAM Server Host, BAM Server Port, Webtier Server Host, Webtier Server Port and BAM Webtier Protocol - and then click on OK to dismiss the dialogClick on "Apply"Run the page page.

    Read the article

  • Folders in SQL Server Data Tools

    - by jamiet
    Recently I have begun a new project in which I am using SQL Server Data Tools (SSDT) and SQL Server Integration Services (SSIS) 2012. Although I have been using SSDT & SSIS fairly extensively while SQL Server 2012 was in the beta phase I usually find that you don’t learn about the capabilities and quirks of new products until you use them on a real project, hence I am hoping I’m going to have a lot of experiences to share on my blog over the coming few weeks. In this first such blog post I want to talk about file and folder organisation in SSDT. The predecessor to SSDT is Visual Studio Database Projects. When one created a new Visual Studio Database Project a folder structure was provided with “Schema Objects” and “Scripts” in the root and a series of subfolders for each schema: Apparently a few customers were not too happy with the tool arbitrarily creating lots of folders in Solution Explorer and hence SSDT has gone in completely the opposite direction; now no folders are created and new objects will get created in the root – it is at your discretion where they get moved to: After using SSDT for a few weeks I can safely say that I preferred the older way because I never used Solution Explorer to navigate my schema objects anyway so it didn’t bother me how many folders it created. Having said that the thought of a single long list of files in Solution Explorer without any folders makes me shudder so on this project I have been manually creating folders in which to organise files and I have tried to mimic the old way as much as possible by creating two folders in the root, one for all schema objects and another for Pre/Post deployment scripts: This works fine until different developers start to build their own different subfolder structures; if you are OCD-inclined like me this is going to grate on you eventually and hence you are going to want to move stuff around so that you have consistent folder structures for each schema and (if you have multiple databases) each project. Moreover new files get created with a filename of the object name + “.sql” and often people like to have an extra identifier in the filename to indicate the object type: The overall point is this – files and folders in your solution are going to change. Some version control systems (VCSs) don’t take kindly to files being moved around or renamed because they recognise the renamed/moved file simply as a new file and when they do that you lose the revision history which, to my mind, is one of the key benefits of using a VCS in the first place. On this project we have been using Team Foundation Server (TFS) and while it pains me to say it (as I am no great fan of TFS’s version control system) it has proved invaluable when dealing with the SSDT problems that I outlined above because it is integrated right into the Visual Studio IDE. Thus the advice from this blog post is: If you are using SSDT consider using an Visual-Studio-integrated VCS that can easily handle file renames and file moves I suspect that fans of other VCSs will counter by saying that their VCS weapon of choice can handle renames/file moves quite satisfactorily and if that’s the case…great…let me know about them in the comments. This blog post is not an attempt to make people use one particular VCS, only to make people aware of this issue that might rise when using SSDT. More to come in the coming few weeks! @jamiet

    Read the article

  • Design Pattern for Complex Data Modeling

    - by Aaron Hayman
    I'm developing a program that has a SQL database as a backing store. As a very broad description, the program itself allows a user to generate records in any number of user-defined tables and make connections between them. As for specs: Any record generated must be able to be connected to any other record in any other user table (excluding itself...the record, not the table). These "connections" are directional, and the list of connections a record has is user ordered. Moreover, a record must "know" of connections made from it to others as well as connections made to it from others. The connections are kind of the point of this program, so there is a strong possibility that the number of connections made is very high, especially if the user is using the software as intended. A record's field can also include aggregate information from it's connections (like obtaining average, sum, etc) that must be updated on change from another record it's connected to. To conserve memory, only relevant information must be loaded at any one time (can't load the entire database in memory at load and go from there). I cannot assume the backing store is local. Right now it is, but eventually this program will include syncing to a remote db. Neither the user tables, connections or records are known at design time as they are user generated. I've spent a lot of time trying to figure out how to design the backing store and the object model to best fit these specs. In my first design attempt on this, I had one object managing all a table's records and connections. I attempted this first because it kept the memory footprint smaller (records and connections were simple dicts), but maintaining aggregate and link information between tables became....onerous (ie...a huge spaghettified mess). Tracing dependencies using this method almost became impossible. Instead, I've settled on a distributed graph model where each record and connection is 'aware' of what's around it by managing it own data and connections to other records. Doing this increases my memory footprint but also let me create a faulting system so connections/records aren't loaded into memory until they're needed. It's also much easier to code: trace dependencies, eliminate cycling recursive updates, etc. My biggest problem is storing/loading the connections. I'm not happy with any of my current solutions/ideas so I wanted to ask and see if anybody else has any ideas of how this should be structured. Connections are fairly simple. They contain: fromRecordID, fromTableID, fromRecordOrder, toRecordID, toTableID, toRecordOrder. Here's what I've come up with so far: Store all the connections in one big table. If I do this, either I load all connections at once (one big db call) or make a call every time a user table is loaded. The big issue here: the size of the connections table has the potential to be huge, and I'm afraid it would slow things down. Store in separate tables all the outgoing connections for each user table. This is probably the worst idea I've had. Now my connections are 'spread out' over multiple tables (one for each user table), which means I have to make a separate DB called to each table (or make a huge join) just to find all the incoming connections for a particular user table. I've avoided making "one big ass table", but I'm not sure the cost is worth it. Store in separate tables all outgoing AND incoming connections for each user table (using a flag to distinguish between incoming vs outgoing). This is the idea I'm leaning towards, but it will essentially double the total DB storage for all the connections (as each connection will be stored in two tables). It also means I have to make sure connection information is kept in sync in both places. This is obviously not ideal but it does mean that when I load a user table, I only need to load one 'connection' table and have all the information I need. This also presents a separate problem, that of connection object creation. Since each user table has a list of all connections, there are two opportunities for a connection object to be made. However, connections objects (designed to facilitate communication between records) should only be created once. This means I'll have to devise a common caching/factory object to make sure only one connection object is made per connection. Does anybody have any ideas of a better way to do this? Once I've committed to a particular design pattern I'm pretty much stuck with it, so I want to make sure I've come up with the best one possible.

    Read the article

  • Big Data – Interacting with Hadoop – What is Sqoop? – What is Zookeeper? – Day 17 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the Pig and Pig Latin in Big Data Story. In this article we will understand what is Sqoop and Zookeeper in Big Data Story. There are two most important components one should learn when learning about interacting with Hadoop – Sqoop and Zookper. What is Sqoop? Most of the business stores their data in RDBMS as well as other data warehouse solutions. They need a way to move data to the Hadoop system to do various processing and return it back to RDBMS from Hadoop system. The data movement can happen in real time or at various intervals in bulk. We need a tool which can help us move this data from SQL to Hadoop and from Hadoop to SQL. Sqoop (SQL to Hadoop) is such a tool which extract data from non-Hadoop data sources and transform them into the format which Hadoop can use it and later it loads them into HDFS. Essentially it is ETL tool where it Extracts, Transform and Load from SQL to Hadoop. The best part is that it also does extract data from Hadoop and loads them to Non-SQL (or RDBMS) data stores. Essentially, Sqoop is a command line tool which does SQL to Hadoop and Hadoop to SQL. It is a command line interpreter. It creates MapReduce job behinds the scene to import data from an external database to HDFS. It is very effective and easy to learn tool for nonprogrammers. What is Zookeeper? ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. In other words Zookeeper is a replicated synchronization service with eventual consistency. In simpler words – in Hadoop cluster there are many different nodes and one node is master. Let us assume that master node fails due to any reason. In this case, the role of the master node has to be transferred to a different node. The main role of the master node is managing the writers as that task requires persistence in order of writing. In this kind of scenario Zookeeper will assign new master node and make sure that Hadoop cluster performs without any glitch. Zookeeper is the Hadoop’s method of coordinating all the elements of these distributed systems. Here are few of the tasks which Zookeepr is responsible for. Zookeeper manages the entire workflow of starting and stopping various nodes in the Hadoop’s cluster. In Hadoop cluster when any processes need certain configuration to complete the task. Zookeeper makes sure that certain node gets necessary configuration consistently. In case of the master node fails, Zookeepr can assign new master node and make sure cluster works as expected. There many other tasks Zookeeper performance when it is about Hadoop cluster and communication. Basically without the help of Zookeeper it is not possible to design any new fault tolerant distributed application. Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Big Data Analytics. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • SQL SERVER – Unable to DELETE Project in Data Quality Projects (DQS)

    - by pinaldave
    Here is the email which made me write this blog post. When I write a blog post I write keeping in mind that if the developer is not familiar with the concept he will attempt this on the development server. If due to any reason you attempt it on any other server than your personal server, developer should make sure to have complete confidence on his own expertise and understand the risk behind it.  Well, let us read the email which I received. I have modified it a bit to remove information related to organizational and individual. “I just read your blog post on Beginning DQS. I went ahead and followed every single screenshot and it worked fine. I was able to execute the DQS project successfully. However, the same blog post got me in trouble – a serious trouble. After first successful deployment I went ahead and created a few of my own knowledge base and projects. I played around a bit and then decided to get back to real work. Now we had deployed DQS on production server only, so experiment on production server. Now, when I got back to my work, I forgot to close all the windows. My manager found the window open and have seen my test projects. He has asked me to delete my experiments immediately and have said words which I cannot write to you. Here is the problem. I am not able to delete the project which I have created earlier. I am able to open it and play with it but the delete option is disabled and grayed out (see attached image). Now I believe there is nothing wrong with this project as it was just a test project. Would you please write to my manager that it is not harmful to leave that project there as it is? It is also not using any resources. I think he will believe you.” As I said this kind of email makes me uncomfortable. I do not want someone to execute anything on production server. I often write notes and disclaimer on my post when something is dangerous to execute on production server. However, if someone is not expert with SQL Server and attempts something new on production server, I think the major issue is here with the person (admin) who gave new developer permission to production server. This has to be carefully avoided. Here was my response to the individual. “I cannot write to your manager anything as he has not asked me anything. Honestly I believe he is correct in his behavior as you should have not executed anything on the production server without prior approval and testing on the development server. Any R&D must be done on local box or development box. I suggest you request your manager to prevent access to users who does not need access. If he is a good manager, he might have already implemented by now recent event. I also see your screenshot. Here is the issue: While you were playing with project, you might have closed the project half the way, without completing it. Due to the same reason it is locked. You can open and continue from the same place where you have left the project. If you do not need the project any more. Right click on it, click on unlock the project. This will enable the DELETE option and now you can delete the project. Next time, be safe out there. It may be dangerous to have admin access to production server when not needed.“ I have yet not heard from him but I believe he will take my words positively. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Data Quality Services, DQS

    Read the article

  • Best strategy for synching data in iPhone app

    - by iamj4de
    I am working on a regular iPhone app which pulls data from a server (XML, JSON, etc...), and I'm wondering what is the best way to implement synching data. Criteria are speed (less network data exchange), robustness (data recovery in case update fails), offline access and flexibility (adaptable when the structure of the database changes slightly, like a new column). I know it varies from app to app, but can you guys share some of your strategy/experience? For me, I'm thinking of something like this: 1) Store Last Modified Date in iPhone 2) Upon launching, send a message like getNewData.php?lastModifiedDate=... 3) Server will process and send back only modified data from last time. 4) This data is formatted as so: <+><data id="..."></data></+> // add this to SQLite/CoreData <-><data id="..."></data></-> // remove this <%><data id="..."><attribute>newValue</attribute></data></%> // new modified value I don't want to make <+, <-, <%... for each attribute as well, because it would be too complicated, so probably when receive a <% field, I would just remove the data with the specified id and then add it again (assuming id here is not some automatically auto-incremented field). 5) Once everything is downloaded and updated, I will update the Last Modified Date field. The main problem with this strategy is: If the network goes down when I am updating something = the Last Modified Date is not yet updated = next time I relaunch the app, I will have to go through the same thing again. Not to mention potential inconsistent data. If I use a temporary table for update and make the whole thing atomic, it would work, but then again, if the update is too long (lots of data change), the user has to wait a long time until new data is available. Should I use Last-Modified-Date for each of the data field and update data gradually?

    Read the article

  • Using list() to extract a data.table inside of a function

    - by Nathan VanHoudnos
    I must admit that the data.table J syntax confuses me. I am attempting to use list() to extract a subset of a data.table as a data.table object as described in Section 1.4 of the data.table FAQ, but I can't get this behavior to work inside of a function. An example: require(data.table) ## Setup some test data set.seed(1) test.data <- data.table( X = rnorm(10), Y = rnorm(10), Z = rnorm(10) ) setkey(test.data, X) ## Notice that I can subset the data table easily with literal names test.data[, list(X,Y)] ## X Y ## 1: -0.8356286 -0.62124058 ## 2: -0.8204684 -0.04493361 ## 3: -0.6264538 1.51178117 ## 4: -0.3053884 0.59390132 ## 5: 0.1836433 0.38984324 ## 6: 0.3295078 1.12493092 ## 7: 0.4874291 -0.01619026 ## 8: 0.5757814 0.82122120 ## 9: 0.7383247 0.94383621 ## 10: 1.5952808 -2.21469989 I can even write a function that will return a column of the data.table as a vector when passed the name of a column as a character vector: get.a.vector <- function( my.dt, my.column ) { ## Step 1: Convert my.column to an expression column.exp <- parse(text=my.column) ## Step 2: Return the vector return( my.dt[, eval(column.exp)] ) } get.a.vector( test.data, 'X') ## [1] -0.8356286 -0.8204684 -0.6264538 -0.3053884 0.1836433 0.3295078 ## [7] 0.4874291 0.5757814 0.7383247 1.5952808 But I cannot pull a similar trick for list(). The inline comments are the output from the interactive browser() session. get.a.dt <- function( my.dt, my.column ) { ## Step 1: Convert my.column to an expression column.exp <- parse(text=my.column) ## Step 2: Enter the browser to play around browser() ## Step 3: Verity that a literal X works: my.dt[, list(X)] ## << not shown >> ## Step 4: Attempt to evaluate the parsed experssion my.dt[, list( eval(column.exp)] ## Error in `rownames<-`(`*tmp*`, value = paste(format(rn, right = TRUE), (from data.table.example.R@1032mCJ#7) : ## length of 'dimnames' [1] not equal to array extent return( my.dt[, list(eval(column.exp))] ) } get.a.dt( test.data, "X" ) What am I missing? Update: Due to some confusion as to why I would want to do this I wanted to clarify. My use case is when I need to access a data.table column when when I generate the name. Something like this: set.seed(2) test.data[, X.1 := rnorm(10)] which.column <- 'X' new.column <- paste(which.column, '.1', sep="") get.a.dt( test.data, new.column ) Hopefully that helps.

    Read the article

  • jQuery Templates and Data Linking (and Microsoft contributing to jQuery)

    - by ScottGu
    The jQuery library has a passionate community of developers, and it is now the most widely used JavaScript library on the web today. Two years ago I announced that Microsoft would begin offering product support for jQuery, and that we’d be including it in new versions of Visual Studio going forward. By default, when you create new ASP.NET Web Forms and ASP.NET MVC projects with VS 2010 you’ll find jQuery automatically added to your project. A few weeks ago during my second keynote at the MIX 2010 conference I announced that Microsoft would also begin contributing to the jQuery project.  During the talk, John Resig -- the creator of the jQuery library and leader of the jQuery developer team – talked a little about our participation and discussed an early prototype of a new client templating API for jQuery. In this blog post, I’m going to talk a little about how my team is starting to contribute to the jQuery project, and discuss some of the specific features that we are working on such as client-side templating and data linking (data-binding). Contributing to jQuery jQuery has a fantastic developer community, and a very open way to propose suggestions and make contributions.  Microsoft is following the same process to contribute to jQuery as any other member of the community. As an example, when working with the jQuery community to improve support for templating to jQuery my team followed the following steps: We created a proposal for templating and posted the proposal to the jQuery developer forum (http://forum.jquery.com/topic/jquery-templates-proposal and http://forum.jquery.com/topic/templating-syntax ). After receiving feedback on the forums, the jQuery team created a prototype for templating and posted the prototype at the Github code repository (http://github.com/jquery/jquery-tmpl ). We iterated on the prototype, creating a new fork on Github of the templating prototype, to suggest design improvements. Several other members of the community also provided design feedback by forking the templating code. There has been an amazing amount of participation by the jQuery community in response to the original templating proposal (over 100 posts in the jQuery forum), and the design of the templating proposal has evolved significantly based on community feedback. The jQuery team is the ultimate determiner on what happens with the templating proposal – they might include it in jQuery core, or make it an official plugin, or reject it entirely.  My team is excited to be able to participate in the open source process, and make suggestions and contributions the same way as any other member of the community. jQuery Template Support Client-side templates enable jQuery developers to easily generate and render HTML UI on the client.  Templates support a simple syntax that enables either developers or designers to declaratively specify the HTML they want to generate.  Developers can then programmatically invoke the templates on the client, and pass JavaScript objects to them to make the content rendered completely data driven.  These JavaScript objects can optionally be based on data retrieved from a server. Because the jQuery templating proposal is still evolving in response to community feedback, the final version might look very different than the version below. This blog post gives you a sense of how you can try out and use templating as it exists today (you can download the prototype by the jQuery core team at http://github.com/jquery/jquery-tmpl or the latest submission from my team at http://github.com/nje/jquery-tmpl).  jQuery Client Templates You create client-side jQuery templates by embedding content within a <script type="text/html"> tag.  For example, the HTML below contains a <div> template container, as well as a client-side jQuery “contactTemplate” template (within the <script type="text/html"> element) that can be used to dynamically display a list of contacts: The {{= name }} and {{= phone }} expressions are used within the contact template above to display the names and phone numbers of “contact” objects passed to the template. We can use the template to display either an array of JavaScript objects or a single object. The JavaScript code below demonstrates how you can render a JavaScript array of “contact” object using the above template. The render() method renders the data into a string and appends the string to the “contactContainer” DIV element: When the page is loaded, the list of contacts is rendered by the template.  All of this template rendering is happening on the client-side within the browser:   Templating Commands and Conditional Display Logic The current templating proposal supports a small set of template commands - including if, else, and each statements. The number of template commands was deliberately kept small to encourage people to place more complicated logic outside of their templates. Even this small set of template commands is very useful though. Imagine, for example, that each contact can have zero or more phone numbers. The contacts could be represented by the JavaScript array below: The template below demonstrates how you can use the if and each template commands to conditionally display and loop the phone numbers for each contact: If a contact has one or more phone numbers then each of the phone numbers is displayed by iterating through the phone numbers with the each template command: The jQuery team designed the template commands so that they are extensible. If you have a need for a new template command then you can easily add new template commands to the default set of commands. Support for Client Data-Linking The ASP.NET team recently submitted another proposal and prototype to the jQuery forums (http://forum.jquery.com/topic/proposal-for-adding-data-linking-to-jquery). This proposal describes a new feature named data linking. Data Linking enables you to link a property of one object to a property of another object - so that when one property changes the other property changes.  Data linking enables you to easily keep your UI and data objects synchronized within a page. If you are familiar with the concept of data-binding then you will be familiar with data linking (in the proposal, we call the feature data linking because jQuery already includes a bind() method that has nothing to do with data-binding). Imagine, for example, that you have a page with the following HTML <input> elements: The following JavaScript code links the two INPUT elements above to the properties of a JavaScript “contact” object that has a “name” and “phone” property: When you execute this code, the value of the first INPUT element (#name) is set to the value of the contact name property, and the value of the second INPUT element (#phone) is set to the value of the contact phone property. The properties of the contact object and the properties of the INPUT elements are also linked – so that changes to one are also reflected in the other. Because the contact object is linked to the INPUT element, when you request the page, the values of the contact properties are displayed: More interesting, the values of the linked INPUT elements will change automatically whenever you update the properties of the contact object they are linked to. For example, we could programmatically modify the properties of the “contact” object using the jQuery attr() method like below: Because our two INPUT elements are linked to the “contact” object, the INPUT element values will be updated automatically (without us having to write any code to modify the UI elements): Note that we updated the contact object above using the jQuery attr() method. In order for data linking to work, you must use jQuery methods to modify the property values. Two Way Linking The linkBoth() method enables two-way data linking. The contact object and INPUT elements are linked in both directions. When you modify the value of the INPUT element, the contact object is also updated automatically. For example, the following code adds a client-side JavaScript click handler to an HTML button element. When you click the button, the property values of the contact object are displayed using an alert() dialog: The following demonstrates what happens when you change the value of the Name INPUT element and click the Save button. Notice that the name property of the “contact” object that the INPUT element was linked to was updated automatically: The above example is obviously trivially simple.  Instead of displaying the new values of the contact object with a JavaScript alert, you can imagine instead calling a web-service to save the object to a database. The benefit of data linking is that it enables you to focus on your data and frees you from the mechanics of keeping your UI and data in sync. Converters The current data linking proposal also supports a feature called converters. A converter enables you to easily convert the value of a property during data linking. For example, imagine that you want to represent phone numbers in a standard way with the “contact” object phone property. In particular, you don’t want to include special characters such as ()- in the phone number - instead you only want digits and nothing else. In that case, you can wire-up a converter to convert the value of an INPUT element into this format using the code below: Notice above how a converter function is being passed to the linkFrom() method used to link the phone property of the “contact” object with the value of the phone INPUT element. This convertor function strips any non-numeric characters from the INPUT element before updating the phone property.  Now, if you enter the phone number (206) 555-9999 into the phone input field then the value 2065559999 is assigned to the phone property of the contact object: You can also use a converter in the opposite direction also. For example, you can apply a standard phone format string when displaying a phone number from a phone property. Combining Templating and Data Linking Our goal in submitting these two proposals for templating and data linking is to make it easier to work with data when building websites and applications with jQuery. Templating makes it easier to display a list of database records retrieved from a database through an Ajax call. Data linking makes it easier to keep the data and user interface in sync for update scenarios. Currently, we are working on an extension of the data linking proposal to support declarative data linking. We want to make it easy to take advantage of data linking when using a template to display data. For example, imagine that you are using the following template to display an array of product objects: Notice the {{link name}} and {{link price}} expressions. These expressions enable declarative data linking between the SPAN elements and properties of the product objects. The current jQuery templating prototype supports extending its syntax with custom template commands. In this case, we are extending the default templating syntax with a custom template command named “link”. The benefit of using data linking with the above template is that the SPAN elements will be automatically updated whenever the underlying “product” data is updated.  Declarative data linking also makes it easier to create edit and insert forms. For example, you could create a form for editing a product by using declarative data linking like this: Whenever you change the value of the INPUT elements in a template that uses declarative data linking, the underlying JavaScript data object is automatically updated. Instead of needing to write code to scrape the HTML form to get updated values, you can instead work with the underlying data directly – making your client-side code much cleaner and simpler. Downloading Working Code Examples of the Above Scenarios You can download this .zip file to get with working code examples of the above scenarios.  The .zip file includes 4 static HTML page: Listing1_Templating.htm – Illustrates basic templating. Listing2_TemplatingConditionals.htm – Illustrates templating with the use of the if and each template commands. Listing3_DataLinking.htm – Illustrates data linking. Listing4_Converters.htm – Illustrates using a converter with data linking. You can un-zip the file to the file-system and then run each page to see the concepts in action. Summary We are excited to be able to begin participating within the open-source jQuery project.  We’ve received lots of encouraging feedback in response to our first two proposals, and we will continue to actively contribute going forward.  These features will hopefully make it easier for all developers (including ASP.NET developers) to build great Ajax applications. Hope this helps, Scott P.S. [In addition to blogging, I am also now using Twitter for quick updates and to share links. Follow me at: twitter.com/scottgu]

    Read the article

  • Data Model Evolution

    - by redleafong
    Hey guys, When writing code I am seeing requirements to change data models (e.g. adding/changing/removing data members from a class). When these data models belong to an interface, it seems difficult to change without breaking the existing client codes. So I am wondering if there is any best practice of designing interfaces/data models in a way to minimize the impact during evolution. The closest thing I can find from google is data contract versioning. But that seems to be a .net specific topic. I am wondering if the same practice applies to the Java world, or there is a different or generic way to deal with data model evolution. Thanks

    Read the article

  • Best practices when creating/modeling databases?

    - by Oscar Mederos
    I learned at the University some steps to model a database: Model the problem using the Extended Entity-Relationship Model. Extract the functional dependencies Apply some algorithms to normalize the database (3NF or Boyce-Codd) Create the database I'm studying Computer Science and since I received that course I'm wondering if I always need to do those steps when creating a complex database for an specified problem. For example, do PHP / .NET / .. programmers always do that? or there are some tools to simplify that process, maybe using another way of represent the problem instead of the EERM?

    Read the article

< Previous Page | 15 16 17 18 19 20 21 22 23 24 25 26  | Next Page >