Search Results

Search found 837 results on 34 pages for 'structured'.

Page 6/34 | < Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13  | Next Page >

  • Unlocking Productivity

    - by Michael Snow
    Unlocking Productivity in Life Sciences with Consolidated Content Management by Joe Golemba, Vice President, Product Management, Oracle WebCenter As life sciences organizations look to become more operationally efficient, the ability to effectively leverage information is a competitive advantage. Whether data mining at the drug discovery phase or prepping the sales team before a product launch, content management can play a key role in developing, organizing, and disseminating vital information. The goal of content management is relatively straightforward: put the information that people need where they can find it. A number of issues can complicate this; information sits in many different systems, each of those systems has its own security, and the information in those systems exists in many different formats. Identifying and extracting pertinent information from mountains of farflung data is no simple job, but the alternative—wasted effort or even regulatory compliance issues—is worse. An integrated information architecture can enable health sciences organizations to make better decisions, accelerate clinical operations, and be more competitive. Unstructured data matters Often when we think of drug development data, we think of structured data that fits neatly into one or more research databases. But structured data is often directly supported by unstructured data such as experimental protocols, reaction conditions, lot numbers, run times, analyses, and research notes. As life sciences companies seek integrated views of data, they are typically finding diverse islands of data that seemingly have no relationship to other data in the organization. Information like sales reports or call center reports can be locked into siloed systems, and unavailable to the discovery process. Additionally, in the increasingly networked clinical environment, Web pages, instant messages, videos, scientific imaging, sales and marketing data, collaborative workspaces, and predictive modeling data are likely to be present within an organization, and each source potentially possesses information that can help to better inform specific efforts. Historically, content management solutions that had 21CFR Part 11 capabilities—electronic records and signatures—were focused mainly on content-enabling manufacturing-related processes. Today, life sciences companies have many standalone repositories, requiring different skills, service level agreements, and vendor support costs to manage them. With the amount of content doubling every three to six months, companies have recognized the need to manage unstructured content from the beginning, in order to increase employee productivity and operational efficiency. Using scalable and secure enterprise content management (ECM) solutions, organizations can better manage their unstructured content. These solutions can also be integrated with enterprise resource planning (ERP) systems or research systems, making content available immediately, in the context of the application and within the flow of the employee’s typical business activity. Administrative safeguards—such as content de-duplication—can also be applied within ECM systems, so documents are never recreated, eliminating redundant efforts, ensuring one source of truth, and maintaining content standards in the organization. Putting it in context Consolidating structured and unstructured information in a single system can greatly simplify access to relevant information when it is needed through contextual search. Using contextual filters, results can include therapeutic area, position in the value chain, semantic commonalities, technology-specific factors, specific researchers involved, or potential business impact. The use of taxonomies is essential to organizing information and enabling contextual searches. Taxonomy solutions are composed of a hierarchical tree that defines the relationship between different life science terms. When overlaid with additional indexing related to research and/or business processes, it becomes possible to effectively narrow down the amount of data that is returned during searches, as well as prioritize results based on specific criteria and/or prior search history. Thus, search results are more accurate and relevant to an employee’s day-to-day work. For example, a search for the word "tissue" by a lab researcher would return significantly different results than a search for the same word performed by someone in procurement. Of course, diverse data repositories, combined with the immense amounts of data present in an organization, necessitate that the data elements be regularly indexed and cached beforehand to enable reasonable search response times. In its simplest form, indexing of a single, consolidated data warehouse can be expected to be a relatively straightforward effort. However, organizations require the ability to index multiple data repositories, enabling a single search to reference multiple data sources and provide an integrated results listing. Security and compliance Beyond yielding efficiencies and supporting new insight, an enterprise search environment can support important security considerations as well as compliance initiatives. For example, the systems enable organizations to retain the relevance and the security of the indexed systems, so users can only see the results to which they are granted access. This is especially important as life sciences companies are working in an increasingly networked environment and need to provide secure, role-based access to information across multiple partners. Although not officially required by the 21 CFR Part 11 regulation, the U.S. Food and Drug Administraiton has begun to extend the type of content considered when performing relevant audits and discoveries. Having an ECM infrastructure that provides centralized management of all content enterprise-wide—with the ability to consistently apply records and retention policies along with the appropriate controls, validations, audit trails, and electronic signatures—is becoming increasingly critical for life sciences companies. Making the move Creating an enterprise-wide ECM environment requires moving large amounts of content into a single enterprise repository, a daunting and risk-laden initiative. The first key is to focus on data taxonomy, allowing content to be mapped across systems. The second is to take advantage new tools which can dramatically speed and reduce the cost of the data migration process through automation. Additional content need not be frozen while it is migrated, enabling productivity throughout the process. The ability to effectively leverage information into success has been gaining importance in the life sciences industry for years. The rapid adoption of enterprise content management, both in operational processes as well as in scientific management, are clear indicators that the companies are looking to use all available data to be better informed, improve decision making, minimize risk, and increase time to market, to maintain profitability and be more competitive. As more and more varieties and sources of information are brought under the strategic management umbrella, the ability to divine knowledge from the vast pool of information is increasingly difficult. Simple search engines and basic content management are increasingly unable to effectively extract the right information from the mountains of data available. By bringing these tools into context and integrating them with business processes and applications, we can effectively focus on the right decisions that make our organizations more profitable. More Information Oracle will be exhibiting at DIA 2012 in Philadelphia on June 25-27. Stop by our booth Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} (#2825) to learn more about the advantages of a centralized ECM strategy and see the Oracle WebCenter Content solution, our 21 CFR Part 11 compliant content management platform.

    Read the article

  • Webmasters hentry error and authorless pages

    - by Ben Racicot
    Within Google Webmasters Search Appearance-Structured data I'm getting a series of errors: Error: Missing required hCard "author". And most of my 44 errors have: Missing: Author Missing: entry-title Missing: updated There seems to be no CLEAR explanation of these errors. It is either because these classes exist without their nested classes, or they are expected to exist because of something else, possibly itemscope or itemtype='' The Question: How do you specify with richsnippets that the page is about a location and there is no human author?

    Read the article

  • recommended book for cocos2d?

    - by Paul Sanwald
    I'm an experienced programmer that recently got into iOS development by working through the big nerd ranch book by Aaron Hillegass and Joe Conway. I loved the way the book was structured in terms of typing in the code and doing the challenges. I'm interested in learning more about iOS gaming and cocos2d, but am a complete newbie in terms of game development/design. there are a number of books on amazon on cocos2d, can anyone recommend one in particular?

    Read the article

  • Creating Multiple Queries for Running Objects

    - by edurdias
    Running Objects combines the power of LINQ with Metadata definition to let you leverage multiples perspectives of your queries of objects. By default, RO brings all the objects in natural order of insertion and including all the visible properties of your class. In this post, we will understand how the QueryAttribute class is structured and how to make use of it. The QueryAttribute class This class is the responsible to specify all the possible perspectives of a list of objects. In other words, is...(read more)

    Read the article

  • OTN Virtual Technology Summit - July 9 - Middleware Track

    - by OTN ArchBeat
    The Architecture of Analytics: Big Time Big Data and Business Intelligence This four-session track, part of the free OTN Virtual Technology Summit on July 9, will present a solution architect's perspective on how business intelligence products in Oracle's Fusion Middleware family and beyond fit into an effective big data architecture, offering insight and expertise from Oracle ACE Directors and product team experts specializing in business Intelligence to help you meet your big data business intelligence challenges. Register now! Sessions Oracle Big Data Appliance Case Study: Using Big Data to Analyze Cancer-Genome Relationships Tom Plunkett, Lead Author of the Oracle Big Data Handbook What does it take to build an award winning Big Data solution? This presentation takes a deep technical dive into the use of the Oracle Big Data Appliance in a project for the National Cancer Institute's Frederick National Laboratory for Cancer Research. The Frederick National Laboratory and the Oracle team won several awards for analyzing relationships between genomes and cancer subtypes with big data, including the 2012 Government Big Data Solutions Award, the 2013 Excellence.Gov Finalist for Innovation, and the 2013 ComputerWorld Honors Laureate for Innovation. [30 mins] Getting Value from Big Data Variety Richard Tomlinson, Director, Product Management, Oracle Big data variety implies big data complexity. Performing analytics on diverse data typically involves mashing up structured, semi-structured and unstructured content. So how can we do this effectively to get real value? How do we relate diverse content so we can start to analyze it? This session looks at how we approach this tricky problem using Endeca Information Discovery. [30 mins] How To Leverage Your Investment In Oracle Business Intelligence Enterprise Edition Within a Big Data Architecture Oracle ACE Director Kevin McGinley More and more organizations are realizing the value Big Data technologies contribute to the return on investment in Analytics. But as an increasing variety of data types reside in different data stores, organizations are finding that a unified Analytics layer can help bridge the divide in modern data architectures. This session will examine how you can enable Oracle Business Intelligence Enterprise Edition (OBIEE) to play a role in a unified Analytics layer and the benefits and use cases for doing so. [30 mins] Oracle Data Integrator 12c As Your Big Data Data Integration Hub Oracle ACE Director Mark Rittman Oracle Data Integrator 12c (ODI12c), as well as being able to integrate and transform data from application and database data sources, also has the ability to load, transform and orchestrate data loads to and from Big Data sources. In this session, we'll look at ODI12c's ability to load data from Hadoop, Hive, NoSQL and file sources, transform that data using Hive and MapReduce processing across the Hadoop cluster, and then bulk-load that data into an Oracle Data Warehouse using Oracle Big Data Connectors. We will also look at how ODI12c enables ETL-offloading to a Hadoop cluster, with some tips and techniques on real-time capture into a Hadoop data reservoir and techniques and limitations when performing ETL on big data sources. [90 mins] Register now!

    Read the article

  • How would one build a relational database on a key-value store, a-la Berkeley DB's SQL interface?

    - by coleifer
    I've been checking out Berkeley DB and was impressed to find that it supported a SQL interface that is "nearly identical" to SQLite. http://docs.oracle.com/cd/E17076_02/html/bdb-sql/dbsqlbasics.html#identicalusage I'm very curious, at a high-level, how this kind of interface might have been architected. For instance: since values are "transparent", how do you efficiently query and sort by value how are limits and offsets performed efficiently on large result sets how would the keys be structured and serialized for good average-case performance

    Read the article

  • SEO Services - Benefits of SEO Outsourcing For Small Businesses

    If you are an entrepreneur you must be accustomed to do business in a very structured manner. Usually, the chances are more that you must be managing almost every business processes all by yourself. Now at a time like this if you plan to swap over to the Internet marketing world, you will soon realize that you actually require functioning a bit different from your competitors to be more successful.

    Read the article

  • Natural SEO Vs Pay Per Click SEO

    A question that is asked a lot by many clients is what is best, natural search engine optimisation or pay per click? This is always a tricky question to answer because the two work so well together as a structured campaign but at the same time, are totally different. By identifying the two you will see what suits your needs and your campaign.

    Read the article

  • iOS app with a lot of text

    - by rdurand
    I just asked a question on StackOverflow, but I'm thinking that a part of it belongs here, as questions about design pattern are welcomed by the faq. Here is my situation. I have developed almost completely a native iOS app. The last section I need to implement is all the rules of a sport, so that's a lot of text. It has one main level of sections, divided in subsections, containing a lot of structured text (paragraphs, a few pictures, bulleted/numbered lists, tables). I have absolutely no problem with coding, I'm just looking for advice to improve and make the best design pattern possible for my app. My first shot (the last one so far) was a UITableViewController containing the sections, sending the user to another UITableViewController with the subsections of the selected section, and then one strange last UITableViewController where the cells contain UITextViews, sections header help structure the content, etc. What I would like is your advice on how to improve the structure of this section. I'm perfectly ready to destroy/rebuild the whole thing, I'm really lost in my design here.. As I said on SO, I've began to implement a UIWebView in a UIViewController, showing a html page with JQuery Mobile to display the content, and it's fine. My question is more about the 2 views taking the user to that content. I used UITableViewControllers because that's what seemed the most appropriate for a structured hierarchy like this one. But that doesn't seem like the best solution in term of user experience.. What structure / "view-flow" / kind of presentation would you try to implement in my situation? As always, any help would be greatly appreciated! Just so you can understand better the hierarchy, with a simple example : -----> Section 1 -----> SubSection 1.1 -----> Content | -----> SubSection 1.2 -----> Content | -----> SubSection 1.3 -----> Content | | | UINavigationController -------> Section 2 -----> SubSection 2.1 -----> Content | -----> SubSection 2.2 -----> Content | -----> SubSection 2.3 -----> Content | -----> SubSection 2.4 -----> Content | -----> SubSection 2.5 -----> Content | -----> Section 3 -----> SubSection 3.1 -----> Content -----> SubSection 3.2 -----> Content |------------------| |--------------------| |-------------| 1 UITableViewController 3 UITableViewControllers 10 UIViewControllers (3 rows) (with different with a UIWebView number of rows)

    Read the article

  • What are some known approaches to collaborative schema design?

    - by Omega
    If a project has multiple developers, each with useful knowledge & experience that can aide in the design of a schema; what are some known processes to collaboratively plan that schema out? Are there any types of meetings that are useful for this purpose? This would be in contrast to circumstances where projects are started and models are developed unilaterally by coincidence rather than as part of a structured understanding of the domain.

    Read the article

  • 5 Ways to Link Building Success

    If link building isn't really your daily cup of tea, don't worry because there is a way where you can get your link structured site up in just a jiffy. It isn't always easy but with the 5 helpful steps you'll be building in no time. People often tell themselves that is just a piece of cake.

    Read the article

  • Formal separation marker of syslog events?

    - by Server Horror
    I've been looking at RFC5424 to find the formally specified marker that will end a syslog event. Unfortunately I couldn't find it. So If I wanted to implement some small syslog server that reacts on certain messages what is the marker that ends a message (yes commonly an event is a single line, but I just couldn't find it in the specification) Clarification: I call it event because I associate a message with a single line. An event could possibly be some thing like Type: foo Source: webservers whereas a message to me is this: Type: foo Source: webservers http://tools.ietf.org/html/rfc5424#section-6 defines: SYSLOG-MSG = HEADER SP STRUCTURED-DATA [SP MSG] neither STRUCTURED-DATA nor MSG tell me how these fields end. Especially MSG is defined as as MSG-ANY / MSG-UTF8 which expands to virtually anything. There's nothing that says a newline marks the end (or an 8 or an a for that matter). Given the example messages (section 6.5): This is one valid message, or 2 valid messages depending on wether you say that a HEADER element must never occur in any MSG element: literal whitespace <34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - <34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 | is this an end marker? \t stands for a tab <34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 -\t<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 | is this an end marker? \n stands for a newline <34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 -\n<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 | is this an end marker? Either I'm misreading the RFC or there just isn't any mention. The sizes specified in the RFC just say what the minimum length is expected that I can work with...

    Read the article

  • A format for storing personal contacts in a database

    - by Gart
    I'm thinking of the best way to store personal contacts in a database for a business application. The traditional and straightforward approach would be to create a table with columns for each element, i.e. Name, Telephone Number, Job title, Address, etc... However, there are known industry standards for this kind of data, like for example vCard, or hCard, or vCard-RDF/XML or even Windows Contacts XML Schema. Utilizing an standard format would offer some benefits, like inter-operablilty with other systems. But how can I decide which method to use? The requirements are mainly to store the data. Search and ordering queries are highly unlikely but possible. The volume of the data is 100,000 records at maximum. My database engine supports native XML columns. I have been thinking to use some XML-based format to store the personal contacts. Then it will be possible to utilize XML indexes on this data, if searching and ordering is needed. Is this a good approach? Which contacts format and schema would you recommend for this? Edited after first answers Here is why I think the straightforward approach is bad. This is due to the nature of this kind of data - it is not that simple. The personal contacts it is not well-structured data, it may be called semi-structured. Each contact may have different data fields, maybe even such fields which I cannot anticipate. In my opinion, each piece of this data should be treated as important information, i.e. no piece of data could be discarded just because there was no relevant column in the database. If we took it further, assuming that no data may be lost, then we could create a big text column named Comment or Description or Other and put there everything which cannot be fitted well into table columns. But then again - the data would lose structure - this might be bad. If we wanted structured data then - according to the database design principles - the data should be decomposed into entities, and relations should be established between the entities. But this adds complexity - there are just too many entities, and lots of design desicions should be made, like "How do we store Address? Personal Name? Phone number? How do we encode home phone numbers and mobile phone numbers? How about other contact info?.." The relations between entities are complex and multiple, and each relation is a table in the database. Each relation needs to be documented in the design papers. That is a lot of work to do. But it is possible to avoid the complexity entirely - just document that the data is stored according to such and such standard schema, period. Then anybody who would be reading that document should easily understand what it was all about. Finally, this is all about using an industry standard. The standard is, hopefully, designed by some clever people who anticipated and described the structure of personal contacts information much better than I ever could. Why should we all reinvent the wheel?? It's much easier to use a standard schema. The problem is, there are just too many standards - it's not easy to decide which one to use!

    Read the article

  • Which programming idiom to choose for this open source library?

    - by Walkman
    I have an interesting question about which programming idiom is easier to use for beginner developers writing concrete file parsing classes. I'm developing an open source library, which one of the main functionality is to parse plain text files and get structured information from them. All of the files contains the same kind of information, but can be in different formats like XML, plain text (each of them is structured differently), etc. There are a common set of information pieces which is the same in all (e.g. player names, table names, some id numbers) There are formats which are very similar to each other, so it's possible to define a common Base class for them to facilitate concrete format parser implementations. So I can clearly define base classes like SplittablePlainTextFormat, XMLFormat, SeparateSummaryFormat, etc. Each of them hints the kind of structure they aim to parse. All of the concrete classes should have the same information pieces, no matter what. To be useful at all, this library needs to define at least 30-40 of these parsers. A couple of them are more important than others (obviously the more popular formats). Now my question is, which is the best programming idiom to choose to facilitate the development of these concrete classes? Let me explain: I think imperative programming is easy to follow even for beginners, because the flow is fixed, the statements just come one after another. Right now, I have this: class SplittableBaseFormat: def parse(self): "Parses the body of the hand history, but first parse header if not yet parsed." if not self.header_parsed: self.parse_header() self._parse_table() self._parse_players() self._parse_button() self._parse_hero() self._parse_preflop() self._parse_street('flop') self._parse_street('turn') self._parse_street('river') self._parse_showdown() self._parse_pot() self._parse_board() self._parse_winners() self._parse_extra() self.parsed = True So the concrete parser need to define these methods in order in any way they want. Easy to follow, but takes longer to implement each individual concrete parser. So what about declarative? In this case Base classes (like SplittableFormat and XMLFormat) would do the heavy lifting based on regex and line/node number declarations in the concrete class, and concrete classes have no code at all, just line numbers and regexes, maybe other kind of rules. Like this: class SplittableFormat: def parse_table(): "Parses TABLE_REGEX and get information" # set attributes here def parse_players(): "parses PLAYER_REGEX and get information" # set attributes here class SpecificFormat1(SplittableFormat): TABLE_REGEX = re.compile('^(?P<table_name>.*) other info \d* etc') TABLE_LINE = 1 PLAYER_REGEX = re.compile('^Player \d: (?P<player_name>.*) has (.*) in chips.') PLAYER_LINE = 16 class SpecificFormat2(SplittableFormat): TABLE_REGEX = re.compile(r'^Tournament #(\d*) (?P<table_name>.*) other info2 \d* etc') TABLE_LINE = 2 PLAYER_REGEX = re.compile(r'^Seat \d: (?P<player_name>.*) has a stack of (\d*)') PLAYER_LINE = 14 So if I want to make it possible for non-developers to write these classes the way to go seems to be the declarative way, however, I'm almost certain I can't eliminate the declarations of regexes, which clearly needs (senior :D) programmers, so should I care about this at all? Do you think it matters to choose one over another or doesn't matter at all? Maybe if somebody wants to work on this project, they will, if not, no matter which idiom I choose. Can I "convert" non-programmers to help developing these? What are your observations? Other considerations: Imperative will allow any kind of work; there is a simple flow, which they can follow but inside that, they can do whatever they want. It would be harder to force a common interface with imperative because of this arbitrary implementations. Declarative will be much more rigid, which is a bad thing, because formats might change over time without any notice. Declarative will be harder for me to develop and takes longer time. Imperative is already ready to release. I hope a nice discussion will happen in this thread about programming idioms regarding which to use when, which is better for open source projects with different scenarios, which is better for wide range of developer skills. TL; DR: Parsing different file formats (plain text, XML) They contains same kind of information Target audience: non-developers, beginners Regex probably cannot be avoided 30-40 concrete parser classes needed Facilitate coding these concrete classes Which idiom is better?

    Read the article

  • How to Create Views for All Tables with Oracle SQL Developer

    - by thatjeffsmith
    Got this question over the weekend via a friend and Oracle ACE Director, so I thought I would share the answer here. If you want to quickly generate DDL to create VIEWs for all the tables in your system, the easiest way to do that with SQL Developer is to create a data model. Wait, why would I want to do this? StackOverflow has a few things to say on this subject… So, start with importing a data dictionary. Step One: Open of Create a Model In SQL Developer, go to View – Data Modeler – Browser. Then in the browser panel, expand your design and create a new Relational Model. Step Two: Import your Data Dictionary This is a fancy way of saying, ‘suck objects out of the database into my model’ This will open a wizard to connect, select your schema(s), objects, etc. Once they’re in your model, you’re ready to cook with gas I’m using HR (Human Resources) for this example. You should end up with something that looks like this. Our favorite HR model Now we’re ready to generate the views! Step Three: Auto-generate the Views Go to Tools – Data Modeler – Table to View Wizard. I don’t want all my tables included, and I want to change the naming standard Decide if you want to change the default generated view names By default the views will be created as ‘V_TABLE_NAME.’ If you don’t like the ‘V_’ you can enter your own. You also can reference the object and model name with variables as shown in the screenshot above. I’m going to go with something a little more personal. The views are the little green boxes in the diagram Can’t find your views? They should be grouped together in your diagram. Don’t forget to use the Navigator to easily find and navigate to those model diagram objects! Step Four: Generate the DDL Ok, let’s use the Generate DDL button on the toolbar. Un-check everything but your views If you used a prefix, take advantage of that to create a filter. You might have existing views in your model that you don’t want to include, right? Once you click ‘OK’ the DDL will be generated. -- Generated by Oracle SQL Developer Data Modeler 4.0.0.825 -- at: 2013-11-04 10:26:39 EST -- site: Oracle Database 11g -- type: Oracle Database 11g CREATE OR REPLACE VIEW HR.TJS_BLOG_COUNTRIES ( COUNTRY_ID , COUNTRY_NAME , REGION_ID ) AS SELECT COUNTRY_ID , COUNTRY_NAME , REGION_ID FROM HR.COUNTRIES ; CREATE OR REPLACE VIEW HR.TJS_BLOG_EMPLOYEES ( EMPLOYEE_ID , FIRST_NAME , LAST_NAME , EMAIL , PHONE_NUMBER , HIRE_DATE , JOB_ID , SALARY , COMMISSION_PCT , MANAGER_ID , DEPARTMENT_ID ) AS SELECT EMPLOYEE_ID , FIRST_NAME , LAST_NAME , EMAIL , PHONE_NUMBER , HIRE_DATE , JOB_ID , SALARY , COMMISSION_PCT , MANAGER_ID , DEPARTMENT_ID FROM HR.EMPLOYEES ; CREATE OR REPLACE VIEW HR.TJS_BLOG_JOBS ( JOB_ID , JOB_TITLE , MIN_SALARY , MAX_SALARY ) AS SELECT JOB_ID , JOB_TITLE , MIN_SALARY , MAX_SALARY FROM HR.JOBS ; CREATE OR REPLACE VIEW HR.TJS_BLOG_JOB_HISTORY ( EMPLOYEE_ID , START_DATE , END_DATE , JOB_ID , DEPARTMENT_ID ) AS SELECT EMPLOYEE_ID , START_DATE , END_DATE , JOB_ID , DEPARTMENT_ID FROM HR.JOB_HISTORY ; CREATE OR REPLACE VIEW HR.TJS_BLOG_LOCATIONS ( LOCATION_ID , STREET_ADDRESS , POSTAL_CODE , CITY , STATE_PROVINCE , COUNTRY_ID ) AS SELECT LOCATION_ID , STREET_ADDRESS , POSTAL_CODE , CITY , STATE_PROVINCE , COUNTRY_ID FROM HR.LOCATIONS ; CREATE OR REPLACE VIEW HR.TJS_BLOG_REGIONS ( REGION_ID , REGION_NAME ) AS SELECT REGION_ID , REGION_NAME FROM HR.REGIONS ; -- Oracle SQL Developer Data Modeler Summary Report: -- -- CREATE TABLE 0 -- CREATE INDEX 0 -- ALTER TABLE 0 -- CREATE VIEW 6 -- CREATE PACKAGE 0 -- CREATE PACKAGE BODY 0 -- CREATE PROCEDURE 0 -- CREATE FUNCTION 0 -- CREATE TRIGGER 0 -- ALTER TRIGGER 0 -- CREATE COLLECTION TYPE 0 -- CREATE STRUCTURED TYPE 0 -- CREATE STRUCTURED TYPE BODY 0 -- CREATE CLUSTER 0 -- CREATE CONTEXT 0 -- CREATE DATABASE 0 -- CREATE DIMENSION 0 -- CREATE DIRECTORY 0 -- CREATE DISK GROUP 0 -- CREATE ROLE 0 -- CREATE ROLLBACK SEGMENT 0 -- CREATE SEQUENCE 0 -- CREATE MATERIALIZED VIEW 0 -- CREATE SYNONYM 0 -- CREATE TABLESPACE 0 -- CREATE USER 0 -- -- DROP TABLESPACE 0 -- DROP DATABASE 0 -- -- REDACTION POLICY 0 -- -- ERRORS 0 -- WARNINGS 0 You can then choose to save this to a file or not. This has a few steps, but as the number of tables in your system increases, so does the amount of time this feature can save you!

    Read the article

  • Big Data: Size isn’t everything

    - by Simon Elliston Ball
    Big Data has a big problem; it’s the word “Big”. These days, a quick Google search will uncover terabytes of negative opinion about the futility of relying on huge volumes of data to produce magical, meaningful insight. There are also many clichéd but correct assertions about the difficulties of correlation versus causation, in massive data sets. In reading some of these pieces, I begin to understand how climatologists must feel when people complain ironically about “global warming” during snowfall. Big Data has a name problem. There is a lot more to it than size. Shape, Speed, and…err…Veracity are also key elements (now I understand why Gartner and the gang went with V’s instead of S’s). The need to handle data of different shapes (Variety) is not new. Data developers have always had to mold strange-shaped data into our reporting systems, integrating with semi-structured sources, and even straying into full-text searching. However, what we lacked was an easy way to add semi-structured and unstructured data to our arsenal. New “Big Data” tools such as MongoDB, and other NoSQL (Not Only SQL) databases, or a graph database like Neo4J, fill this gap. Still, to many, they simply introduce noise to the clean signal that is their sensibly normalized data structures. What about speed (Velocity)? It’s not just high frequency trading that generates data faster than a single system can handle. Many other applications need to make trade-offs that traditional databases won’t, in order to cope with high data insert speeds, or to extract quickly the required information from data streams. Unfortunately, many people equate Big Data with the Hadoop platform, whose batch driven queries and job processing queues have little to do with “velocity”. StreamInsight, Esper and Tibco BusinessEvents are examples of Big Data tools designed to handle high-velocity data streams. Again, the name doesn’t do the discipline of Big Data any favors. Ultimately, though, does analyzing fast moving data produce insights as useful as the ones we get through a more considered approach, enabled by traditional BI? Finally, we have Veracity and Value. In many ways, these additions to the classic Volume, Velocity and Variety trio acknowledge the criticism that without high-quality data and genuinely valuable outputs then data, big or otherwise, is worthless. As a discipline, Big Data has recognized this, and data quality and cleaning tools are starting to appear to support it. Rather than simply decrying the irrelevance of Volume, we need as a profession to focus how to improve Veracity and Value. Perhaps we should just declare the ‘Big’ silent, embrace these new data tools and help develop better practices for their use, just as we did the good old RDBMS? What does Big Data mean to you? Which V gives your business the most pain, or the most value? Do you see these new tools as a useful addition to the BI toolbox, or are they just enabling a dangerous trend to find ghosts in the noise?

    Read the article

  • Fast Data: Go Big. Go Fast.

    - by Dain C. Hansen
    Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 For those of you who may have missed it, today’s second full day of Oracle OpenWorld 2012 started with a rumpus. Joe Tucci, from EMC outlined the human face of big data with real examples of how big data is transforming our world. And no not the usual tried-and-true weblog examples, but real stories about taxi cab drivers in Singapore using big data to better optimize their routes as well as folks just trying to get a better hair cut. Next we heard from Thomas Kurian who talked at length about the important platform characteristics of Oracle’s Cloud and more specifically Oracle’s expanded Cloud Services portfolio. Especially interesting to our integration customers are the messaging support for Oracle’s Cloud applications. What this means is that now Oracle’s Cloud applications have a lightweight integration fabric that on-premise applications can communicate to it via REST-APIs using Oracle SOA Suite. It’s an important element to our strategy at Oracle that supports this idea that whether your requirements are for private or public, Oracle has a solution in the Cloud for all of your applications and we give you more deployment choice than any vendor. If this wasn’t enough to get the juices flowing, later that morning we heard from Hasan Rizvi who outlined in his Fusion Middleware session the four most important enterprise imperatives: Social, Mobile, Cloud, and a brand new one: Fast Data. Today, Rizvi made an important step in the definition of this term to explain that he believes it’s a convergence of four essential technology elements: Event Processing for event filtering, business rules – with Oracle Event Processing Data Transformation and Loading - with Oracle Data Integrator Real-time replication and integration – with Oracle GoldenGate Analytics and data discovery – with Oracle Business Intelligence Each of these four elements can be considered (and architect-ed) together on a single integrated platform that can help customers integrate any type of data (structured, semi-structured) leveraging new styles of big data technologies (MapReduce, HDFS, Hive, NoSQL) to process more volume and variety of data at a faster velocity with greater results.  Fast data processing (and especially real-time) has always been our credo at Oracle with each one of these products in Fusion Middleware. For example, Oracle GoldenGate continues to be made even faster with the recent 11g R2 Release of Oracle GoldenGate which gives us some even greater optimization to Oracle Database with Integrated Capture, as well as some new heterogeneity capabilities. With Oracle Data Integrator with Big Data Connectors, we’re seeing much improved performance by running MapReduce transformations natively on Hadoop systems. And with Oracle Event Processing we’re seeing some remarkable performance with customers like NTT Docomo. Check out their upcoming session at Oracle OpenWorld on Wednesday to hear more how this customer is using Event processing and Big Data together. If you missed any of these sessions and keynotes, not to worry. There's on-demand versions available on the Oracle OpenWorld website. You can also checkout our upcoming webcast where we will outline some of these new breakthroughs in Data Integration technologies for Big Data, Cloud, and Real-time in more details. /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Calibri","sans-serif"; mso-bidi-font-family:"Times New Roman";}

    Read the article

  • Windows Azure Recipe: Mobile Computing

    - by Clint Edmonson
    A while back, mashups were all the rage. The idea was to compose solutions that provided aggregation and integration across applications and services to make information more available, useful, and personal. Mashups ushered in the era of Web 2.0 in all it’s socially connected goodness. They taught us that to be successful, we needed to add web service APIs to our web applications. Web and client based mashups met with great success and have evolved even further with the introduction of the internet connected smartphone. Nothing is more available, useful, or personal than our smartphones. The current generation of cloud connected mobile computing mashups allow our mobilized workforces to receive, process, and react to information from disparate sources faster than ever before. Drivers Integration Reach Time to market Solution Here’s a sketch of a prototypical mobile computing solution using Windows Azure: Ingredients Web Role – with the phone running a dedicated client application, the web role is responsible for serving up backend web services that implement the solution’s core connected functionality. Database – used to store core operational and workflow data for the solution’s web services. Access Control – this service is used to authenticate and manage users identity, roles, and groups, possibly in conjunction with 3rd identity providers such as Windows LiveID, Google, Yahoo!, and Facebook. Worker Role – this role is used to handle the orchestration of long-running, complex, asynchronous operations. While much of the integration and interaction with other services can be handled directly by the mobile client application, it’s possible that the backend may need to integrate with 3rd party services as well. Offloading this work to a worker role better distributes computing resources and keeps the web roles focused on direct client interaction. Queues – these provide reliable, persistent messaging between applications and processes. They are an absolute necessity once asynchronous processing is involved. Queues facilitate the flow of distributed events and allow a solution to send push notifications back to mobile devices at appropriate times. Training & Resources These links point to online Windows Azure training labs and resources where you can learn more about the individual ingredients described above. (Note: The entire Windows Azure Training Kit can also be downloaded for offline use.) Windows Azure (16 labs) Windows Azure is an internet-scale cloud computing and services platform hosted in Microsoft data centers, which provides an operating system and a set of developer services which can be used individually or together. It gives developers the choice to build web applications; applications running on connected devices, PCs, or servers; or hybrid solutions offering the best of both worlds. New or enhanced applications can be built using existing skills with the Visual Studio development environment and the .NET Framework. With its standards-based and interoperable approach, the services platform supports multiple internet protocols, including HTTP, REST, SOAP, and plain XML SQL Azure (7 labs) Microsoft SQL Azure delivers on the Microsoft Data Platform vision of extending the SQL Server capabilities to the cloud as web-based services, enabling you to store structured, semi-structured, and unstructured data. Windows Azure Services (9 labs) As applications collaborate across organizational boundaries, ensuring secure transactions across disparate security domains is crucial but difficult to implement. Windows Azure Services provides hosted authentication and access control using powerful, secure, standards-based infrastructure. Windows Azure Toolkit for Windows Phone The Windows Azure Toolkit for Windows Phone is designed to make it easier for you to build mobile applications that leverage cloud services running in Windows Azure. The toolkit includes Visual Studio project templates for Windows Phone and Windows Azure, class libraries optimized for use on the phone, sample applications, and documentation Windows Azure Toolkit for iOS The Windows Azure Toolkit for iOS is a toolkit for developers to make it easy to access Windows Azure storage services from native iOS applications. The toolkit can be used for both iPhone and iPad applications, developed using Objective-C and XCode. Windows Azure Toolkit for Android The Windows Azure Toolkit for Android is a toolkit for developers to make it easy to work with Windows Azure from native Android applications. The toolkit can be used for native Android applications developed using Eclipse and the Android SDK. See my Windows Azure Resource Guide for more guidance on how to get started, including links web portals, training kits, samples, and blogs related to Windows Azure.

    Read the article

  • Master Data Management for Location Data - Oracle Site Hub

    - by david.butler(at)oracle.com
    Most MDM discussions cover key domains such as customer, supplier, product, service, and reference data. It is usually understood that these domains have complex structures and hundreds if not thousands of attributes that need governing. Location, on the other hand, strikes most people as address data. How hard can that be? But for many industries, locations are complex, and site information is critical to efficient operations and relevant analytics. Retail stores and malls, bank branches, construction sites come to mind. But one of the best industries for illustrating the power of a site mastering application is Oil & Gas.   Oracle's Master Data Management solution for location data is the Oracle Site Hub. It is a location mastering solution that enables organizations to centralize site and location specific information from heterogeneous systems, creating a single view of site information that can be leveraged across all functional departments and analytical systems.   Let's take a look at the location entities the Oracle Site Hub can manage for the Oil & Gas industry: organizations, property, land, buildings, roads, oilfield, service center, inventory site, real estate, facilities, refineries, storage tanks, vendor locations, businesses, assets; project site, area, well, basin, pipelines, critical infrastructure, offshore platform, compressor station, gas station, etc. Any site can be classified into multiple hierarchies, like organizational hierarchy, operational hierarchy, geographic hierarchy, divisional hierarchies and so on. Any site can also be associated to multiple clusters, i.e. collections of sites, and these can be used as a foundation for driving reporting, analysis, organize daily work, etc. Hierarchies can also be used to model entities which are structured or non-structured collections of nodes, like for example routes, pipelines and more. The User Defined Attribute Framework provides the needed infrastructure to add single row attributes groups like well base attributes (well IDs, well type, well structure and key characterizing measures, and more) and well geometry, and multi row attribute groups like well applications, permits, production data, activities, operations, logs, treatments, tests, drills, treatments, and KPIs. Site Hub can also model areas, lands, fields, basins, pools, platforms, eco-zones, and stratigraphic layers as specific sites, tracking their base attributes, aliases, descriptions, subcomponents and more. Midstream entities (pipelines, logistic sites, pump stations) and downstream entities (cylinders, tanks, inventories, meters, partner's sites, routes, facilities, gas stations, and competitor sites) can also be easily modeled, together with their specific attributes and relationships. Site Hub can store any type of unstructured data associated to a site. This could be stored directly or on an external content management solution, like Oracle Universal Content Management. Considering a well, for example, Site Hub can store any relevant associated multimedia file such as: CAD drawings of the well profile, structure and/or parts, engineering documents, contracts, applications, permits, logs, pictures, photos, videos and more. For any site entity, Site Hub can associate all the related assets and equipments at the site, as well as all relationships between sites, between a site and multiple parties, and between a site and any purchasable or sellable item, over time. Items can be equipment, instruments, facilities, services, products, production entities, production facilities (pipelines, batteries, compressor stations, gas plants, meters, separators, etc.), support facilities (rigs, roads, transmission or radio towers, airstrips, etc.), supplier products and services, catalogs, and more. Items can just be associated to sites using standard Site Hub features, or they can be fully mastered by implementing Oracle Product Hub. Site locations (addresses or geographical coordinates) are also managed with out-of-the-box address geo-coding capabilities coupled with Google Maps integration to deliver powerful mapping capabilities and spatial data analysis. Locations can be shared between different sites. Centered on the site location, any site can also have associated areas. Site Hub can master any site location specific information, like for example cadastral, ownership, jurisdictional, geological, seismic and more, and any site-centric area specific information, like for example economical, political, risk, weather, logistic, traffic information and more. Now if anyone ever asks you why locations need MDM, think about how all these Oil & Gas entities and attributes would translate into your business locations. To learn more about Oracle's full MDM solution for the digital oil field, here is a link to Roberto Negro's outstanding whitepaper: Oracle Site Master Data Management for mastering wells and other PPDM entities in a digital oilfield context  

    Read the article

  • Windows Azure Recipe: Consumer Portal

    - by Clint Edmonson
    Nearly every company on the internet has a web presence. Many are merely using theirs for informational purposes. More sophisticated portals allow customers to register their contact information and provide some level of interaction or customer support. But as our understanding of how consumers use the web increases, the more progressive companies are taking advantage of social web and rich media delivery to connect at a deeper level with the consumers of their goods and services. Drivers Cost reduction Scalability Global distribution Time to market Solution Here’s a sketch of how a Windows Azure Consumer Portal might be built out: Ingredients Web Role – this will host the core of the solution. Each web role is a virtual machine hosting an application written in ASP.NET (or optionally php, or node.js). The number of web roles can be scaled up or down as needed to handle peak and non-peak traffic loads. Database – every modern web application needs to store data. SQL Azure databases look and act exactly like their on-premise siblings but are fault tolerant and have data redundancy built in. Access Control (optional) – if identity needs to be tracked within the solution, the access control service combined with the Windows Identity Foundation framework provides out-of-the-box support for several social media platforms including Windows LiveID, Google, Yahoo!, Facebook. It also has a provider model to allow integration with other platforms as well. Caching (optional) – for sites with high traffic with lots of read-only data and lists, the distributed in-memory caching service can be used to cache and serve up static data at higher scale and speed than direct database requests. It can also be used to manage user session state. Blob Storage (optional) – for sites that serve up unstructured data such as documents, video, audio, device drivers, and more. The data is highly available and stored redundantly across data centers. Each entry in blob storage is provided with it’s own unique URL for direct access by the browser. Content Delivery Network (CDN) (optional) – for sites that service users around the globe, the CDN is an extension to blob storage that, when enabled, will automatically cache frequently accessed blobs and static site content at edge data centers around the world. The data can be delivered statically or streamed in the case of rich media content. Training Labs These links point to online Windows Azure training labs where you can learn more about the individual ingredients described above. (Note: The entire Windows Azure Training Kit can also be downloaded for offline use.) Windows Azure (16 labs) Windows Azure is an internet-scale cloud computing and services platform hosted in Microsoft data centers, which provides an operating system and a set of developer services which can be used individually or together. It gives developers the choice to build web applications; applications running on connected devices, PCs, or servers; or hybrid solutions offering the best of both worlds. New or enhanced applications can be built using existing skills with the Visual Studio development environment and the .NET Framework. With its standards-based and interoperable approach, the services platform supports multiple internet protocols, including HTTP, REST, SOAP, and plain XML SQL Azure (7 labs) Microsoft SQL Azure delivers on the Microsoft Data Platform vision of extending the SQL Server capabilities to the cloud as web-based services, enabling you to store structured, semi-structured, and unstructured data. Windows Azure Services (9 labs) As applications collaborate across organizational boundaries, ensuring secure transactions across disparate security domains is crucial but difficult to implement. Windows Azure Services provides hosted authentication and access control using powerful, secure, standards-based infrastructure. See my Windows Azure Resource Guide for more guidance on how to get started, including links web portals, training kits, samples, and blogs related to Windows Azure.

    Read the article

  • LUKOIL Overseas Holding Optimizes Oil Field Development Projects with Integrated Project Management

    - by Melissa Centurio Lopes
    Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} LUKOIL Overseas Group is a growing oil and gas company that is an integral part of the vertically integrated oil company OAO LUKOIL. It is engaged in the exploration, acquisition, integration, and efficient development of oil and gas fields outside the Russian Federation to promote transforming LUKOIL into a transnational energy company. In 2010, the company signed a 20-year development project for the giant, West Qurna 2 oil field in Iraq. Executing 10,000 to 15,000 project activities simultaneously on 14 major construction and drilling projects in Iraq for the West Qurna-2 project meant the company needed a clear picture, in real time, of dependencies between its capital construction, geologic exploration and sinking projects—required for its building infrastructure oil field development projects in Iraq. LUKOIL Overseas Holding deployed Oracle’s Primavera P6 Enterprise Project Portfolio Management to generate structured project management information and optimize planning, monitoring, and analysis of all engineering and commercial activities—such as tenders, and bulk procurement of materials and equipment—related to oil field development projects. A word from LUKOIL Overseas Holding Ltd. “Previously, we created project schedules on desktop computers and uploaded them to the project server to be merged into one big file for each project participant to access. This was not scalable, as we’ve grown and now run up to 15,000 activities in numerous projects and subprojects at any time. With Oracle’s Primavera P6 Enterprise Project Portfolio Management, we can now work concurrently on projects with many team members, enjoy absolute security, and issue new baselines for all projects and project participants once a week, with ease.” – Sergey Kotov, Head of IT and the Communication Office, LUKOIL Mid-East Ltd. Oracle Primavera Solutions: · Facilitated managing dependencies between projects by enabling the general scheduler to reschedule all projects and subprojects once a week, realigning 10,000 to 15,000 project activities that the company runs at any time · Replaced Microsoft Project and a paper-based system with a complete solution that provides structured project data · Enhanced data security by establishing project management security policies that enable only authorized project members to edit their project tasks, while enabling each project participant to view all project data that are relevant to that individual’s task · Enabled the company to monitor project progress in comparison to the projected plan, based on physical project assets to determine if each project is on track to conclude within its time and budget limitations To view the full list of solutions view here. “Oracle Gold Partner Parma Telecom was key to our successful Primavera deployment, implementing the software’s basic functionalities, such as project content, timeframes management, and cost management, in addition to performing its integration with our enterprise resource planning system and intranet portal within ten months and in accordance with budgets,” said Rafik Baynazarov, head of the master planning and control office, LUKOIL Mid-East Ltd. “ To read the full version of the customer success story, please view here.

    Read the article

  • Test Drive for Partners on Oracle Endeca Information Discovery

    - by Mike.Hallett(at)Oracle-BI&EPM
    Normal 0 false false false EN-GB X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin-top:0cm; mso-para-margin-right:0cm; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0cm; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi; mso-fareast-language:EN-US;} Specifically for Oracle Partners, this half-day hands-on workshop allows you to experience Information Discovery from Oracle in order to: Understand Information Discovery and how it compliments classic BI solutions Use Search and Guided Navigation to see how structured and unstructured information can be rapidly brought together to unlock hidden value Explore all of your data in any format and from any source including social media, market surveys and reports Lay the foundation for helping business users who need fast answers to new questions Experience the amazing performance of Endeca on Oracle's in memory Exalytics machine Agenda After an introduction to Oracle Endeca Information Discovery, follow a self-paced, supervised, hands-on tutorial where you will see how easy it is to: Use Guided Navigation and Search to explore structured and unstructured data Rapidly integrate new and changing data sources such as Social Media Build new Discovery user interfaces Rapidly respond to changing business needs and data environments And ask questions of Oracle's Business Analytics experts throughout When 14th March 2013, Registration 9:00 a.m. - finish by 1:00 p.m.      Normal 0 false false false EN-GB X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin-top:0cm; mso-para-margin-right:0cm; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0cm; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi; mso-fareast-language:EN-US;} Register Now What: Oracle Endeca Information Discovery Test Drive Where: Oracle City Office, 1 South Place, London, EC2M 2RB

    Read the article

  • Create a unique ID by fuzzy matching of names (via agrep using R)

    - by tbrambor
    Using R, I am trying match on people's names in a dataset structured by year and city. Due to some spelling mistakes, exact matching is not possible, so I am trying to use agrep() to fuzzy match names. A sample chunk of the dataset is structured as follows: df <- data.frame(matrix( c("1200013","1200013","1200013","1200013","1200013","1200013","1200013","1200013", "1996","1996","1996","1996","2000","2000","2004","2004","AGUSTINHO FORTUNATO FILHO","ANTONIO PEREIRA NETO","FERNANDO JOSE DA COSTA","PAULO CEZAR FERREIRA DE ARAUJO","PAULO CESAR FERREIRA DE ARAUJO","SEBASTIAO BOCALOM RODRIGUES","JOAO DE ALMEIDA","PAULO CESAR FERREIRA DE ARAUJO"), ncol=3,dimnames=list(seq(1:8),c("citycode","year","candidate")) )) The neat version: citycode year candidate 1 1200013 1996 AGUSTINHO FORTUNATO FILHO 2 1200013 1996 ANTONIO PEREIRA NETO 3 1200013 1996 FERNANDO JOSE DA COSTA 4 1200013 1996 PAULO CEZAR FERREIRA DE ARAUJO 5 1200013 2000 PAULO CESAR FERREIRA DE ARAUJO 6 1200013 2000 SEBASTIAO BOCALOM RODRIGUES 7 1200013 2004 JOAO DE ALMEIDA 8 1200013 2004 PAULO CESAR FERREIRA DE ARAUJO I'd like to check in each city separately, whether there are candidates appearing in several years. E.g. in the example, PAULO CEZAR FERREIRA DE ARAUJO PAULO CESAR FERREIRA DE ARAUJO appears twice (with a spelling mistake). Each candidate across the entire data set should be assigned a unique numeric candidate ID. The dataset is fairly large (5500 cities, approx. 100K entries) so a somewhat efficient coding would be helpful. Any suggestions as to how to implement this?

    Read the article

< Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13  | Next Page >