Search Results

Search found 65999 results on 2640 pages for 'large data volumes'.

Page 24/2640 | < Previous Page | 20 21 22 23 24 25 26 27 28 29 30 31  | Next Page >

  • Text mining on large database (data mining)

    - by yox
    Hello, I have a large database of resumes (CV), and a certain table skills grouping all users skills. inside that table there's a field skill_text that describes the skill in full text. I'm looking for an algorithm/software/method to extract significant terms/phrases from that table in order to build a new table with standarized skills.. Here are some examples skills extracted from the DB : Sectoral and competitive analysis Business Development (incl. in international settings) Specific structure and road design software - Microstation, Macao, AutoCAD (basic knowledge) Creative work (Photoshop, In-Design, Illustrator) checking and reporting back on campaign progress organising and attending events and exhibitions Development : Aptana Studio, PHP, HTML, CSS, JavaScript, SQL, AJAX Discipline: One to one marketing, E-marketing (SEO & SEA, display, emailing, affiliate program) Mix marketing, Viral Marketing, Social network marketing. The output shoud be something like : Sectoral and competitive analysis Business Development Specific structure and road design software - Macao AutoCAD Photoshop In-Design Illustrator organising events Development Aptana Studio PHP HTML CSS JavaScript SQL AJAX Mix marketing Viral Marketing Social network marketing emailing SEO One to one marketing As you see only skills remains no other representation text. I know this is possible using text mining technics but how to do it ? the database is realy large.. it's a good thing because we can calculate text frequency and decide if it's a real skill or just meaningless text... The big problem is .. how to determin that "blablabla" is a skill ? thanks

    Read the article

  • Making Spring Data JPA work with DataNucleus (GAE) (Spring Boot)

    - by xybrek
    There are several hints that Spring Data works with Google App Engine like: http://tommysiu.blogspot.com/2014/01/spring-data-on-gae-part-1.html http://blog.eisele.net/2009/07/spring-300m3-on-google-appengine-with.html Much of the examples are not "Spring Boot" so I've been trying to retrofit things with it. However, I've been stuck with this error for days and days: [INFO] Caused by: java.lang.NullPointerException [INFO] at org.datanucleus.api.jpa.metamodel.SingularAttributeImpl.isVersion(SingularAttributeImpl.java:79) [INFO] at org.springframework.data.jpa.repository.support.JpaMetamodelEntityInformation.findVersionAttribute(JpaMetamodelEntityInformation.java:102) [INFO] at org.springframework.data.jpa.repository.support.JpaMetamodelEntityInformation.<init>(JpaMetamodelEntityInformation.java:79) [INFO] at org.springframework.data.jpa.repository.support.JpaEntityInformationSupport.getMetadata(JpaEntityInformationSupport.java:65) [INFO] at org.springframework.data.jpa.repository.support.JpaRepositoryFactory.getEntityInformation(JpaRepositoryFactory.java:149) [INFO] at org.springframework.data.jpa.repository.support.JpaRepositoryFactory.getTargetRepository(JpaRepositoryFactory.java:88) [INFO] at org.springframework.data.jpa.repository.support.JpaRepositoryFactory.getTargetRepository(JpaRepositoryFactory.java:68) [INFO] at org.springframework.data.repository.core.support.RepositoryFactorySupport.getRepository(RepositoryFactorySupport.java:158) [INFO] at org.springframework.data.repository.core.support.RepositoryFactoryBeanSupport.initAndReturn(RepositoryFactoryBeanSupport.java:224) [INFO] at org.springframework.data.repository.core.support.RepositoryFactoryBeanSupport.afterPropertiesSet(RepositoryFactoryBeanSupport.java:210) [INFO] at org.springframework.data.jpa.repository.support.JpaRepositoryFactoryBean.afterPropertiesSet(JpaRepositoryFactoryBean.java:92) [INFO] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory$6.run(AbstractAutowireCapableBeanFactory.java:1602) [INFO] at java.security.AccessController.doPrivileged(Native Method) [INFO] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1599) [INFO] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1549) [INFO] ... 40 more Where, I'm trying to use Spring Data JPA with DataNucleus/AppEngine: @Configuration @ComponentScan @EnableJpaRepositories @EnableTransactionManagement class JpaApplicationConfig { private static final Logger logger = Logger .getLogger(JpaApplicationConfig.class.getName()); @Bean public EntityManagerFactory entityManagerFactory() { logger.info("Loading Entity Manager..."); return Persistence .createEntityManagerFactory("transactions-optional"); } @Bean public PlatformTransactionManager transactionManager() { logger.info("Loading Transaction Manager..."); final JpaTransactionManager txManager = new JpaTransactionManager(); txManager.setEntityManagerFactory(entityManagerFactory()); return txManager; } } I've tested Persistence.createEntityManagerFactory("transactions-optional"); to see if the app can persist using this EMF, well, it does, so I am sure that this EMF works fine. The problem is the "wiring" up with the Spring Data JPA, can anybody help?

    Read the article

  • Parse and read data frame in C?

    - by user253656
    I am writing a program that reads the data from the serial port on Linux. The data are sent by another device with the following frame format: |start | Command | Data | CRC | End | |0x02 | 0x41 | (0-127 octets) | | 0x03| ---------------------------------------------------- The Data field contains 127 octets as shown and octet 1,2 contains one type of data; octet 3,4 contains another data. I need to get these data I know how to write and read data to and from a serial port in Linux, but it is just to write and read a simple string (like "ABD") My issue is that I do not know how to parse the data frame formatted as above so that I can: get the data in octet 1,2 in the Data field get the data in octet 3,4 in the Data field get the value in CRC field to check the consistency of the data Here the sample snip code that read and write a simple string from and to a serial port in Linux: int writeport(int fd, char *chars) { int len = strlen(chars); chars[len] = 0x0d; // stick a <CR> after the command chars[len+1] = 0x00; // terminate the string properly int n = write(fd, chars, strlen(chars)); if (n < 0) { fputs("write failed!\n", stderr); return 0; } return 1; } int readport(int fd, char *result) { int iIn = read(fd, result, 254); result[iIn-1] = 0x00; if (iIn < 0) { if (errno == EAGAIN) { printf("SERIAL EAGAIN ERROR\n"); return 0; } else { printf("SERIAL read error %d %s\n", errno, strerror(errno)); return 0; } } return 1; } Does anyone please have some ideas? Thanks all.

    Read the article

  • Storing large numbers of varying size objects on disk

    - by Foredecker
    I need to develop a system for storing large numbers (10's to 100's of thousands) of objects. Each object is email-like - there is a main text body, and several ancillary text fields of limited size. A body will be from a few bytes, to several KB in size. Each item will have a single unique ID (probably a GUID) that identifies it. The store will only be written to when an object is added to it. It will be read often. Deletions will be rare. The data is almost all human readable text so it will be readily compressible. A system that lets me issue the I/Os and mange the memory and caching would be ideal. I'm going to keep the indexes in memory, using it to map indexes to the single (and primary) key for the objects. Once I have the key, then I'll load it from disk, or the cache. The data management system needs to be part of my application - I do not want to depend on OS services. Or separately installed packages. Native (C++) would be best, but a manged (C#) thing would be ok. I believe that a database is an obvious choice, but this needs to be super-fast for look up and loading into memory of an object. I am not experienced with data base tech and I'm concerned that general relational systems will not handle all this variable sized data efficiently. (Note, this has nothing to do with my job - its a personal project.) In your experience, what are the viable alternatives to a traditional relational DB? Or would a DB work well for this?

    Read the article

  • Pre-filtering and shaping OData feeds using WCF Data Services and the Entity Framework - Part 1

    - by rajbk
    The Open Data Protocol, referred to as OData, is a new data-sharing standard that breaks down silos and fosters an interoperative ecosystem for data consumers (clients) and producers (services) that is far more powerful than currently possible. It enables more applications to make sense of a broader set of data, and helps every data service and client add value to the whole ecosystem. WCF Data Services (previously known as ADO.NET Data Services), then, was the first Microsoft technology to support the Open Data Protocol in Visual Studio 2008 SP1. It provides developers with client libraries for .NET, Silverlight, AJAX, PHP and Java. Microsoft now also supports OData in SQL Server 2008 R2, Windows Azure Storage, Excel 2010 (through PowerPivot), and SharePoint 2010. Many other other applications in the works. * This post walks you through how to create an OData feed, define a shape for the data and pre-filter the data using Visual Studio 2010, WCF Data Services and the Entity Framework. A sample project is attached at the bottom of Part 2 of this post. Pre-filtering and shaping OData feeds using WCF Data Services and the Entity Framework - Part 2 Create the Web Application File –› New –› Project, Select “ASP.NET Empty Web Application” Add the Entity Data Model Right click on the Web Application in the Solution Explorer and select “Add New Item..” Select “ADO.NET Entity Data Model” under "Data”. Name the Model “Northwind” and click “Add”.   In the “Choose Model Contents”, select “Generate Model From Database” and click “Next”   Define a connection to your database containing the Northwind database in the next screen. We are going to expose the Products table through our OData feed. Select “Products” in the “Choose your Database Object” screen.   Click “Finish”. We are done creating our Entity Data Model. Save the Northwind.edmx file created. Add the WCF Data Service Right click on the Web Application in the Solution Explorer and select “Add New Item..” Select “WCF Data Service” from the list and call the service “DataService” (creative, huh?). Click “Add”.   Enable Access to the Data Service Open the DataService.svc.cs class. The class is well commented and instructs us on the next steps. public class DataService : DataService< /* TODO: put your data source class name here */ > { // This method is called only once to initialize service-wide policies. public static void InitializeService(DataServiceConfiguration config) { // TODO: set rules to indicate which entity sets and service operations are visible, updatable, etc. // Examples: // config.SetEntitySetAccessRule("MyEntityset", EntitySetRights.AllRead); // config.SetServiceOperationAccessRule("MyServiceOperation", ServiceOperationRights.All); config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2; } } Replace the comment that starts with “/* TODO:” with “NorthwindEntities” (the entity container name of the Model we created earlier).  WCF Data Services is initially locked down by default, FTW! No data is exposed without you explicitly setting it. You have explicitly specify which Entity sets you wish to expose and what rights are allowed by using the SetEntitySetAccessRule. The SetServiceOperationAccessRule on the other hand sets rules for a specified operation. Let us define an access rule to expose the Products Entity we created earlier. We use the EnititySetRights.AllRead since we want to give read only access. Our modified code is shown below. public class DataService : DataService<NorthwindEntities> { public static void InitializeService(DataServiceConfiguration config) { config.SetEntitySetAccessRule("Products", EntitySetRights.AllRead); config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2; } } We are done setting up our ODataFeed! Compile your project. Right click on DataService.svc and select “View in Browser” to see the OData feed. To view the feed in IE, you must make sure that "Feed Reading View" is turned off. You set this under Tools -› Internet Options -› Content tab.   If you navigate to “Products”, you should see the Products feed. Note also that URIs are case sensitive. ie. Products work but products doesn’t.   Filtering our data OData has a set of system query operations you can use to perform common operations against data exposed by the model. For example, to see only Products in CategoryID 2, we can use the following request: /DataService.svc/Products?$filter=CategoryID eq 2 At the time of this writing, supported operations are $orderby, $top, $skip, $filter, $expand, $format†, $select, $inlinecount. Pre-filtering our data using Query Interceptors The Product feed currently returns all Products. We want to change that so that it contains only Products that have not been discontinued. WCF introduces the concept of interceptors which allows us to inject custom validation/policy logic into the request/response pipeline of a WCF data service. We will use a QueryInterceptor to pre-filter the data so that it returns only Products that are not discontinued. To create a QueryInterceptor, write a method that returns an Expression<Func<T, bool>> and mark it with the QueryInterceptor attribute as shown below. [QueryInterceptor("Products")] public Expression<Func<Product, bool>> OnReadProducts() { return o => o.Discontinued == false; } Viewing the feed after compilation will only show products that have not been discontinued. We also confirm this by looking at the WHERE clause in the SQL generated by the entity framework. SELECT [Extent1].[ProductID] AS [ProductID], ... ... [Extent1].[Discontinued] AS [Discontinued] FROM [dbo].[Products] AS [Extent1] WHERE 0 = [Extent1].[Discontinued] Other examples of Query/Change interceptors can be seen here including an example to filter data based on the identity of the authenticated user. We are done pre-filtering our data. In the next part of this post, we will see how to shape our data. Pre-filtering and shaping OData feeds using WCF Data Services and the Entity Framework - Part 2 Foot Notes * http://msdn.microsoft.com/en-us/data/aa937697.aspx † $format did not work for me. The way to get a Json response is to include the following in the  request header “Accept: application/json, text/javascript, */*” when making the request. This is easily done with most JavaScript libraries.

    Read the article

  • Big Data Matters with ODI12c

    - by Madhu Nair
    contributed by Mike Eisterer On October 17th, 2013, Oracle announced the release of Oracle Data Integrator 12c (ODI12c).  This release signifies improvements to Oracle’s Data Integration portfolio of solutions, particularly Big Data integration. Why Big Data = Big Business Organizations are gaining greater insights and actionability through increased storage, processing and analytical benefits offered by Big Data solutions.  New technologies and frameworks like HDFS, NoSQL, Hive and MapReduce support these benefits now. As further data is collected, analytical requirements increase and the complexity of managing transformations and aggregations of data compounds and organizations are in need for scalable Data Integration solutions. ODI12c provides enterprise solutions for the movement, translation and transformation of information and data heterogeneously and in Big Data Environments through: The ability for existing ODI and SQL developers to leverage new Big Data technologies. A metadata focused approach for cataloging, defining and reusing Big Data technologies, mappings and process executions. Integration between many heterogeneous environments and technologies such as HDFS and Hive. Generation of Hive Query Language. Working with Big Data using Knowledge Modules  ODI12c provides developers with the ability to define sources and targets and visually develop mappings to effect the movement and transformation of data.  As the mappings are created, ODI12c leverages a rich library of prebuilt integrations, known as Knowledge Modules (KMs).  These KMs are contextual to the technologies and platforms to be integrated.  Steps and actions needed to manage the data integration are pre-built and configured within the KMs.  The Oracle Data Integrator Application Adapter for Hadoop provides a series of KMs, specifically designed to integrate with Big Data Technologies.  The Big Data KMs include: Check Knowledge Module Reverse Engineer Knowledge Module Hive Transform Knowledge Module Hive Control Append Knowledge Module File to Hive (LOAD DATA) Knowledge Module File-Hive to Oracle (OLH-OSCH) Knowledge Module  Nothing to beat an Example: To demonstrate the use of the KMs which are part of the ODI Application Adapter for Hadoop, a mapping may be defined to move data between files and Hive targets.  The mapping is defined by dragging the source and target into the mapping, performing the attribute (column) mapping (see Figure 1) and then selecting the KM which will govern the process.  In this mapping example, movie data is being moved from an HDFS source into a Hive table.  Some of the attributes, such as “CUSTID to custid”, have been mapped over. Figure 1  Defining the Mapping Before the proper KM can be assigned to define the technology for the mapping, it needs to be added to the ODI project.  The Big Data KMs have been made available to the project through the KM import process.   Generally, this is done prior to defining the mapping. Figure 2  Importing the Big Data Knowledge Modules Following the import, the KMs are available in the Designer Navigator. v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} w\:* {behavior:url(#default#VML);} .shape {behavior:url(#default#VML);} Normal 0 false false false EN-US ZH-TW X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Calibri","sans-serif"; mso-bidi-font-family:"Times New Roman";} Figure 3  The Project View in Designer, Showing Installed IKMs Once the KM is imported, it may be assigned to the mapping target.  This is done by selecting the Physical View of the mapping and examining the Properties of the Target.  In this case MOVIAPP_LOG_STAGE is the target of our mapping. Figure 4  Physical View of the Mapping and Assigning the Big Data Knowledge Module to the Target Alternative KMs may have been selected as well, providing flexibility and abstracting the logical mapping from the physical implementation.  Our mapping may be applied to other technologies as well. The mapping is now complete and is ready to run.  We will see more in a future blog about running a mapping to load Hive. To complete the quick ODI for Big Data Overview, let us take a closer look at what the IKM File to Hive is doing for us.  ODI provides differentiated capabilities by defining the process and steps which normally would have to be manually developed, tested and implemented into the KM.  As shown in figure 5, the KM is preparing the Hive session, managing the Hive tables, performing the initial load from HDFS and then performing the insert into Hive.  HDFS and Hive options are selected graphically, as shown in the properties in Figure 4. Figure 5  Process and Steps Managed by the KM What’s Next Big Data being the shape shifting business challenge it is is fast evolving into the deciding factor between market leaders and others. Now that an introduction to ODI and Big Data has been provided, look for additional blogs coming soon using the Knowledge Modules which make up the Oracle Data Integrator Application Adapter for Hadoop: Importing Big Data Metadata into ODI, Testing Data Stores and Loading Hive Targets Generating Transformations using Hive Query language Loading Oracle from Hadoop Sources For more information now, please visit the Oracle Data Integrator Application Adapter for Hadoop web site, http://www.oracle.com/us/products/middleware/data-integration/hadoop/overview/index.html Do not forget to tune in to the ODI12c Executive Launch webcast on the 12th to hear more about ODI12c and GG12c. Normal 0 false false false EN-US ZH-TW X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Calibri","sans-serif"; mso-bidi-font-family:"Times New Roman";}

    Read the article

  • Importing large datasets on iPhone using CoreData

    - by Matthes
    Hi there, I'm facing very annoying problem. My iPhone app is loading it's data from a network server. Data are sent as plist and when parsed, it neeeds to be stored to SQLite db using CoreData. Issue is that in some cases those datasets are too big (5000+ records) and import takes way too long. More on that, when iPhone tries to suspend the screen, Watchdog kills the app because it's still processing the import and does not respond up to 5 seconds, so import is never finished. I used all recommended techniques according to article "Efficiently Importing Data" http://developer.apple.com/mac/library/DOCUMENTATION/Cocoa/Conceptual/CoreData/Articles/cdImporting.html and other docs concerning this, but it's still awfully slow. Solution I'm looking for is to let app suspend, but let import run in behind (better one) or to prevent attempts to suspend the app at all. Or any better idea is welcomed too. Any tips on how to overcome these issues are highly appreciated! Thanks

    Read the article

  • Storage for large gridded datasets

    - by nullglob
    I am looking for a good storage format for large, gridded datasets. The application is meteorology, and we would prefer a format that is common within this field (to help exchange data with others). I don't need to deal with special data structures, and there should be a Fortran API. I am currently considering HDF5, GRIB2 and NetCDF4. How do these formats compare in terms of data compression? What are their main limitations? How steep is the learning curve? Are there any other storage formats worth investigating? I have not found a great deal of material outlining the differences and pros/cons of these formats (there is one relevant SO thread, and a presentation comparing GRIB and NetCDF).

    Read the article

  • JavaScript Data Binding Frameworks

    - by dwahlin
    Data binding is where it’s at now days when it comes to building client-centric Web applications. Developers experienced with desktop frameworks like WPF or web frameworks like ASP.NET, Silverlight, or others are used to being able to take model objects containing data and bind them to UI controls quickly and easily. When moving to client-side Web development the data binding story hasn’t been great since neither HTML nor JavaScript natively support data binding. This means that you have to write code to place data in a control and write code to extract it. Although it’s certainly feasible to do it from scratch (many of us have done it this way for years), it’s definitely tedious and not exactly the best solution when it comes to maintenance and re-use. Over the last few years several different script libraries have been released to simply the process of binding data to HTML controls. In fact, the subject of data binding is becoming so popular that it seems like a new script library is being released nearly every week. Many of the libraries provide MVC/MVVM pattern support in client-side JavaScript apps and some even integrate directly with server frameworks like Node.js. Here’s a quick list of a few of the available libraries that support data binding (if you like any others please add a comment and I’ll try to keep the list updated): AngularJS MVC framework for data binding (although closely follows the MVVM pattern). Backbone.js MVC framework with support for models, key/value binding, custom events, and more. Derby Provides a real-time environment that runs in the browser an in Node.js. The library supports data binding and templates. Ember Provides support for templates that automatically update as data changes. JsViews Data binding framework that provides “interactive data-driven views built on top of JsRender templates”. jQXB Expression Binder Lightweight jQuery plugin that supports bi-directional data binding support. KnockoutJS MVVM framework with robust support for data binding. For an excellent look at using KnockoutJS check out John Papa’s course on Pluralsight. Meteor End to end framework that uses Node.js on the server and provides support for data binding on  the client. Simpli5 JavaScript framework that provides support for two-way data binding. WinRT with HTML5/JavaScript If you’re building Windows 8 applications using HTML5 and JavaScript there’s built-in support for data binding in the WinJS library.   I won’t have time to write about each of these frameworks, but in the next post I’m going to talk about my (current) favorite when it comes to client-side JavaScript data binding libraries which is AngularJS. AngularJS provides an extremely clean way – in my opinion - to extend HTML syntax to support data binding while keeping model objects (the objects that hold the data) free from custom framework method calls or other weirdness. While I’m writing up the next post, feel free to visit the AngularJS developer guide if you’d like additional details about the API and want to get started using it.

    Read the article

  • Storing large data in HTTP Session (Java Application)

    - by Umesh Awasthi
    I am asking this question in continuation with http-session-or-database-approach. I am planning to follow this approach. When user add product to cart, create a Cart Model, add items to cart and save to DB. Convert Cart model to cart data and save it to HTTP session. Any update/ edit update underlying cart in DB and update data snap shot in Session. When user click on view cart page, just pick cart data from Session and display to customer. I have following queries regarding HTTP Session How good is it to store large data (Shopping Cart) in Session? How scalable this approach can be ? (With respect to Session) Won't my application going to eat and demand a lot of memory? Is my approach is fine or do i need to consider other points while designing this? Though, we can control what all cart data should be stored in the Session, but still we need to have certain information in cart data being stored in session?

    Read the article

  • Protect Data and Save Money? Learn How Best-in-Class Organizations do Both

    - by roxana.bradescu
    Databases contain nearly two-thirds of the sensitive information that must be protected as part of any organization's overall approach to security, risk management, and compliance. Solutions for protecting data housed in databases vary from encrypting data at the application level to defense-in-depth protection of the database itself. So is there a difference? Absolutely! According to new research from the Aberdeen Group, Best-in-Class organizations experience fewer data breaches and audit deficiencies - at lower cost -- by deploying database security solutions. And the results are dramatic: Aberdeen found that organizations encrypting data within their databases achieved 30% fewer data breaches and 15% greater audit efficiency with 34% less total cost when compared to organizations encrypting data within applications. Join us for a live webcast with Derek Brink, Vice President and Research Fellow at the Aberdeen Group, next week to learn how your organization can become Best-in-Class.

    Read the article

  • Bad Data is Really the Monster

    - by Dain C. Hansen
    Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Calibri","sans-serif"; mso-bidi-font-family:"Times New Roman";} Bad Data is really the monster – is an article written by Bikram Sinha who I borrowed the title and the inspiration for this blog. Sinha writes: “Bad or missing data makes application systems fail when they process order-level data. One of the key items in the supply-chain industry is the product (aka SKU). Therefore, it becomes the most important data element to tie up multiple merchandising processes including purchase order allocation, stock movement, shipping notifications, and inventory details… Bad data can cause huge operational failures and cost millions of dollars in terms of time, resources, and money to clean up and validate data across multiple participating systems. Yes bad data really is the monster, so what do we do about it? Close our eyes and hope it stays in the closet? We’ve tacked this problem for some years now at Oracle, and with our latest introduction of Oracle Enterprise Data Quality along with our integrated Oracle Master Data Management products provides a complete, best-in-class answer to the bad data monster. What’s unique about it? Oracle Enterprise Data Quality also combines powerful data profiling, cleansing, matching, and monitoring capabilities while offering unparalleled ease of use. What makes it unique is that it has dedicated capabilities to address the distinct challenges of both customer and product data quality – [different monsters have different needs of course!]. And the ability to profile data is just as important to identify and measure poor quality data and identify new rules and requirements. Included are semantic and pattern-based recognition to accurately parse and standardize data that is poorly structured. Finally all of the data quality components are integrated with Oracle Master Data Management, including Oracle Customer Hub and Oracle Product Hub, as well as Oracle Data Integrator Enterprise Edition and Oracle CRM. Want to learn more? On Tuesday Nov 15th, I invite you to listen to our webcast on Reduce ERP consolidation risks with Oracle Master Data Management I’ll be joined by our partner iGate Patni and be talking about one specific way to deal with the bad data monster specifically around ERP consolidation. Look forward to seeing you there!

    Read the article

  • Multiple volumetric lights

    - by notabene
    I recently read this GPU GEMS 3 article Volumetric Light Scattering as a Post-Process. I like the idea to add volumetric light property to realtime render i'm working on. Question is will it work for multiple lights? Our renderer uses one render pass per light and uses additive blending to sum incoming light. I'm mostly convinced that it have to work nice. Do you agree? Maybe there can be problem where light rays crosses each other.

    Read the article

  • CPU Usage in Very Large Coherence Clusters

    - by jpurdy
    When sizing Coherence installations, one of the complicating factors is that these installations (by their very nature) tend to be application-specific, with some being large, memory-intensive caches, with others acting as I/O-intensive transaction-processing platforms, and still others performing CPU-intensive calculations across the data grid. Regardless of the primary resource requirements, Coherence sizing calculations are inherently empirical, in that there are so many permutations that a simple spreadsheet approach to sizing is rarely optimal (though it can provide a good starting estimate). So we typically recommend measuring actual resource usage (primarily CPU cycles, network bandwidth and memory) at a given load, and then extrapolating from those measurements. Of course there may be multiple types of load, and these may have varying degrees of correlation -- for example, an increased request rate may drive up the number of objects "pinned" in memory at any point, but the increase may be less than linear if those objects are naturally shared by concurrent requests. But for most reasonably-designed applications, a linear resource model will be reasonably accurate for most levels of scale. However, at extreme scale, sizing becomes a bit more complicated as certain cluster management operations -- while very infrequent -- become increasingly critical. This is because certain operations do not naturally tend to scale out. In a small cluster, sizing is primarily driven by the request rate, required cache size, or other application-driven metrics. In larger clusters (e.g. those with hundreds of cluster members), certain infrastructure tasks become intensive, in particular those related to members joining and leaving the cluster, such as introducing new cluster members to the rest of the cluster, or publishing the location of partitions during rebalancing. These tasks have a strong tendency to require all updates to be routed via a single member for the sake of cluster stability and data integrity. Fortunately that member is dynamically assigned in Coherence, so it is not a single point of failure, but it may still become a single point of bottleneck (until the cluster finishes its reconfiguration, at which point this member will have a similar load to the rest of the members). The most common cause of scaling issues in large clusters is disabling multicast (by configuring well-known addresses, aka WKA). This obviously impacts network usage, but it also has a large impact on CPU usage, primarily since the senior member must directly communicate certain messages with every other cluster member, and this communication requires significant CPU time. In particular, the need to notify the rest of the cluster about membership changes and corresponding partition reassignments adds stress to the senior member. Given that portions of the network stack may tend to be single-threaded (both in Coherence and the underlying OS), this may be even more problematic on servers with poor single-threaded performance. As a result of this, some extremely large clusters may be configured with a smaller number of partitions than ideal. This results in the size of each partition being increased. When a cache server fails, the other servers will use their fractional backups to recover the state of that server (and take over responsibility for their backed-up portion of that state). The finest granularity of this recovery is a single partition, and the single service thread can not accept new requests during this recovery. Ordinarily, recovery is practically instantaneous (it is roughly equivalent to the time required to iterate over a set of backup backing map entries and move them to the primary backing map in the same JVM). But certain factors can increase this duration drastically (to several seconds): large partitions, sufficiently slow single-threaded CPU performance, many or expensive indexes to rebuild, etc. The solution of course is to mitigate each of those factors but in many cases this may be challenging. Larger clusters also lead to the temptation to place more load on the available hardware resources, spreading CPU resources thin. As an example, while we've long been aware of how garbage collection can cause significant pauses, it usually isn't viewed as a major consumer of CPU (in terms of overall system throughput). Typically, the use of a concurrent collector allows greater responsiveness by minimizing pause times, at the cost of reducing system throughput. However, at a recent engagement, we were forced to turn off the concurrent collector and use a traditional parallel "stop the world" collector to reduce CPU usage to an acceptable level. In summary, there are some less obvious factors that may result in excessive CPU consumption in a larger cluster, so it is even more critical to test at full scale, even though allocating sufficient hardware may often be much more difficult for these large clusters.

    Read the article

  • large product image structured data and visibility

    - by Mark Resølved
    On an eCommerce site we two images for a product. One medium sized shown on top of the page and one large photo shown on click in an overlay. We use http://schema.org/Product microdata on the page. We'd like the large, initially hidden, photo to be the main image for the product, as it's the better looking one. So it's also referenced in the XML sitemap as <image:image>. So we also put the itemprop"image" attribute on the, hidden large image. But i'm wondering is it a bad idea to use a microdata attribute on a hidden style="display:none;" element? is there a better way to embed the main image in terms of SEO, without showing it initially?

    Read the article

  • Where can I locate business data to use in my application?

    - by Aaron McIver
    This question talks about any and all free public raw data which appeared to have valuable pieces but nothing that really provides what I am looking for. Instead of using a socially defined listing of businesses (foursquare), I would like a business listing data set of registered businesses and associated addresses that could then be searchable based on location (coordinates). The critical need is that the data set should be filterable based on varying criteria (give me all restaurants, coffee shops, etc...). If the data is free that is great but anywhere that sells this type of data would also suffice. Infochimps looked like a possibility but perhaps something a bit more extensive exists. Where can I find a free or for fee data set of registered business that is filterable based on type of business and location?

    Read the article

  • How to create a large level game?

    - by Siddharth
    I want to know how to create a large game which has more than one level in it and those levels are loaded from the xml file. In my game I have many objects for each different level which I have to load when use click on it. At present for example my game contain 20 levels and now I was loading all the graphic object for all 20 levels. But the correct way was that only load graphic of that particular level only. So I don't know how to do that. So please explain this by providing game example. At present I was creating a class for each my game object image by extending sprite to it. I know it was not a suitable way so provide guidance on it. Basically I want to know how to create large games in andengine? Please help me about that because it will provide help to other community member also because andengine did not have proper documentation for learning developer about how to manage large game?

    Read the article

  • Display large amount of data to client through pagination

    - by ebram tharwat
    I have a web application in which i need to show a big number of data or records for clients. Now i 'll use pagination but i was wondering should I: Load all the data once then pagination, sorting and sarching 'll be easy..But it 'll takes big time(using local DB it takes up to 9 sec.) Or each time i show new page(from the pagination) i make a new request to server and then new request to DB to get the next records..But then what if the client click on Prev button, i 'll make a new request to get data that I had previously..Should i cach data that are loaded before and how if that's good technique? So load all data once or make a new request every time i need data that maybe have been loaded before. I'm using ASP.NET MVC SPA with durandaljs and knockoutjs

    Read the article

  • How to gain understanding of large systems? [closed]

    - by vonolsson
    Possible Duplicate: How do you dive into large code bases? I have worked as a developer developing C/C++ applications for mobile platforms (Windows Mobile and Symbian) for about six years. About a year ago, however, I changed job and currently work with large(!) enterprise systems with high security and availability requirements, all developed in Java. My problem is that I am having a hard time getting a grip on the architecture of the systems, even after a year working with them, and understanding systems other people have built has never been my strong side. The fact that I haven't worked with enterprise systems before doesn't exactly help. Does anyone have a good approach on how to learn and understand large systems? Are there any particular techniques and/or patterns I should read up on?

    Read the article

  • Sharing disk volumes across OpenVZ guests to reduce Package Management Overhead

    - by andyortlieb
    Is it feasible to create a single "master" OpenVZ guest who would only be used for package management, and use something like mount --bind on several other OpenVZ guests sort of trick them into using the environment installed by the master guest? The point of this would be so that users can maintain their own containers, and yet stay in sync with the master development environment, so they'll always have the latest & greatest requirements without worrying too much about system administration. If they need to install their own packages, could put them in /opt, or /usr/local (or set a path to their home directory)? To rephrase, I would like several (developer's, for example) OpenVZ guests whose /bin, /usr (and so on...) actually refer to the same disk location as that of a master OpenVZ guest who can be started up to install and update common packages for the environment to be shared by all of this group of OpenVZ guests. For what it's worth, we're running Debian 6. Edit: I have tried mounting (bind, and readonly) /bin, /lib, /sbin, /usr in this fashion and it refuses to start the containers stating that files are already mounted or otherwise in use: Starting container ... vzquota : (error) Quota on syscall for id 1102: Device or resource busy vzquota : (error) Possible reasons: vzquota : (error) - Container's root is already mounted vzquota : (error) - there are opened files inside Container's private area vzquota : (error) - your current working directory is inside Container's vzquota : (error) private area vzquota : (error) Currently used file(s): /var/lib/vz/private/1102/sbin /var/lib/vz/private/1102/usr /var/lib/vz/private/1102/lib /var/lib/vz/private/1102/bin vzquota on failed [3] If I unmount these four volumes, and start the guest, and then mount them after the guest has started, the guest never sees them mounted.

    Read the article

  • Visualizing the SiteMap of a large (page number) website

    - by Michael
    I'm looking for a tool or service that can spider a web domain with a large number of pages, create a sitemap, and then visualize that map in a way that will help me see, understand and group content (I'm new to the site) Something like a tree-view or other standard Site Map visualizations would be great. I am yet unable to find a tool that does this (I've found plenty of things to spider the site and create an xml file, nothing to visualize it) Thanks!

    Read the article

  • How to write a large number of nested records in JSON with Python

    - by jamesmcm
    I want to produce a JSON file, containing some initial parameters and then records of data like this: { "measurement" : 15000, "imi" : 0.5, "times" : 30, "recalibrate" : false, { "colorlist" : [234, 431, 134] "speclist" : [0.34, 0.42, 0.45, 0.34, 0.78] } { "colorlist" : [214, 451, 114] "speclist" : [0.44, 0.32, 0.45, 0.37, 0.53] } ... } How can this be achieved using the Python json module? The data records cannot be added by hand as there are very many.

    Read the article

  • Sybase PowerDesigner Change Many (Find/Replace/Convert) Data Item's Data Types

    - by Andy
    Hello, I have a relatively large Conceptual Data Model in PowerDesigner. After generating a Physical Data Model and seeing the DBMS data types, I need to update all of data types(NUMBER/TEXT) for each data item. I'd like to either do a find/replace within the Conceptual Data Model or somehow map to different data types when creating the Physical Data Model. Ex. Change the auto conversion of Text - Clob, to Text - NVARCHAR(20). Thanks!

    Read the article

  • Isolate large amounts of data w/ jQuery

    - by Bry4n
    I work for a transit agency and I have large amounts of data (mostly times), and I need a way to filter the data using two textboxes (To and From). I found jQuery quick search, but it seems to only work with one textbox. If anyone has any ideas via jQuery or some other client side library, that would be fantastic.

    Read the article

< Previous Page | 20 21 22 23 24 25 26 27 28 29 30 31  | Next Page >