Search Results

Search found 59438 results on 2378 pages for 'data loader'.

Page 22/2378 | < Previous Page | 18 19 20 21 22 23 24 25 26 27 28 29 | Next Page >

Oracle Enterprise Data Quality Adds Global Address Verification Capabilities for Greater Accuracy and Broader Location Coverage

- by Mala Narasimharajan

Data quality – has many flavors to it. Product, Customer – you name the data domain and there’s data quality associated with it. Address verification and data quality are a little different. in that there is a tremendous amount of variation as well as nuance attached to it. Specifically, what makes address verification challenging is that more often than not, addresses are incomplete, riddled with misspellings, incorrect postal codes are assigned to locations or non-address items are present. Almost all data has locations, and accurate locations power a wealth of business processes: Customer Relationship Management, data quality, delivery of materials, goods or services, fraud detection, insurance risk assessment, data analytics, store and territory planning, and much more. Oracle Address Verification Server provides location-based services as well as deeper parsing and analysis capabilities for Oracle Enterprise Data Quality. Specifically, Pre-integrated with the EDQ platform, Oracle Address Verification Server provides robust parsing, validation, as well as specialized location information for over 240 countries – all populated countries on Earth. Oracle Enterprise Data Quality (EDQ) is a data quality platform, dedicated to address the distinct challenges of customer and product data quality, and performs advanced data profiling to identify and measure poor quality data and identify rule requirements, as well as semantic and pattern-based recognition to accurately parse and standardize data that is poorly structured. EDQ is integrated with Oracle Master Data Management, including Oracle Customer Hub and Oracle Product Hub, as well as Oracle Data Integrator Enterprise Edition and Oracle CRM. Address Verification Server provides key address verification services for Oracle CRM and Oracle Customer Hub. In addition, Address Verification Server provides greater accuracy when handling address data due to its expanded sources and extensible knowledge repository, solid parsing across locales and countries as well as adept handling of extraneous data in address fields. For more information on Oracle Address Verification Server visit: http://bit.ly/GMUE4H and http://bit.ly/GWf7U6

Read the article
Making Spring Data JPA work with DataNucleus (GAE) (Spring Boot)

- by xybrek

There are several hints that Spring Data works with Google App Engine like: http://tommysiu.blogspot.com/2014/01/spring-data-on-gae-part-1.html http://blog.eisele.net/2009/07/spring-300m3-on-google-appengine-with.html Much of the examples are not "Spring Boot" so I've been trying to retrofit things with it. However, I've been stuck with this error for days and days: [INFO] Caused by: java.lang.NullPointerException [INFO] at org.datanucleus.api.jpa.metamodel.SingularAttributeImpl.isVersion(SingularAttributeImpl.java:79) [INFO] at org.springframework.data.jpa.repository.support.JpaMetamodelEntityInformation.findVersionAttribute(JpaMetamodelEntityInformation.java:102) [INFO] at org.springframework.data.jpa.repository.support.JpaMetamodelEntityInformation.<init>(JpaMetamodelEntityInformation.java:79) [INFO] at org.springframework.data.jpa.repository.support.JpaEntityInformationSupport.getMetadata(JpaEntityInformationSupport.java:65) [INFO] at org.springframework.data.jpa.repository.support.JpaRepositoryFactory.getEntityInformation(JpaRepositoryFactory.java:149) [INFO] at org.springframework.data.jpa.repository.support.JpaRepositoryFactory.getTargetRepository(JpaRepositoryFactory.java:88) [INFO] at org.springframework.data.jpa.repository.support.JpaRepositoryFactory.getTargetRepository(JpaRepositoryFactory.java:68) [INFO] at org.springframework.data.repository.core.support.RepositoryFactorySupport.getRepository(RepositoryFactorySupport.java:158) [INFO] at org.springframework.data.repository.core.support.RepositoryFactoryBeanSupport.initAndReturn(RepositoryFactoryBeanSupport.java:224) [INFO] at org.springframework.data.repository.core.support.RepositoryFactoryBeanSupport.afterPropertiesSet(RepositoryFactoryBeanSupport.java:210) [INFO] at org.springframework.data.jpa.repository.support.JpaRepositoryFactoryBean.afterPropertiesSet(JpaRepositoryFactoryBean.java:92) [INFO] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory$6.run(AbstractAutowireCapableBeanFactory.java:1602) [INFO] at java.security.AccessController.doPrivileged(Native Method) [INFO] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1599) [INFO] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1549) [INFO] ... 40 more Where, I'm trying to use Spring Data JPA with DataNucleus/AppEngine: @Configuration @ComponentScan @EnableJpaRepositories @EnableTransactionManagement class JpaApplicationConfig { private static final Logger logger = Logger .getLogger(JpaApplicationConfig.class.getName()); @Bean public EntityManagerFactory entityManagerFactory() { logger.info("Loading Entity Manager..."); return Persistence .createEntityManagerFactory("transactions-optional"); } @Bean public PlatformTransactionManager transactionManager() { logger.info("Loading Transaction Manager..."); final JpaTransactionManager txManager = new JpaTransactionManager(); txManager.setEntityManagerFactory(entityManagerFactory()); return txManager; } } I've tested Persistence.createEntityManagerFactory("transactions-optional"); to see if the app can persist using this EMF, well, it does, so I am sure that this EMF works fine. The problem is the "wiring" up with the Spring Data JPA, can anybody help?

Read the article
Parse and read data frame in C?

- by user253656

I am writing a program that reads the data from the serial port on Linux. The data are sent by another device with the following frame format: |start | Command | Data | CRC | End | |0x02 | 0x41 | (0-127 octets) | | 0x03| ---------------------------------------------------- The Data field contains 127 octets as shown and octet 1,2 contains one type of data; octet 3,4 contains another data. I need to get these data I know how to write and read data to and from a serial port in Linux, but it is just to write and read a simple string (like "ABD") My issue is that I do not know how to parse the data frame formatted as above so that I can: get the data in octet 1,2 in the Data field get the data in octet 3,4 in the Data field get the value in CRC field to check the consistency of the data Here the sample snip code that read and write a simple string from and to a serial port in Linux: int writeport(int fd, char *chars) { int len = strlen(chars); chars[len] = 0x0d; // stick a <CR> after the command chars[len+1] = 0x00; // terminate the string properly int n = write(fd, chars, strlen(chars)); if (n < 0) { fputs("write failed!\n", stderr); return 0; } return 1; } int readport(int fd, char *result) { int iIn = read(fd, result, 254); result[iIn-1] = 0x00; if (iIn < 0) { if (errno == EAGAIN) { printf("SERIAL EAGAIN ERROR\n"); return 0; } else { printf("SERIAL read error %d %s\n", errno, strerror(errno)); return 0; } } return 1; } Does anyone please have some ideas? Thanks all.

Read the article
Pre-filtering and shaping OData feeds using WCF Data Services and the Entity Framework - Part 1

- by rajbk

The Open Data Protocol, referred to as OData, is a new data-sharing standard that breaks down silos and fosters an interoperative ecosystem for data consumers (clients) and producers (services) that is far more powerful than currently possible. It enables more applications to make sense of a broader set of data, and helps every data service and client add value to the whole ecosystem. WCF Data Services (previously known as ADO.NET Data Services), then, was the first Microsoft technology to support the Open Data Protocol in Visual Studio 2008 SP1. It provides developers with client libraries for .NET, Silverlight, AJAX, PHP and Java. Microsoft now also supports OData in SQL Server 2008 R2, Windows Azure Storage, Excel 2010 (through PowerPivot), and SharePoint 2010. Many other other applications in the works. * This post walks you through how to create an OData feed, define a shape for the data and pre-filter the data using Visual Studio 2010, WCF Data Services and the Entity Framework. A sample project is attached at the bottom of Part 2 of this post. Pre-filtering and shaping OData feeds using WCF Data Services and the Entity Framework - Part 2 Create the Web Application File –› New –› Project, Select “ASP.NET Empty Web Application” Add the Entity Data Model Right click on the Web Application in the Solution Explorer and select “Add New Item..” Select “ADO.NET Entity Data Model” under "Data”. Name the Model “Northwind” and click “Add”. In the “Choose Model Contents”, select “Generate Model From Database” and click “Next” Define a connection to your database containing the Northwind database in the next screen. We are going to expose the Products table through our OData feed. Select “Products” in the “Choose your Database Object” screen. Click “Finish”. We are done creating our Entity Data Model. Save the Northwind.edmx file created. Add the WCF Data Service Right click on the Web Application in the Solution Explorer and select “Add New Item..” Select “WCF Data Service” from the list and call the service “DataService” (creative, huh?). Click “Add”. Enable Access to the Data Service Open the DataService.svc.cs class. The class is well commented and instructs us on the next steps. public class DataService : DataService< /* TODO: put your data source class name here */ > { // This method is called only once to initialize service-wide policies. public static void InitializeService(DataServiceConfiguration config) { // TODO: set rules to indicate which entity sets and service operations are visible, updatable, etc. // Examples: // config.SetEntitySetAccessRule("MyEntityset", EntitySetRights.AllRead); // config.SetServiceOperationAccessRule("MyServiceOperation", ServiceOperationRights.All); config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2; } } Replace the comment that starts with “/* TODO:” with “NorthwindEntities” (the entity container name of the Model we created earlier). WCF Data Services is initially locked down by default, FTW! No data is exposed without you explicitly setting it. You have explicitly specify which Entity sets you wish to expose and what rights are allowed by using the SetEntitySetAccessRule. The SetServiceOperationAccessRule on the other hand sets rules for a specified operation. Let us define an access rule to expose the Products Entity we created earlier. We use the EnititySetRights.AllRead since we want to give read only access. Our modified code is shown below. public class DataService : DataService<NorthwindEntities> { public static void InitializeService(DataServiceConfiguration config) { config.SetEntitySetAccessRule("Products", EntitySetRights.AllRead); config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2; } } We are done setting up our ODataFeed! Compile your project. Right click on DataService.svc and select “View in Browser” to see the OData feed. To view the feed in IE, you must make sure that "Feed Reading View" is turned off. You set this under Tools -› Internet Options -› Content tab. If you navigate to “Products”, you should see the Products feed. Note also that URIs are case sensitive. ie. Products work but products doesn’t. Filtering our data OData has a set of system query operations you can use to perform common operations against data exposed by the model. For example, to see only Products in CategoryID 2, we can use the following request: /DataService.svc/Products?$filter=CategoryID eq 2 At the time of this writing, supported operations are $orderby, $top, $skip, $filter, $expand, $format†, $select, $inlinecount. Pre-filtering our data using Query Interceptors The Product feed currently returns all Products. We want to change that so that it contains only Products that have not been discontinued. WCF introduces the concept of interceptors which allows us to inject custom validation/policy logic into the request/response pipeline of a WCF data service. We will use a QueryInterceptor to pre-filter the data so that it returns only Products that are not discontinued. To create a QueryInterceptor, write a method that returns an Expression<Func<T, bool>> and mark it with the QueryInterceptor attribute as shown below. [QueryInterceptor("Products")] public Expression<Func<Product, bool>> OnReadProducts() { return o => o.Discontinued == false; } Viewing the feed after compilation will only show products that have not been discontinued. We also confirm this by looking at the WHERE clause in the SQL generated by the entity framework. SELECT [Extent1].[ProductID] AS [ProductID], ... ... [Extent1].[Discontinued] AS [Discontinued] FROM [dbo].[Products] AS [Extent1] WHERE 0 = [Extent1].[Discontinued] Other examples of Query/Change interceptors can be seen here including an example to filter data based on the identity of the authenticated user. We are done pre-filtering our data. In the next part of this post, we will see how to shape our data. Pre-filtering and shaping OData feeds using WCF Data Services and the Entity Framework - Part 2 Foot Notes * http://msdn.microsoft.com/en-us/data/aa937697.aspx † $format did not work for me. The way to get a Json response is to include the following in the request header “Accept: application/json, text/javascript, */*” when making the request. This is easily done with most JavaScript libraries.

Read the article
Big Data Matters with ODI12c

- by Madhu Nair

contributed by Mike Eisterer On October 17th, 2013, Oracle announced the release of Oracle Data Integrator 12c (ODI12c). This release signifies improvements to Oracle’s Data Integration portfolio of solutions, particularly Big Data integration. Why Big Data = Big Business Organizations are gaining greater insights and actionability through increased storage, processing and analytical benefits offered by Big Data solutions. New technologies and frameworks like HDFS, NoSQL, Hive and MapReduce support these benefits now. As further data is collected, analytical requirements increase and the complexity of managing transformations and aggregations of data compounds and organizations are in need for scalable Data Integration solutions. ODI12c provides enterprise solutions for the movement, translation and transformation of information and data heterogeneously and in Big Data Environments through: The ability for existing ODI and SQL developers to leverage new Big Data technologies. A metadata focused approach for cataloging, defining and reusing Big Data technologies, mappings and process executions. Integration between many heterogeneous environments and technologies such as HDFS and Hive. Generation of Hive Query Language. Working with Big Data using Knowledge Modules ODI12c provides developers with the ability to define sources and targets and visually develop mappings to effect the movement and transformation of data. As the mappings are created, ODI12c leverages a rich library of prebuilt integrations, known as Knowledge Modules (KMs). These KMs are contextual to the technologies and platforms to be integrated. Steps and actions needed to manage the data integration are pre-built and configured within the KMs. The Oracle Data Integrator Application Adapter for Hadoop provides a series of KMs, specifically designed to integrate with Big Data Technologies. The Big Data KMs include: Check Knowledge Module Reverse Engineer Knowledge Module Hive Transform Knowledge Module Hive Control Append Knowledge Module File to Hive (LOAD DATA) Knowledge Module File-Hive to Oracle (OLH-OSCH) Knowledge Module Nothing to beat an Example: To demonstrate the use of the KMs which are part of the ODI Application Adapter for Hadoop, a mapping may be defined to move data between files and Hive targets. The mapping is defined by dragging the source and target into the mapping, performing the attribute (column) mapping (see Figure 1) and then selecting the KM which will govern the process. In this mapping example, movie data is being moved from an HDFS source into a Hive table. Some of the attributes, such as “CUSTID to custid”, have been mapped over. Figure 1 Defining the Mapping Before the proper KM can be assigned to define the technology for the mapping, it needs to be added to the ODI project. The Big Data KMs have been made available to the project through the KM import process. Generally, this is done prior to defining the mapping. Figure 2 Importing the Big Data Knowledge Modules Following the import, the KMs are available in the Designer Navigator. v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} w\:* {behavior:url(#default#VML);} .shape {behavior:url(#default#VML);} Normal 0 false false false EN-US ZH-TW X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Calibri","sans-serif"; mso-bidi-font-family:"Times New Roman";} Figure 3 The Project View in Designer, Showing Installed IKMs Once the KM is imported, it may be assigned to the mapping target. This is done by selecting the Physical View of the mapping and examining the Properties of the Target. In this case MOVIAPP_LOG_STAGE is the target of our mapping. Figure 4 Physical View of the Mapping and Assigning the Big Data Knowledge Module to the Target Alternative KMs may have been selected as well, providing flexibility and abstracting the logical mapping from the physical implementation. Our mapping may be applied to other technologies as well. The mapping is now complete and is ready to run. We will see more in a future blog about running a mapping to load Hive. To complete the quick ODI for Big Data Overview, let us take a closer look at what the IKM File to Hive is doing for us. ODI provides differentiated capabilities by defining the process and steps which normally would have to be manually developed, tested and implemented into the KM. As shown in figure 5, the KM is preparing the Hive session, managing the Hive tables, performing the initial load from HDFS and then performing the insert into Hive. HDFS and Hive options are selected graphically, as shown in the properties in Figure 4. Figure 5 Process and Steps Managed by the KM What’s Next Big Data being the shape shifting business challenge it is is fast evolving into the deciding factor between market leaders and others. Now that an introduction to ODI and Big Data has been provided, look for additional blogs coming soon using the Knowledge Modules which make up the Oracle Data Integrator Application Adapter for Hadoop: Importing Big Data Metadata into ODI, Testing Data Stores and Loading Hive Targets Generating Transformations using Hive Query language Loading Oracle from Hadoop Sources For more information now, please visit the Oracle Data Integrator Application Adapter for Hadoop web site, http://www.oracle.com/us/products/middleware/data-integration/hadoop/overview/index.html Do not forget to tune in to the ODI12c Executive Launch webcast on the 12th to hear more about ODI12c and GG12c. Normal 0 false false false EN-US ZH-TW X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Calibri","sans-serif"; mso-bidi-font-family:"Times New Roman";}

Read the article
JavaScript Data Binding Frameworks

- by dwahlin

Data binding is where it’s at now days when it comes to building client-centric Web applications. Developers experienced with desktop frameworks like WPF or web frameworks like ASP.NET, Silverlight, or others are used to being able to take model objects containing data and bind them to UI controls quickly and easily. When moving to client-side Web development the data binding story hasn’t been great since neither HTML nor JavaScript natively support data binding. This means that you have to write code to place data in a control and write code to extract it. Although it’s certainly feasible to do it from scratch (many of us have done it this way for years), it’s definitely tedious and not exactly the best solution when it comes to maintenance and re-use. Over the last few years several different script libraries have been released to simply the process of binding data to HTML controls. In fact, the subject of data binding is becoming so popular that it seems like a new script library is being released nearly every week. Many of the libraries provide MVC/MVVM pattern support in client-side JavaScript apps and some even integrate directly with server frameworks like Node.js. Here’s a quick list of a few of the available libraries that support data binding (if you like any others please add a comment and I’ll try to keep the list updated): AngularJS MVC framework for data binding (although closely follows the MVVM pattern). Backbone.js MVC framework with support for models, key/value binding, custom events, and more. Derby Provides a real-time environment that runs in the browser an in Node.js. The library supports data binding and templates. Ember Provides support for templates that automatically update as data changes. JsViews Data binding framework that provides “interactive data-driven views built on top of JsRender templates”. jQXB Expression Binder Lightweight jQuery plugin that supports bi-directional data binding support. KnockoutJS MVVM framework with robust support for data binding. For an excellent look at using KnockoutJS check out John Papa’s course on Pluralsight. Meteor End to end framework that uses Node.js on the server and provides support for data binding on the client. Simpli5 JavaScript framework that provides support for two-way data binding. WinRT with HTML5/JavaScript If you’re building Windows 8 applications using HTML5 and JavaScript there’s built-in support for data binding in the WinJS library. I won’t have time to write about each of these frameworks, but in the next post I’m going to talk about my (current) favorite when it comes to client-side JavaScript data binding libraries which is AngularJS. AngularJS provides an extremely clean way – in my opinion - to extend HTML syntax to support data binding while keeping model objects (the objects that hold the data) free from custom framework method calls or other weirdness. While I’m writing up the next post, feel free to visit the AngularJS developer guide if you’d like additional details about the API and want to get started using it.

Read the article
Protect Data and Save Money? Learn How Best-in-Class Organizations do Both

- by roxana.bradescu

Databases contain nearly two-thirds of the sensitive information that must be protected as part of any organization's overall approach to security, risk management, and compliance. Solutions for protecting data housed in databases vary from encrypting data at the application level to defense-in-depth protection of the database itself. So is there a difference? Absolutely! According to new research from the Aberdeen Group, Best-in-Class organizations experience fewer data breaches and audit deficiencies - at lower cost -- by deploying database security solutions. And the results are dramatic: Aberdeen found that organizations encrypting data within their databases achieved 30% fewer data breaches and 15% greater audit efficiency with 34% less total cost when compared to organizations encrypting data within applications. Join us for a live webcast with Derek Brink, Vice President and Research Fellow at the Aberdeen Group, next week to learn how your organization can become Best-in-Class.

Read the article
Bad Data is Really the Monster

- by Dain C. Hansen

Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Calibri","sans-serif"; mso-bidi-font-family:"Times New Roman";} Bad Data is really the monster – is an article written by Bikram Sinha who I borrowed the title and the inspiration for this blog. Sinha writes: “Bad or missing data makes application systems fail when they process order-level data. One of the key items in the supply-chain industry is the product (aka SKU). Therefore, it becomes the most important data element to tie up multiple merchandising processes including purchase order allocation, stock movement, shipping notifications, and inventory details… Bad data can cause huge operational failures and cost millions of dollars in terms of time, resources, and money to clean up and validate data across multiple participating systems. Yes bad data really is the monster, so what do we do about it? Close our eyes and hope it stays in the closet? We’ve tacked this problem for some years now at Oracle, and with our latest introduction of Oracle Enterprise Data Quality along with our integrated Oracle Master Data Management products provides a complete, best-in-class answer to the bad data monster. What’s unique about it? Oracle Enterprise Data Quality also combines powerful data profiling, cleansing, matching, and monitoring capabilities while offering unparalleled ease of use. What makes it unique is that it has dedicated capabilities to address the distinct challenges of both customer and product data quality – [different monsters have different needs of course!]. And the ability to profile data is just as important to identify and measure poor quality data and identify new rules and requirements. Included are semantic and pattern-based recognition to accurately parse and standardize data that is poorly structured. Finally all of the data quality components are integrated with Oracle Master Data Management, including Oracle Customer Hub and Oracle Product Hub, as well as Oracle Data Integrator Enterprise Edition and Oracle CRM. Want to learn more? On Tuesday Nov 15th, I invite you to listen to our webcast on Reduce ERP consolidation risks with Oracle Master Data Management I’ll be joined by our partner iGate Patni and be talking about one specific way to deal with the bad data monster specifically around ERP consolidation. Look forward to seeing you there!

Read the article
Where can I locate business data to use in my application?

- by Aaron McIver

This question talks about any and all free public raw data which appeared to have valuable pieces but nothing that really provides what I am looking for. Instead of using a socially defined listing of businesses (foursquare), I would like a business listing data set of registered businesses and associated addresses that could then be searchable based on location (coordinates). The critical need is that the data set should be filterable based on varying criteria (give me all restaurants, coffee shops, etc...). If the data is free that is great but anywhere that sells this type of data would also suffice. Infochimps looked like a possibility but perhaps something a bit more extensive exists. Where can I find a free or for fee data set of registered business that is filterable based on type of business and location?

Read the article
FreeBSD performance tuning. Sysctls, loader.conf, kernel

- by SaveTheRbtz

I wanted to share knowledge of tuning FreeBSD via sysctl.conf/loader.conf/KENCONF. It was initially based on Igor Sysoev's (author of nginx) presentation about FreeBSD tuning up to 100,000-200,000 active connections. Tunings are for FreeBSD-CURRENT. Since 7.2 amd64 some of them are tuned well by default. Prior 7.0 some of them are boot only (set via /boot/loader.conf) or does not exist at all. sysctl.conf: # No zero mapping feature # May break wine # (There are also reports about broken samba3) #security.bsd.map_at_zero=0 # If you have really busy webserver with apache13 you may run out of processes #kern.maxproc=10000 # Same for servers with apache2 / Pound #kern.threads.max_threads_per_proc=4096 # Max. backlog size kern.ipc.somaxconn=4096 # Shared memory // 7.2+ can use shared memory > 2Gb kern.ipc.shmmax=2147483648 # Sockets kern.ipc.maxsockets=204800 # Can cause this on older kernels: # http://old.nabble.com/Significant-performance-regression-for-increased-maxsockbuf-on-8.0-RELEASE-tt26745981.html#a26745981 ) kern.ipc.maxsockbuf=10485760 # Mbuf 2k clusters (on amd64 7.2+ 25600 is default) # For such high value vm.kmem_size must be increased to 3G kern.ipc.nmbclusters=262144 # Jumbo pagesize(_SC_PAGESIZE) clusters # Used as general packet storage for jumbo frames # can be monitored via `netstat -m` #kern.ipc.nmbjumbop=262144 # Jumbo 9k/16k clusters # If you are using them #kern.ipc.nmbjumbo9=65536 #kern.ipc.nmbjumbo16=32768 # For lower latency you can decrease scheduler's maximum time slice # default: stathz/10 (~ 13) #kern.sched.slice=1 # Increase max command-line length showed in `ps` (e.g for Tomcat/Java) # Default is PAGE_SIZE / 16 or 256 on x86 # This avoids commands to be presented as [executable] in `ps` # For more info see: http://www.freebsd.org/cgi/query-pr.cgi?pr=120749 kern.ps_arg_cache_limit=4096 # Every socket is a file, so increase them kern.maxfiles=204800 kern.maxfilesperproc=200000 kern.maxvnodes=200000 # On some systems HPET is almost 2 times faster than default ACPI-fast # Useful on systems with lots of clock_gettime / gettimeofday calls # See http://old.nabble.com/ACPI-fast-default-timecounter,-but-HPET-83--faster-td23248172.html # After revision 222222 HPET became default: http://svnweb.freebsd.org/base?view=revision&revision=222222 kern.timecounter.hardware=HPET # Small receive space, only usable on http-server, on file server this # should be increased to 65535 or even more #net.inet.tcp.recvspace=8192 # This is useful on Fat-Long-Pipes #net.inet.tcp.recvbuf_max=10485760 #net.inet.tcp.recvbuf_inc=65535 # Small send space is useful for http servers that serve small files # Autotuned since 7.x net.inet.tcp.sendspace=16384 # This is useful on Fat-Long-Pipes #net.inet.tcp.sendbuf_max=10485760 #net.inet.tcp.sendbuf_inc=65535 # Turn off receive autotuning # You can play with it. #net.inet.tcp.recvbuf_auto=0 #net.inet.tcp.sendbuf_auto=0 # This should be enabled if you going to use big spaces (>64k) # Also timestamp field is useful when using syncookies net.inet.tcp.rfc1323=1 # Turn this off on high-speed, lossless connections (LAN 1Gbit+) # If you set it there is no need in TCP_NODELAY sockopt (see man tcp) net.inet.tcp.delayed_ack=0 # This feature is useful if you are serving data over modems, Gigabit Ethernet, # or even high speed WAN links (or any other link with a high bandwidth delay product), # especially if you are also using window scaling or have configured a large send window. # Automatically disables on small RTT ( http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_subr.c?#rev1.237 ) # This sysctl was removed in 10-CURRENT: # See: http://www.mail-archive.com/[email protected]/msg06178.html #net.inet.tcp.inflight.enable=0 # TCP slowstart algorithm tunings # We assuming we have very fast clients #net.inet.tcp.slowstart_flightsize=100 #net.inet.tcp.local_slowstart_flightsize=100 # Disable randomizing of ports to avoid false RST # Before usage check SA here www.bsdcan.org/2006/papers/ImprovingTCPIP.pdf # (it's also says that port randomization auto-disables at some conn.rates, but I didn't checked it thou) #net.inet.ip.portrange.randomized=0 # Increase portrange # For outgoing connections only. Good for seed-boxes and ftp servers. net.inet.ip.portrange.first=1024 net.inet.ip.portrange.last=65535 # # stops route cache degregation during a high-bandwidth flood # http://www.freebsd.org/doc/en/books/handbook/securing-freebsd.html #net.inet.ip.rtexpire=2 net.inet.ip.rtminexpire=2 net.inet.ip.rtmaxcache=1024 # Security net.inet.ip.redirect=0 net.inet.ip.sourceroute=0 net.inet.ip.accept_sourceroute=0 net.inet.icmp.maskrepl=0 net.inet.icmp.log_redirect=0 net.inet.icmp.drop_redirect=1 net.inet.tcp.drop_synfin=1 # # There is also good example of sysctl.conf with comments: # http://www.thern.org/projects/sysctl.conf # # icmp may NOT rst, helpful for those pesky spoofed # icmp/udp floods that end up taking up your outgoing # bandwidth/ifqueue due to all that outgoing RST traffic. # #net.inet.tcp.icmp_may_rst=0 # Security net.inet.udp.blackhole=1 net.inet.tcp.blackhole=2 # IPv6 Security # For more info see http://www.fosslc.org/drupal/content/security-implications-ipv6 # Disable Node info replies # To see this vulnerability in action run `ping6 -a sglAac ::1` or `ping6 -w ::1` on unprotected node net.inet6.icmp6.nodeinfo=0 # Turn on IPv6 privacy extensions # For more info see proposal http://unix.derkeiler.com/Mailing-Lists/FreeBSD/net/2008-06/msg00103.html net.inet6.ip6.use_tempaddr=1 net.inet6.ip6.prefer_tempaddr=1 # Disable ICMP redirect net.inet6.icmp6.rediraccept=0 # Disable acceptation of RA and auto linklocal generation if you don't use them #net.inet6.ip6.accept_rtadv=0 #net.inet6.ip6.auto_linklocal=0 # Increases default TTL, sometimes useful # Default is 64 net.inet.ip.ttl=128 # Lessen max segment life to conserve resources # ACK waiting time in miliseconds # (default: 30000. RFC from 1979 recommends 120000) net.inet.tcp.msl=5000 # Max bumber of timewait sockets net.inet.tcp.maxtcptw=200000 # Don't use tw on local connections # As of 15 Apr 2009. Igor Sysoev says that nolocaltimewait has some buggy realization. # So disable it or now till get fixed #net.inet.tcp.nolocaltimewait=1 # FIN_WAIT_2 state fast recycle net.inet.tcp.fast_finwait2_recycle=1 # Time before tcp keepalive probe is sent # default is 2 hours (7200000) #net.inet.tcp.keepidle=60000 # Should be increased until net.inet.ip.intr_queue_drops is zero net.inet.ip.intr_queue_maxlen=4096 # Interrupt handling via multiple CPU, but with context switch. # You can play with it. Default is 1; #net.isr.direct=0 # This is for routers only #net.inet.ip.forwarding=1 #net.inet.ip.fastforwarding=1 # This speed ups dummynet when channel isn't saturated net.inet.ip.dummynet.io_fast=1 # Increase dummynet(4) hash #net.inet.ip.dummynet.hash_size=2048 #net.inet.ip.dummynet.max_chain_len # Should be increased when you have A LOT of files on server # (Increase until vfs.ufs.dirhash_mem becomes lower) vfs.ufs.dirhash_maxmem=67108864 # Note from commit http://svn.freebsd.org/base/head@211031 : # For systems with RAID volumes and/or virtualization envirnments, where # read performance is very important, increasing this sysctl tunable to 32 # or even more will demonstratively yield additional performance benefits. vfs.read_max=32 # Explicit Congestion Notification (see http://en.wikipedia.org/wiki/Explicit_Congestion_Notification) net.inet.tcp.ecn.enable=1 # Flowtable - flow caching mechanism # Useful for routers #net.inet.flowtable.enable=1 #net.inet.flowtable.nmbflows=65535 # Extreme polling tuning #kern.polling.burst_max=1000 #kern.polling.each_burst=1000 #kern.polling.reg_frac=100 #kern.polling.user_frac=1 #kern.polling.idle_poll=0 # IPFW dynamic rules and timeouts tuning # Increase dyn_buckets till net.inet.ip.fw.curr_dyn_buckets is lower net.inet.ip.fw.dyn_buckets=65536 net.inet.ip.fw.dyn_max=65536 net.inet.ip.fw.dyn_ack_lifetime=120 net.inet.ip.fw.dyn_syn_lifetime=10 net.inet.ip.fw.dyn_fin_lifetime=2 net.inet.ip.fw.dyn_short_lifetime=10 # Make packets pass firewall only once when using dummynet # i.e. packets going thru pipe are passing out from firewall with accept #net.inet.ip.fw.one_pass=1 # shm_use_phys Wires all shared pages, making them unswappable # Use this to lessen Virtual Memory Manager's work when using Shared Mem. # Useful for databases #kern.ipc.shm_use_phys=1 # ZFS # Enable prefetch. Useful for sequential load type i.e fileserver. # FreeBSD sets vfs.zfs.prefetch_disable to 1 on any i386 systems and # on any amd64 systems with less than 4GB of avaiable memory # For additional info check this nabble thread http://old.nabble.com/Samba-read-speed-performance-tuning-td27964534.html #vfs.zfs.prefetch_disable=0 # On highload servers you may notice following message in dmesg: # "Approaching the limit on PV entries, consider increasing either the # vm.pmap.shpgperproc or the vm.pmap.pv_entry_max tunable" vm.pmap.shpgperproc=2048 loader.conf: # Accept filters for data, http and DNS requests # Useful when your software uses select() instead of kevent/kqueue or when you under DDoS # DNS accf available on 8.0+ accf_data_load="YES" accf_http_load="YES" accf_dns_load="YES" # Async IO system calls aio_load="YES" # Linux specific devices in /dev # As for 8.1 it only /dev/full #lindev_load="YES" # Adds NCQ support in FreeBSD # WARNING! all ad[0-9]+ devices will be renamed to ada[0-9]+ # 8.0+ only #ahci_load="YES" #siis_load="YES" # FreeBSD 8.2+ # New Congestion Control for FreeBSD # http://caia.swin.edu.au/urp/newtcp/tools/cc_chd-readme-0.1.txt # http://www.ietf.org/proceedings/78/slides/iccrg-5.pdf # Initial merge commit message http://www.mail-archive.com/[email protected]/msg31410.html #cc_chd_load="YES" # Increase kernel memory size to 3G. # # Use ONLY if you have KVA_PAGES in kernel configuration, and you have more than 3G RAM # Otherwise panic will happen on next reboot! # # It's required for high buffer sizes: kern.ipc.nmbjumbop, kern.ipc.nmbclusters, etc # Useful on highload stateful firewalls, proxies or ZFS fileservers # (FreeBSD 7.2+ amd64 users: Check that current value is lower!) #vm.kmem_size="3G" # If your server has lots of swap (>4Gb) you should increase following value # according to http://lists.freebsd.org/pipermail/freebsd-hackers/2009-October/029616.html # Otherwise you'll be getting errors # "kernel: swap zone exhausted, increase kern.maxswzone" # kern.maxswzone="256M" # Older versions of FreeBSD can't tune maxfiles on the fly #kern.maxfiles="200000" # Useful for databases # Sets maximum data size to 1G # (FreeBSD 7.2+ amd64 users: Check that current value is lower!) #kern.maxdsiz="1G" # Maximum buffer size(vfs.maxbufspace) # You can check current one via vfs.bufspace # Should be lowered/upped depending on server's load-type # Usually decreased to preserve kmem # (default is 10% of mem) #kern.maxbcache="512M" # Sendfile buffers # For i386 only #kern.ipc.nsfbufs=10240 # FreeBSD 9+ # HPET "legacy route" support. It should allow HPET to work per-CPU # See http://www.mail-archive.com/[email protected]/msg03603.html #hint.atrtc.0.clock=0 #hint.attimer.0.clock=0 #hint.hpet.0.legacy_route=1 # syncache Hash table tuning net.inet.tcp.syncache.hashsize=1024 net.inet.tcp.syncache.bucketlimit=512 net.inet.tcp.syncache.cachelimit=65536 # Increased hostcache # Later host cache can be viewed via net.inet.tcp.hostcache.list hidden sysctl # Very useful for it's RTT RTTVAR # Must be power of two net.inet.tcp.hostcache.hashsize=65536 # hashsize * bucketlimit (which is 30 by default) # It allocates 255Mb (1966080*136) of RAM net.inet.tcp.hostcache.cachelimit=1966080 # TCP control-block Hash table tuning net.inet.tcp.tcbhashsize=4096 # Disable ipfw deny all # Should be uncommented when there is a chance that # kernel and ipfw binary may be out-of sync on next reboot #net.inet.ip.fw.default_to_accept=1 # # SIFTR (Statistical Information For TCP Research) is a kernel module that # logs a range of statistics on active TCP connections to a log file. # See prerelease notes http://groups.google.com/group/mailing.freebsd.current/browse_thread/thread/b4c18be6cdce76e4 # and man 4 sitfr #siftr_load="YES" # Enable superpages, for 7.2+ only # Also read http://lists.freebsd.org/pipermail/freebsd-hackers/2009-November/030094.html vm.pmap.pg_ps_enabled=1 # Usefull if you are using Intel-Gigabit NIC #hw.em.rxd=4096 #hw.em.txd=4096 #hw.em.rx_process_limit="-1" # Also if you have ALOT interrupts on NIC - play with following parameters # NOTE: You should set them for every NIC #dev.em.0.rx_int_delay: 250 #dev.em.0.tx_int_delay: 250 #dev.em.0.rx_abs_int_delay: 250 #dev.em.0.tx_abs_int_delay: 250 # There is also multithreaded version of em/igb drivers can be found here: # http://people.yandex-team.ru/~wawa/ # # for additional em monitoring and statistics use # sysctl dev.em.0.stats=1 ; dmesg # sysctl dev.em.0.debug=1 ; dmesg # Also after r209242 (-CURRENT) there is a separate sysctl for each stat variable; # Same tunings for igb #hw.igb.rxd=4096 #hw.igb.txd=4096 #hw.igb.rx_process_limit=100 # Some useful netisr tunables. See sysctl net.isr #net.isr.maxthreads=4 #net.isr.defaultqlimit=4096 #net.isr.maxqlimit: 10240 # Bind netisr threads to CPUs #net.isr.bindthreads=1 # # FreeBSD 9.x+ # Increase interface send queue length # See commit message http://svn.freebsd.org/viewvc/base?view=revision&revision=207554 #net.link.ifqmaxlen=1024 # Nicer boot logo =) loader_logo="beastie" And finally here is KERNCONF: # Just some of them, see also # cat /sys/{i386,amd64,}/conf/NOTES # This one useful only on i386 #options KVA_PAGES=512 # You can play with HZ in environments with high interrupt rate (default is 1000) # 100 is for my notebook to prolong it's battery life #options HZ=100 # Polling is goot on network loads with high packet rates and low-end NICs # NB! Do not enable it if you want more than one netisr thread #options DEVICE_POLLING # Eliminate datacopy on socket read-write # To take advantage with zero copy sockets you should have an MTU >= 4k # This req. is only for receiving data. # Read more in man zero_copy_sockets # Also this epic thread on kernel trap: # http://kerneltrap.org/node/6506 # Here Linus says that "anybody that does it that way (FreeBSD) is totally incompetent" #options ZERO_COPY_SOCKETS # Support TCP sign. Used for IPSec options TCP_SIGNATURE # There was stackoverflow found in KAME IPSec stack: # See http://secunia.com/advisories/43995/ # For quick workaround you can use `ipfw add deny proto ipcomp` options IPSEC # This ones can be loaded as modules. They described in loader.conf section #options ACCEPT_FILTER_DATA #options ACCEPT_FILTER_HTTP # Adding ipfw, also can be loaded as modules options IPFIREWALL # On 8.1+ you can disable verbose to see blocked packets on ipfw0 interface. # Also there is no point in compiling verbose into the kernel, because # now there is net.inet.ip.fw.verbose tunable. #options IPFIREWALL_VERBOSE #options IPFIREWALL_VERBOSE_LIMIT=10 options IPFIREWALL_FORWARD # Adding kernel NAT options IPFIREWALL_NAT options LIBALIAS # Traffic shaping options DUMMYNET # Divert, i.e. for userspace NAT options IPDIVERT # This is for OpenBSD's pf firewall device pf device pflog # pf's QoS - ALTQ options ALTQ options ALTQ_CBQ # Class Bases Queuing (CBQ) options ALTQ_RED # Random Early Detection (RED) options ALTQ_RIO # RED In/Out options ALTQ_HFSC # Hierarchical Packet Scheduler (HFSC) options ALTQ_PRIQ # Priority Queuing (PRIQ) options ALTQ_NOPCC # Required for SMP build # Pretty console # Manual can be found here http://forums.freebsd.org/showthread.php?t=6134 #options VESA #options SC_PIXEL_MODE # Disable reboot on Ctrl Alt Del #options SC_DISABLE_REBOOT # Change normal|kernel messages color options SC_NORM_ATTR=(FG_GREEN|BG_BLACK) options SC_KERNEL_CONS_ATTR=(FG_YELLOW|BG_BLACK) # More scroll space options SC_HISTORY_SIZE=8192 # Adding hardware crypto device device crypto device cryptodev # Useful network interfaces device vlan device tap #Virtual Ethernet driver device gre #IP over IP tunneling device if_bridge #Bridge interface device pfsync #synchronization interface for PF device carp #Common Address Redundancy Protocol device enc #IPsec interface device lagg #Link aggregation interface device stf #IPv4-IPv6 port # Also for my notebook, but may be used with Opteron device amdtemp # Same for Intel processors device coretemp # man 4 cpuctl device cpuctl # CPU control pseudo-device # Support for ECMP. More than one route for destination # Works even with default route so one can use it as LB for two ISP # For now code is unstable and panics (panic: rtfree 2) on route deletions. #options RADIX_MPATH # Multicast routing #options MROUTING #options PIM # Debug & DTrace options KDB # Kernel debugger related code options KDB_TRACE # Print a stack trace for a panic options KDTRACE_FRAME # amd64-only(?) options KDTRACE_HOOKS # all architectures - enable general DTrace hooks #options DDB #options DDB_CTF # all architectures - kernel ELF linker loads CTF data # Adaptive spining in lockmgr (8.x+) # See http://www.mail-archive.com/[email protected]/msg10782.html options ADAPTIVE_LOCKMGRS # UTF-8 in console (8.x+) #options TEKEN_UTF8 # FreeBSD 8.1+ # Deadlock resolver thread # For additional information see http://www.mail-archive.com/[email protected]/msg18124.html # (FYI: "resolution" is panic so use with caution) #options DEADLKRES # Increase maximum size of Raw I/O and sendfile(2) readahead #options MAXPHYS=(1024*1024) #options MAXBSIZE=(1024*1024) # For scheduler debug enable following option. # Debug will be available via `kern.sched.stats` sysctl # For more information see http://svnweb.freebsd.org/base/head/sys/conf/NOTES?view=markup #options SCHED_STATS If you are tuning network for maximum performance you may wish to play with ifconfig options like: # You can list all capabilities via `ifconfig -m` ifconfig [-]rxcsum [-]txcsum [-]tso [-]lro mtu In case you've enabled DDB in kernel config, you should edit your /etc/ddb.conf and add something like this to enable automatic reboot (and textdump as bonus): script kdb.enter.panic=textdump set; capture on; show pcpu; bt; ps; alltrace; capture off; call doadump; reset script kdb.enter.default=textdump set; capture on; bt; ps; capture off; call doadump; reset And do not forget to add ddb_enable="YES" to /etc/rc.conf Since FreeBSD 9 you can select to enable/disable flowcontrol on your NIC: # See http://en.wikipedia.org/wiki/Ethernet_flow_control and # http://www.mail-archive.com/[email protected]/msg07927.html for additional info ifconfig bge0 media auto mediaopt flowcontrol PS. Also most of FreeBSD's limits can be monitored by # vmstat -z and # limits PPS. variety of network counters can be monitored via # netstat -s In FreeBSD-9 netstat's -Q option appeared, try following command to display netisr stats # netstat -Q PPPS. also see # man 7 tuning PPPPS. I wanted to thank FreeBSD community, especially author of nginx - Igor Sysoev, nginx-ru@ and FreeBSD-performance@ mailing lists for providing useful information about FreeBSD tuning. FreeBSD WIP * Whats cooking for FreeBSD 7? * Whats cooking for FreeBSD 8? * Whats cooking for FreeBSD 9? So here is the question: What tunings are you using on yours FreeBSD servers? You can also post your /etc/sysctl.conf, /boot/loader.conf, kernel options, etc with description of its' meaning (do not copy-paste from sysctl -d). Don't forget to specify server type (web, smb, gateway, etc) Let's share experience!

Read the article
Sybase PowerDesigner Change Many (Find/Replace/Convert) Data Item's Data Types

- by Andy

Hello, I have a relatively large Conceptual Data Model in PowerDesigner. After generating a Physical Data Model and seeing the DBMS data types, I need to update all of data types(NUMBER/TEXT) for each data item. I'd like to either do a find/replace within the Conceptual Data Model or somehow map to different data types when creating the Physical Data Model. Ex. Change the auto conversion of Text - Clob, to Text - NVARCHAR(20). Thanks!

Read the article
are there any useful datasets available on the web for data mining?

- by niko

Hi, Does anyone know any good resource where example (real) data can be downloaded for experimenting statistics and machine learning techniques such as decision trees etc? Currently I am studying machine learning techniques and it would be very helpful to have real data for evaluating the accuracy of various tools. If anyone knows any good resource (perhaps csv, xls files or any other format) I would be very thankful for a suggestion.

Read the article
Big Data: Size isn’t everything

- by Simon Elliston Ball

Big Data has a big problem; it’s the word “Big”. These days, a quick Google search will uncover terabytes of negative opinion about the futility of relying on huge volumes of data to produce magical, meaningful insight. There are also many clichéd but correct assertions about the difficulties of correlation versus causation, in massive data sets. In reading some of these pieces, I begin to understand how climatologists must feel when people complain ironically about “global warming” during snowfall. Big Data has a name problem. There is a lot more to it than size. Shape, Speed, and…err…Veracity are also key elements (now I understand why Gartner and the gang went with V’s instead of S’s). The need to handle data of different shapes (Variety) is not new. Data developers have always had to mold strange-shaped data into our reporting systems, integrating with semi-structured sources, and even straying into full-text searching. However, what we lacked was an easy way to add semi-structured and unstructured data to our arsenal. New “Big Data” tools such as MongoDB, and other NoSQL (Not Only SQL) databases, or a graph database like Neo4J, fill this gap. Still, to many, they simply introduce noise to the clean signal that is their sensibly normalized data structures. What about speed (Velocity)? It’s not just high frequency trading that generates data faster than a single system can handle. Many other applications need to make trade-offs that traditional databases won’t, in order to cope with high data insert speeds, or to extract quickly the required information from data streams. Unfortunately, many people equate Big Data with the Hadoop platform, whose batch driven queries and job processing queues have little to do with “velocity”. StreamInsight, Esper and Tibco BusinessEvents are examples of Big Data tools designed to handle high-velocity data streams. Again, the name doesn’t do the discipline of Big Data any favors. Ultimately, though, does analyzing fast moving data produce insights as useful as the ones we get through a more considered approach, enabled by traditional BI? Finally, we have Veracity and Value. In many ways, these additions to the classic Volume, Velocity and Variety trio acknowledge the criticism that without high-quality data and genuinely valuable outputs then data, big or otherwise, is worthless. As a discipline, Big Data has recognized this, and data quality and cleaning tools are starting to appear to support it. Rather than simply decrying the irrelevance of Volume, we need as a profession to focus how to improve Veracity and Value. Perhaps we should just declare the ‘Big’ silent, embrace these new data tools and help develop better practices for their use, just as we did the good old RDBMS? What does Big Data mean to you? Which V gives your business the most pain, or the most value? Do you see these new tools as a useful addition to the BI toolbox, or are they just enabling a dangerous trend to find ghosts in the noise?

Read the article
Know your Data Lineage

- by Simon Elliston Ball

An academic paper without the footnotes isn’t an academic paper. Journalists wouldn’t base a news article on facts that they can’t verify. So why would anyone publish reports without being able to say where the data has come from and be confident of its quality, in other words, without knowing its lineage. (sometimes referred to as ‘provenance’ or ‘pedigree’) The number and variety of data sources, both traditional and new, increases inexorably. Data comes clean or dirty, processed or raw, unimpeachable or entirely fabricated. On its journey to our report, from its source, the data can travel through a network of interconnected pipes, passing through numerous distinct systems, each managed by different people. At each point along the pipeline, it can be changed, filtered, aggregated and combined. When the data finally emerges, how can we be sure that it is right? How can we be certain that no part of the data collection was based on incorrect assumptions, that key data points haven’t been left out, or that the sources are good? Even when we’re using data science to give us an approximate or probable answer, we cannot have any confidence in the results without confidence in the data from which it came. You need to know what has been done to your data, where it came from, and who is responsible for each stage of the analysis. This information represents your data lineage; it is your stack-trace. If you’re an analyst, suspicious of a number, it tells you why the number is there and how it got there. If you’re a developer, working on a pipeline, it provides the context you need to track down the bug. If you’re a manager, or an auditor, it lets you know the right things are being done. Lineage tracking is part of good data governance. Most audit and lineage systems require you to buy into their whole structure. If you are using Hadoop for your data storage and processing, then tools like Falcon allow you to track lineage, as long as you are using Falcon to write and run the pipeline. It can mean learning a new way of running your jobs (or using some sort of proxy), and even a distinct way of writing your queries. Other Hadoop tools provide a lot of operational and audit information, spread throughout the many logs produced by Hive, Sqoop, MapReduce and all the various moving parts that make up the eco-system. To get a full picture of what’s going on in your Hadoop system you need to capture both Falcon lineage and the data-exhaust of other tools that Falcon can’t orchestrate. However, the problem is bigger even that that. Often, Hadoop is just one piece in a larger processing workflow. The next step of the challenge is how you bind together the lineage metadata describing what happened before and after Hadoop, where ‘after’ could be a data analysis environment like R, an application, or even directly into an end-user tool such as Tableau or Excel. One possibility is to push as much as you can of your key analytics into Hadoop, but would you give up the power, and familiarity of your existing tools in return for a reliable way of tracking lineage? Lineage and auditing should work consistently, automatically and quietly, allowing users to access their data with any tool they require to use. The real solution, therefore, is to create a consistent method by which to bring lineage data from these data various disparate sources into the data analysis platform that you use, rather than being forced to use the tool that manages the pipeline for the lineage and a different tool for the data analysis. The key is to keep your logs, keep your audit data, from every source, bring them together and use the data analysis tools to trace the paths from raw data to the answer that data analysis provides.

Read the article
How to maintain an ordered table with Core Data (or SQL) with insertions/deletions?

- by Jean-Denis Muys

This question is in the context of Core Data, but if I am not mistaken, it applies equally well to a more general SQL case. I want to maintain an ordered table using Core Data, with the possibility for the user to: reorder rows insert new lines anywhere delete any existing line What's the best data model to do that? I can see two ways: 1) Model it as an array: I add an int position property to my entity 2) Model it as a linked list: I add two one-to-one relations, next and previous from my entity to itself 1) makes it easy to sort, but painful to insert or delete as you then have to update the position of all objects that come after 2) makes it easy to insert or delete, but very difficult to sort. In fact, I don't think I know how to express a Sort Descriptor (SQL ORDER BY clause) for that case. Now I can imagine a variation on 1): 3) add an int ordering property to the entity, but instead of having it count one-by-one, have it count 100 by 100 (for example). Then inserting is as simple as finding any number between the ordering of the previous and next existing objects. The expensive renumbering only has to occur when the 100 holes have been filled. Making that property a float rather than an int makes it even better: it's almost always possible to find a new float midway between two floats. Am I on the right track with solution 3), or is there something smarter?

Read the article
NSURLConnection receives data even if no data was thrown back

- by Anna Fortuna

Let me explain my situation. Currently, I am experimenting long-polling using NSURLConnection. I found this and I decided to try it. What I do is send a request to the server with a timeout interval of 300 secs. (or 5 mins.) Here is a code snippet: NSURL *url = [NSURL URLWithString:urlString]; NSURLRequest *request = [NSURLRequest requestWithURL:url cachePolicy:NSURLCacheStorageAllowedInMemoryOnly timeoutInterval:300]; NSData *data = [NSURLConnection sendSynchronousRequest:request returningResponse:&resp error:&err]; Now I want to test if the connection will "hold" the request if no data was thrown back from the server, so what I did was this: if (data != nil) [self performSelectorOnMainThread:@selector(dataReceived:) withObject:data waitUntilDone:YES]; And the function dataReceived: looks like this: - (void)dataReceived:(NSData *)data { NSLog(@"DATA RECEIVED!"); NSString *string = [NSString stringWithUTF8String:[data bytes]]; NSLog(@"THE DATA: %@", string); } Server-side, I created a function that will return a data once it fits the arguments and returns none if nothing fits. Here is a snippet of the PHP function: function retrieveMessages($vardata) { if (!empty($vardata)) { $result = check_data($vardata) //check_data is the function which returns 1 if $vardata //fits the arguments, and 0 if it fails to fit if ($result == 1) { $jsonArray = array('Data' => $vardata); echo json_encode($jsonArray); } } } As you can see, the function will only return data if the $result is equal to 1. However, even if the function returns nothing, NSURLConnection will still perform the function dataReceived: meaning the NSURLConnection still receives data, albeit an empty one. So can anyone help me here? How will I perform long-polling using NSURLConnection? Basically, I want to maintain the connection as long as no data is returned. So how will I do it? NOTE: I am new to PHP, so if my code is wrong, please point it out so I can correct it.

Read the article
How can I scrape specific data from a website

- by Stoney

I'm trying to scrape data from a website for research. The urls are nicely organized in an example.com/x format, with x as an ascending number and all of the pages are structured in the same way. I just need to grab certain headings and a few numbers which are always in the same locations. I'll then need to get this data into structured form for analysis in Excel. I have used wget before to download pages, but I can't figure out how to grab specific lines of text. Excel has a feature to grab data from the web (Data-From Web) but from what I can see it only allows me to download tables. Unfortunately, the data I need is not in tables.

Read the article
How should I architect my Model and Data Access layer objects in my website?

- by Robin Winslow

I've been tasked with designing Data layer for a website at work, and I am very interested in architecture of code for the best flexibility, maintainability and readability. I am generally acutely aware of the value in completely separating out my actual Models from the Data Access layer, so that the Models are completely naive when it comes to Data Access. And in this case it's particularly useful to do this as the Models may be built from the Database or may be built from a Soap web service. So it seems to me to make sense to have Factories in my data access layer which create Model objects. So here's what I have so far (in my made-up pseudocode): class DataAccess.ProductsFromXml extends DataAccess.ProductFactory {} class DataAccess.ProductsFromDatabase extends DataAccess.ProductFactory {} These then get used in the controller in a fashion similar to the following: var xmlProductCreator = DataAccess.ProductsFromXml(xmlDataProvider); var databaseProductCreator = DataAccess.ProductsFromXml(xmlDataProvider); // Returns array of Product model objects var XmlProducts = databaseProductCreator.Products(); // Returns array of Product model objects var DbProducts = xmlProductCreator.Products(); So my question is, is this a good structure for my Data Access layer? Is it a good idea to use a Factory for building my Model objects from the data? Do you think I've misunderstood something? And are there any general patterns I should read up on for how to write my data access objects to create my Model objects?

Read the article
Windows 7 wont boot from any boot loader except for 'Windows Boot Manager' after partition resize

- by user2468327

I have a triple boot system on a single SSD. OSX, Windows 7, and Ubuntu. I use Chimera (basically another version of Chameleon) as my boot loader. Usually I can boot all 3 without any issue, but after using GParted to make my Ubuntu partition 2 Gigs larger, Windows 7 throws me an error when trying to boot to it from either Chimera or Grub. The error is consistently: 0xc000000e "cant find \Boot\BCD" (slightly paraphrased). However, I can still get into Windows by selecting "Windows Boot Manager" from the boot options in my bios. I've already tried several known fixes for similar issues, including bootrec /rebuildbcd (and variations), and BootRec.exe/fixMBR + BootRec.exe/fixBoot. Ive also tried Chkdsk. At best this has made it so Windows 7 boots on it's own by default (making me have to reinstall Chimera and change back my boot settings in the bios). At worst this made it so Windows wont boot period. Now I'm back full circle where I started. A detail that might be useful is that bootrec /rebuildbcd says that the number of found Windows installations is 0. How do I get it back so I can boot Win7 through another boot loader so I don't have to manually select it in the bios? Preferably without a reinstall.

Read the article
overwrite existing entity via bulkloader.Loader

- by Ray Yun

I was going to CSV based export/import for large data with app engine. My idea was just simple. First column of CSV would be key of entity. If it's not empty, that row means existing entity and should overwrite old one. Else, that row is new entity and should create new one. I could export key of entity by adding key property. class FrontExporter(bulkloader.Exporter): def __init__(self): bulkloader.Exporter.__init__(self, 'Front', [ ('__key__', str, None), ('name', str, None), ]) But when I was trying to upload CSV, it had failed because bulkloader.Loader.generate_key() was just for "key_name" not "key" itself. That means all exported entities in CSV should have unique 'key_name' if I want to modify-and-reupload them. class FrontLoader(bulkloader.Loader): def __init__(self): bulkloader.Loader.__init__(self, 'Front', [ ('_UNUSED', lambda x: None), ('name', lambda x: x.decode('utf-8')), ]) def generate_key(self,i,values): # first column is key keystr = values[0] if len(keystr)==0: return None return keystr I also tried to load key directly without using generate_key(), but both failed. class FrontLoader(bulkloader.Loader): def __init__(self): bulkloader.Loader.__init__(self, 'Front', [ ('Key', db.Key), # not working. just create new one. ('__key__', db.Key), # same... So, how can I overwrite existing entity which has no 'key_name'? It would be horrible if I should give unique name to all entities..... From the first answer, I could handle this problem. :) def create_entity(self, values, key_name=None, parent=None): # if key_name is None: # print 'key_name is None' # else: # print 'key_name=<',key_name,'> : length=',len(key_name) Validate(values, (list, tuple)) assert len(values) == len(self._Loader__properties), ( 'Expected %d columns, found %d.' % (len(self._Loader__properties), len(values))) model_class = GetImplementationClass(self.kind) properties = { 'key_name': key_name, 'parent': parent, } for (name, converter), val in zip(self._Loader__properties, values): if converter is bool and val.lower() in ('0', 'false', 'no'): val = False properties[name] = converter(val) if key_name is None: entity = model_class(**properties) #print 'create new one' else: entity = model_class.get(key_name) for key, value in properties.items(): setattr(entity, key, value) #print 'overwrite old one' entities = self.handle_entity(entity) if entities: if not isinstance(entities, (list, tuple)): entities = [entities] for entity in entities: if not isinstance(entity, db.Model): raise TypeError('Expected a db.Model, received %s (a %s).' % (entity, entity.__class__)) return entities def generate_key(self,i,values): # first column is key if values[0] is None or values[0] in ('',' ','-','.'): return None return values[0]

Read the article
Calculating percentiles in Excel with "buckets" data instead of the data list itself

- by G B

I have a bunch of data in Excel that I need to get certain percentile information from. The problem is that instead of having the data set made up of each value, I instead have info on the number of or "bucket" data. For example, imagine that my actual data set looks like this: 1,1,2,2,2,2,3,3,4,4,4 The data set that I have is this: Value No. of occurrences 1 2 2 4 3 2 4 3 Is there an easy way for me to calculate percentile information (as well as the median) without having to explode the summary data out to full data set? (Once I did that, I know that I could just use the Percentile(A1:A5, p) function) This is important because my data set is very large. If I exploded the data out, I would have hundreds of thousands of rows and I would have to do it for a couple of hundred data sets. Help!

Read the article
How to: Create a Data Service Using an ADO.NET Entity Framework Data Source (WCF Data Services)

- by meysam_pro

i read this but i can't find ADO.NET Data Service , why?

Read the article
How do I setup a WCF Data Service with an ADO.NET Entity Entity Model in another assembly?

- by lsb

Hi! I have an ASP.NET 4.0 website that has an Entity Data Model hooked up to WCF Data Service. When the Service and Model are in the same assembly everything works. Unfortunately, when I move the Model to another "shared" assembly (and change the namespace) the service compiles but throws a 500 error when launched in a browser. The reason I want to have the Model in a common assembly (lets call it RiaTest.Shared) is that I want share common validation code between the client and service (by checking "Reuse types in referenced assemblies" in the Advanced tab of the Add Service Reference dialog). Anyway, I've spent a couple of hours on this to no avail so any help in the regard would be appreciated...

Read the article
Open Data, Government and Transparency

- by Tori Wieldt

A new track at TDC (The Developer's Conference in Sao Paulo, Brazil) is titled Open Data. It deals with open data, government and transparency. Saturday will be a "transparency hacker day" where developers are invited to create applications using open data from the Brazilian government. Alexandre Gomes, co-lead of the track, says "I want to inspire developers to become "Civic hackers:" developers who create apps to make society better." It is a chance for developers to do well and do good. There are many opportunities for developers, including monitoring government expenditures and getting citizens involved via social networks. The open data movement is growing worldwide. One initiative, the Open Government Partnership, is working to make government data easier to find and access. Making this data easily available means that with the right applications, it will be easier for people to make decisions and suggestions about government policies based on detailed information. Last April, the Open Government Partnership held its annual meeting in Brasilia, the capitol of Brazil. It was a great success showcasing the innovative work being done in open data by governments, civil societies and individuals around the world. For example, Bulgaria now publishes daily data on budget spending for all public institutions. Alexandre Gomes Explains Open Data At TDC, the Open Data track will include a presentation of examples of successful open data projects, an introduction to the semantic web, how to handle big data sets, techniques of data visualization, and how to design APIs.The other track lead is Christian Moryah Miranda, a systems analyst for the Brazilian Government's Ministry of Planning. "The Brazilian government wholeheartedly supports this effort. In order to make our data available to the public, it forces us to be more consistent with our data across ministries, and that's a good step forward for us," he said. He explained the government knows they cannot achieve everything they would like without help from the public. "It is not the government versus the people, rather citizens are partners with the government, and together we can achieve great things!" Miranda exclaimed. Saturday at TDC will be a "transparency hacker day" where developers will be invited to create applications using open data from the Brazilian government. Attendees are invited to pitch their ideas, work in small groups, and present their project at the end of the conference. "For example," Gomes said, "the Brazilian government just released the salaries of all government employees and I can't wait to see what developers can do with that." Resources Open Government Partnership U.S. Government Open Data ProjectBrazilian Government Open Data ProjectU.K. Government Open Data Project 2012 International Open Government Data Conference

Read the article
Master Data Management and Cloud Computing

- by david.butler(at)oracle.com

Cloud Computing is all the rage these days. There are many reasons why this is so. But like its predecessor, Service Oriented Architecture, it can fall on hard times if the underlying data is left unmanaged. Master Data Management is the perfect Cloud companion. It can materially increase the chances for successful Cloud initiatives. In this blog, I'll review the nature of the Cloud and show how MDM fits in. Here's the National Institute of Standards and Technology Cloud definition: • Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction. Cloud architectures have three main layers: applications or Software as a Service (SaaS), Platforms as a Service (PaaS), and Infrastructure as a Service (IaaS). SaaS generally refers to applications that are delivered to end-users over the Internet. Oracle CRM On Demand is an example of a SaaS application. Today there are hundreds of SaaS providers covering a wide variety of applications including Salesforce.com, Workday, and Netsuite. Oracle MDM applications are located in this layer of Oracle's On Demand enterprise Cloud platform. We call it Master Data as a Service (MDaaS). PaaS generally refers to an application deployment platform delivered as a service. They are often built on a grid computing architecture and include database and middleware. Oracle Fusion Middleware is in this category and includes the SOA and Data Integration products used to connect SaaS applications including MDM. Finally, IaaS generally refers to computing hardware (servers, storage and network) delivered as a service. This typically includes the associated software as well: operating systems, virtualization, clustering, etc. Cloud Computing benefits are compelling for a large number of organizations. These include significant cost savings, increased flexibility, and fast deployments. Cost advantages include paying for just what you use. This is especially critical for organizations with variable or seasonal usage. Companies don't have to invest to support peak computing periods. Costs are also more predictable and controllable. Increased agility includes access to the latest technology and experts without making significant up front investments. While Cloud Computing is certainly very alluring with a clear value proposition, it is not without its challenges. An IDC survey of 244 IT executives/CIOs and their line-of-business (LOB) colleagues identified a number of issues: Security - 74% identified security as an issue involving data privacy and resource access control. Integration - 61% found that it is hard to integrate Cloud Apps with in-house applications. Operational Costs - 50% are worried that On Demand will actually cost more given the impact of poor data quality on the rest of the enterprise. Compliance - 49% felt that compliance with required regulatory, legal and general industry requirements (such as PCI, HIPAA and Sarbanes-Oxley) would be a major issue. When control is lost, the ability of a provider to directly manage how and where data is deployed, used and destroyed is negatively impacted. There are others, but I singled out these four top issues because Master Data Management, properly incorporated into a Cloud Computing infrastructure, can significantly ameliorate all of these problems. Cloud Computing can literally rain raw data across the enterprise. According to fellow blogger, Mike Ferguson, "the fracturing of data caused by the adoption of cloud computing raises the importance of MDM in keeping disparate data synchronized." David Linthicum, CTO Blue Mountain Labs blogs that "the lack of MDM will become more of an issue as cloud computing rises. We're moving from complex federated on-premise systems, to complex federated on-premise and cloud-delivered systems." Left unmanaged, non-standard, inconsistent, ungoverned data with questionable quality can pollute analytical systems, increase operational costs, and reduce the ROI in Cloud and On-Premise applications. As cloud computing becomes more relevant, and more data, applications, services, and processes are moved out to cloud computing platforms, the need for MDM becomes ever more important. Oracle's MDM suite is designed to deal with all four of the above Cloud issues listed in the IDC survey. Security - MDM manages all master data attribute privacy and resource access control issues. Integration - MDM pre-integrates Cloud Apps with each other and with On Premise applications at the data level. Operational Costs - MDM significantly reduces operational costs by increasing data quality, thereby improving enterprise business processes efficiency. Compliance - MDM, with its built in Data Governance capabilities, insures that the data is governed according to organizational standards. This facilitates rapid and accurate reporting for compliance purposes. Oracle MDM creates governed high quality master data. A unified cleansed and standardized data view is produced. The Oracle Customer Hub creates a single view of the customer. The Oracle Product Hub creates high quality product data designed to support all go-to-market processes. Oracle Supplier Hub dramatically reduces the chances of 'supplier exceptions'. Oracle Site Hub masters locations. And Oracle Hyperion Data Relationship Management masters financial reference data and manages enterprise hierarchies across operational areas from ERP to EPM and CRM to SCM. Oracle Fusion Middleware connects Cloud and On Premise applications to MDM Hubs and brings high quality master data to your enterprise business processes. An independent analyst once said "Poor data quality is like dirt on the windshield. You may be able to drive for a long time with slowly degrading vision, but at some point, you either have to stop and clear the windshield or risk everything." Cloud Computing has the potential to significantly degrade data quality across the enterprise over time. Deploying a Master Data Management solution prior to or in conjunction with a move to the Cloud can insure that the data flowing into the enterprise from the Cloud is clean and governed. This will in turn insure that expected returns on the investment in Cloud Computing will be realized. Oracle MDM has proven its metal in this area and has the customers to back that up. In fact, I will be hosting a webcast on Tuesday, April 10th at 10 am PT with one of our top Cloud customers, the Church Pension Group. They have moved all mainline applications to a hosted model and use Oracle MDM to insure the master data is managed and cleansed before it is propagated to other cloud and internal systems. I invite you join Martin Hossfeld, VP, IT Operations, and Danette Patterson, Enterprise Data Manager as they review business drivers for MDM and hosted applications, how they did it, the benefits achieved, and lessons learned. You can register for this free webcast here. Hope to see you there.

Read the article

Search Results

Search found 59438 results on 2378 pages for 'data loader'.

Page 22/2378 | < Previous Page | 18 19 20 21 22 23 24 25 26 27 28 29 | Next Page >

- by Mala Narasimharajan

- by xybrek

- by user253656

- by rajbk

- by Madhu Nair

- by dwahlin

- by roxana.bradescu

- by Dain C. Hansen

- by Aaron McIver

- by SaveTheRbtz

- by Andy

- by niko

- by Simon Elliston Ball

- by Simon Elliston Ball

- by Jean-Denis Muys

- by Anna Fortuna

- by Stoney

- by Robin Winslow

- by user2468327

- by Ray Yun

- by G B

- by meysam_pro

- by lsb

- by Tori Wieldt

- by david.butler(at)oracle.com

< Previous Page | 18 19 20 21 22 23 24 25 26 27 28 29 | Next Page >