Search Results

Search found 59041 results on 2362 pages for 'data replication'.

Page 5/2362 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

Benchmarking MySQL Replication with Multi-Threaded Slaves

- by Mat Keep

0 0 1 1145 6530 Homework 54 15 7660 14.0 Normal 0 false false false EN-US JA X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:Cambria; mso-ascii-font-family:Cambria; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Cambria; mso-hansi-theme-font:minor-latin; mso-ansi-language:EN-US;} The objective of this benchmark is to measure the performance improvement achieved when enabling the Multi-Threaded Slave enhancement delivered as a part MySQL 5.6. As the results demonstrate, Multi-Threaded Slaves delivers 5x higher replication performance based on a configuration with 10 databases/schemas. For real-world deployments, higher replication performance directly translates to: · Improved consistency of reads from slaves (i.e. reduced risk of reading "stale" data) · Reduced risk of data loss should the master fail before replicating all events in its binary log (binlog) The multi-threaded slave splits processing between worker threads based on schema, allowing updates to be applied in parallel, rather than sequentially. This delivers benefits to those workloads that isolate application data using databases - e.g. multi-tenant systems deployed in cloud environments. Multi-Threaded Slaves are just one of many enhancements to replication previewed as part of the MySQL 5.6 Development Release, which include: · Global Transaction Identifiers coupled with MySQL utilities for automatic failover / switchover and slave promotion · Crash Safe Slaves and Binlog · Optimized Row Based Replication · Replication Event Checksums · Time Delayed Replication These and many more are discussed in the “MySQL 5.6 Replication: Enabling the Next Generation of Web & Cloud Services” Developer Zone article Back to the benchmark - details are as follows. Environment The test environment consisted of two Linux servers: · one running the replication master · one running the replication slave. Only the slave was involved in the actual measurements, and was based on the following configuration: - Hardware: Oracle Sun Fire X4170 M2 Server - CPU: 2 sockets, 6 cores with hyper-threading, 2930 MHz. - OS: 64-bit Oracle Enterprise Linux 6.1 - Memory: 48 GB Test Procedure Initial Setup: Two MySQL servers were started on two different hosts, configured as replication master and slave. 10 sysbench schemas were created, each with a single table: CREATE TABLE `sbtest` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `k` int(10) unsigned NOT NULL DEFAULT '0', `c` char(120) NOT NULL DEFAULT '', `pad` char(60) NOT NULL DEFAULT '', PRIMARY KEY (`id`), KEY `k` (`k`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 10,000 rows were inserted in each of the 10 tables, for a total of 100,000 rows. When the inserts had replicated to the slave, the slave threads were stopped. The slave data directory was copied to a backup location and the slave threads position in the master binlog noted. 10 sysbench clients, each configured with 10 threads, were spawned at the same time to generate a random schema load against each of the 10 schemas on the master. Each sysbench client executed 10,000 "update key" statements: UPDATE sbtest set k=k+1 WHERE id = <random row> In total, this generated 100,000 update statements to later replicate during the test itself. Test Methodology: The number of slave workers to test with was configured using: SET GLOBAL slave_parallel_workers=<workers> Then the slave IO thread was started and the test waited for all the update queries to be copied over to the relay log on the slave. The benchmark clock was started and then the slave SQL thread was started. The test waited for the slave SQL thread to finish executing the 100k update queries, doing "select master_pos_wait()". When master_pos_wait() returned, the benchmark clock was stopped and the duration calculated. The calculated duration from the benchmark clock should be close to the time it took for the SQL thread to execute the 100,000 update queries. The 100k queries divided by this duration gave the benchmark metric, reported as Queries Per Second (QPS). Test Reset: The test-reset cycle was implemented as follows: · the slave was stopped · the slave data directory replaced with the previous backup · the slave restarted with the slave threads replication pointer repositioned to the point before the update queries in the binlog. The test could then be repeated with identical set of queries but a different number of slave worker threads, enabling a fair comparison. The Test-Reset cycle was repeated 3 times for 0-24 number of workers and the QPS metric calculated and averaged for each worker count. MySQL Configuration The relevant configuration settings used for MySQL are as follows: binlog-format=STATEMENT relay-log-info-repository=TABLE master-info-repository=TABLE As described in the test procedure, the slave_parallel_workers setting was modified as part of the test logic. The consequence of changing this setting is: 0 worker threads: - current (i.e. single threaded) sequential mode - 1 x IO thread and 1 x SQL thread - SQL thread both reads and executes the events 1 worker thread: - sequential mode - 1 x IO thread, 1 x Coordinator SQL thread and 1 x Worker thread - coordinator reads the event and hands it to the worker who executes 2+ worker threads: - parallel execution - 1 x IO thread, 1 x Coordinator SQL thread and 2+ Worker threads - coordinator reads events and hands them to the workers who execute them Results Figure 1 below shows that Multi-Threaded Slaves deliver ~5x higher replication performance when configured with 10 worker threads, with the load evenly distributed across our 10 x schemas. This result is compared to the current replication implementation which is based on a single SQL thread only (i.e. zero worker threads). Figure 1: 5x Higher Performance with Multi-Threaded Slaves The following figure shows more detailed results, with QPS sampled and reported as the worker threads are incremented. The raw numbers behind this graph are reported in the Appendix section of this post. Figure 2: Detailed Results As the results above show, the configuration does not scale noticably from 5 to 9 worker threads. When configured with 10 worker threads however, scalability increases significantly. The conclusion therefore is that it is desirable to configure the same number of worker threads as schemas. Other conclusions from the results: · Running with 1 worker compared to zero workers just introduces overhead without the benefit of parallel execution. · As expected, having more workers than schemas adds no visible benefit. Aside from what is shown in the results above, testing also demonstrated that the following settings had a very positive effect on slave performance: relay-log-info-repository=TABLE master-info-repository=TABLE For 5+ workers, it was up to 2.3 times as fast to run with TABLE compared to FILE. Conclusion As the results demonstrate, Multi-Threaded Slaves deliver significant performance increases to MySQL replication when handling multiple schemas. This, and the other replication enhancements introduced in MySQL 5.6 are fully available for you to download and evaluate now from the MySQL Developer site (select Development Release tab). You can learn more about MySQL 5.6 from the documentation Please don’t hesitate to comment on this or other replication blogs with feedback and questions. Appendix – Detailed Results

Read the article
Move data from others user accounts in my user account

- by user118136

I had problems with compiz setting and I make multiple accounts, now I want to transfer my information from all deleted users in my current account, some data I can not copy because I am not right to read, I type in terminal "sudo nautilus" and I get the permission for read, but the copied data is available only for superusers and I must charge the permissions for each file and each folder. How I can copy the information with out the superuser rights OR how I can charge the permissions for selected folder and all files and folders included in it?

Read the article
More Value From Data Using Data Mining Presentation

Here is a presentation I gave at the SQLBits conference in September which was recorded by Microsoft. Usually I speak about SSIS but on this particular event I thought people would like to hear something different from me. Microsoft are making a big play for making Data Mining more accessible to everyone and not just boffins. In this presentation I give an overview of data mining and then do some demonstrations using the excellent Excel Add-Ins available from Microsoft SQL Server 2008 SQL Server 2005 I hope you enjoy this presentation http://go.microsoft.com/?linkid=9633764

Read the article
SQL Transactional Replication snapshot not applying

- by dmch2

Hi, I'm using SQL Transactional Replication with pull subscriptions to replicate databases (hosting their own distribution database) from several servers across a VPN to a central server. I've got the first 2 databases working fine but the 3rd one is causing me problems. My subscription server is SQL 2008, the source systems are all SQL 2005. The source databases are a few 100Mb in size and contain audit data so are simply growing slowly by adding new records at approx 1kb a second. As far as the replication monitor, Agent logs and event logs show everything is working fine - except that no data appears in my subscription database. The distribution agent doesn't seem to want to read the snapshot (and hence the initial state and schema) from the publisher. New transactions aren't applied although they do seem to be arriving OK as the replication monitor shows things like '5 transactions with 10 commands were delivered'. I would expect (as in previous times) to see statements about data being BCPed in the replication monitor. The snapshot is on the publisher on a shared folder. The subscriber can view the snapshot OK (\\repldata) and the alt snapshot folder is pointing at it. But the distribution agent doesn't seem to be making an attempt to do read it. I tried changing the snapshot path to something that's incorrect and didn't even get an error saying that it couldn't access it. After lots of googling etc I found that sp_MSget_repl_commands is called by the subscriber on the distribution database on the publisher. Running a profiler I can see that it's only called for one agent Id. After a reinit it's called for sequence number 0x0 as expected so I thought that would mean it's would look for the snapshot. However, looking on the publisher I see that there's data for two agents - the snapshot agent and the log reader agent (which is being queries). So I guess I need to tell the distribution agent to get the data for both. But how? and more importantly - why? It worked fine on the other two servers I've replicated. I'm not an SQL novice but this is pretty much my first go at replication so don't be afraid to accuse me of missing something obvious/stupid! I can get log files (eg from the distribution agent) if you want but they don't seem to have any errors in them - it just starts up and starts applying log reader agent changes. Cheers Dave

Read the article
Lync CMS replication is failing for all Domain Computers

- by Ravi Kanneganti

I have Lync Server 2010 and Active Directory installed on 2 different Windows Server 2008 R2 machines. I have added a Windows 7 PC to AD. And I have added this computer to Trusted Application Servers Pool and published the topology. I want to build an UCMA application to extend Lync Server functionality. I have installed UCMA 3.0 SDK in the same computer where Lync Server is residing. But, CMS Replication isn't happening and "Get-CsManagementStoreReplicationStatus" always gives Uptodate as "False" for my Windows 7 PC. I have even tried "Invoke-CSManagementStoreReplication" but nothing changed. Also, this is the error message that I can see in the log file: TL_WARN(TF_COMPONENT) [2]0500.07B8::04/05/2012-14:55:07.296.00000f85 (XDS_Replica_Replicator,FileDistributeTask.Execute:filedistributetask.cs(165))(000000000043B3FA)**Could not distribute the file. Exception: [System.IO.IOException: The process cannot access the file because it is being used by another process.** at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath) at System.IO.File.Move(String sourceFileName, String destFileName) at Microsoft.Rtc.Xds.Replication.Replicator.Common.FileDistributeTask.Execute()]. TL_NOISE(TF_DIAG) [2]0500.07B8::04/05/2012-14:55:07.296.00000f86 (XDS_Replica_Replicator,ReplicaTaskContainer<T>.OnError:replicataskcontainer.cs(166))(00000000005C39D4)Enter. TL_INFO(TF_COMPONENT) [2]0500.07B8::04/05/2012-14:55:07.296.00000f87 (XDS_Replica_Replicator,ReplicaTaskContainer<T>.OnError:replicataskcontainer.cs(171))(00000000005C39D4)Task error callback is about to be called. TL_VERBOSE(TF_DIAG) [2]0500.07B8::04/05/2012-14:55:07.296.00000f88 (XDS_Replica_Replicator,PerReplicaTaskManager<T>.HandleTaskError:perreplicataskmanager.cs(230))(000000000385E79C)Enter. TL_INFO(TF_COMPONENT) [2]0500.07B8::04/05/2012-14:55:07.296.00000f89 (XDS_Replica_Replicator,PerReplicaTaskManager<T>.HandleTaskError:perreplicataskmanager.cs(234))(000000000385E79C)Task encountered an error: [ReplicaTaskContainer<FileDistributeTask>{FileDistributeTask{E:\RtcReplicaRoot\xds-replica\from-master\data.zip, E:\RtcReplicaRoot\xds-replica\working\replication\from-master\data.zip, **Access failed**. (E:\RtcReplicaRoot\xds-replica\from-master\data.zip)}, FileDistributeTask{E:\RtcReplicaRoot\xds-replica\from-master\data.zip, E:\RtcReplicaRoot\xds-replica\working\replication\from-master\data.zip, }}]

Read the article
Exchange 2010: Replication Service Still Trying to Replicate Deleted Mailbox Store

- by ThaKidd

In advance, thank you for your opinions! I just migrated from Server/Exchange 2003 to Server 2008 SR2 running Exchange 2010. I had an extra mailbox that appeared with some system mailboxes in it. I used the EMS to move those mailboxes over and then deleted the store out of the EMC. Since then every so often I get an Error in Event Viewer. Source: MSExchangeRepl ID: 4098 Error: The Microsoft Exchange Replication service couldn't find a valid configuration for database '5f012f40-3bad-4003-a373-dbc0ffb6736f' on server 'EXCHSERVER'. Error: (nothing after this) I can confirm that the above GUID is the mailbox store of that I deleted. No other Exchange errors occur. How can I tell Exchange Replication to ignore this store? Setup, one Exchange server 2003 transitioned over to 2010. No other Exchange servers. Is there a way to fix this? Do I need to change a setting to stop replication? I plan to add a second Exchange server in the next few days so stopping replication would be a bad thing. Thanks again in advance. Jason

Read the article
SQL Server 2008 R2 transactional replication over VPN

- by enashnash

I'm having difficulty setting up replication over a VPN. I have a SQL Server 2008 R2, Enterprise Edition database on a Windows 2008 R2 Server. SQL Server is running on a non-standard port. I have set it up so that it is acting as its own distributor and have configured a publisher on this server. It is set as an updatable transational publication (yes, this is necessary). On this server, I have Routing and Remote Access enabled in order to be able to establish VPN connections. It is configured with a static IP address pool, of which the first in the range is always assigned to the server. I have assigned a test user a static address within this range (I don't know if this is necessary or not). All clients will be 2008 R2 versions, but could be SQL Express or standalone developer instances of the full product. I can establish a VPN connection from the client without problems and can see that the correct IP addresses are allocated. After connecting to the database to test that I can establish a connection, I realised that I needed to be able to connect to the database using the server name rather than an IP address - required for replication - which wouldn't work initially. I created an entry in the hosts file for the server on the client using the NETBIOS name of the server, and now I can connect to the server, from the client, using the SERVER\INSTANCE, PORT syntax, over the VPN. As it is the default instance on the server, I can also connect with simply SERVER, PORT syntax. After all that, I still get the following dreaded error: SQL Server replication requires the actual server name to make a connection to the server. Connections through a server alias, IP address, or any other alternate name are not supported. Specify the actual server name, 'SERVER\INSTANCE'. (Replication.Utilities). What have I missed? How do I get this to work? TIA

Read the article
SQL SERVER – Guest Post – Architecting Data Warehouse – Niraj Bhatt

- by pinaldave

Niraj Bhatt works as an Enterprise Architect for a Fortune 500 company and has an innate passion for building / studying software systems. He is a top rated speaker at various technical forums including Tech·Ed, MCT Summit, Developer Summit, and Virtual Tech Days, among others. Having run a successful startup for four years Niraj enjoys working on – IT innovations that can impact an enterprise bottom line, streamlining IT budgets through IT consolidation, architecture and integration of systems, performance tuning, and review of enterprise applications. He has received Microsoft MVP award for ASP.NET, Connected Systems and most recently on Windows Azure. When he is away from his laptop, you will find him taking deep dives in automobiles, pottery, rafting, photography, cooking and financial statements though not necessarily in that order. He is also a manager/speaker at BDOTNET, Asia’s largest .NET user group. Here is the guest post by Niraj Bhatt. As data in your applications grows it’s the database that usually becomes a bottleneck. It’s hard to scale a relational DB and the preferred approach for large scale applications is to create separate databases for writes and reads. These databases are referred as transactional database and reporting database. Though there are tools / techniques which can allow you to create snapshot of your transactional database for reporting purpose, sometimes they don’t quite fit the reporting requirements of an enterprise. These requirements typically are data analytics, effective schema (for an Information worker to self-service herself), historical data, better performance (flat data, no joins) etc. This is where a need for data warehouse or an OLAP system arises. A Key point to remember is a data warehouse is mostly a relational database. It’s built on top of same concepts like Tables, Rows, Columns, Primary keys, Foreign Keys, etc. Before we talk about how data warehouses are typically structured let’s understand key components that can create a data flow between OLTP systems and OLAP systems. There are 3 major areas to it: a) OLTP system should be capable of tracking its changes as all these changes should go back to data warehouse for historical recording. For e.g. if an OLTP transaction moves a customer from silver to gold category, OLTP system needs to ensure that this change is tracked and send to data warehouse for reporting purpose. A report in context could be how many customers divided by geographies moved from sliver to gold category. In data warehouse terminology this process is called Change Data Capture. There are quite a few systems that leverage database triggers to move these changes to corresponding tracking tables. There are also out of box features provided by some databases e.g. SQL Server 2008 offers Change Data Capture and Change Tracking for addressing such requirements. b) After we make the OLTP system capable of tracking its changes we need to provision a batch process that can run periodically and takes these changes from OLTP system and dump them into data warehouse. There are many tools out there that can help you fill this gap – SQL Server Integration Services happens to be one of them. c) So we have an OLTP system that knows how to track its changes, we have jobs that run periodically to move these changes to warehouse. The question though remains is how warehouse will record these changes? This structural change in data warehouse arena is often covered under something called Slowly Changing Dimension (SCD). While we will talk about dimensions in a while, SCD can be applied to pure relational tables too. SCD enables a database structure to capture historical data. This would create multiple records for a given entity in relational database and data warehouses prefer having their own primary key, often known as surrogate key. As I mentioned a data warehouse is just a relational database but industry often attributes a specific schema style to data warehouses. These styles are Star Schema or Snowflake Schema. The motivation behind these styles is to create a flat database structure (as opposed to normalized one), which is easy to understand / use, easy to query and easy to slice / dice. Star schema is a database structure made up of dimensions and facts. Facts are generally the numbers (sales, quantity, etc.) that you want to slice and dice. Fact tables have these numbers and have references (foreign keys) to set of tables that provide context around those facts. E.g. if you have recorded 10,000 USD as sales that number would go in a sales fact table and could have foreign keys attached to it that refers to the sales agent responsible for sale and to time table which contains the dates between which that sale was made. These agent and time tables are called dimensions which provide context to the numbers stored in fact tables. This schema structure of fact being at center surrounded by dimensions is called Star schema. A similar structure with difference of dimension tables being normalized is called a Snowflake schema. This relational structure of facts and dimensions serves as an input for another analysis structure called Cube. Though physically Cube is a special structure supported by commercial databases like SQL Server Analysis Services, logically it’s a multidimensional structure where dimensions define the sides of cube and facts define the content. Facts are often called as Measures inside a cube. Dimensions often tend to form a hierarchy. E.g. Product may be broken into categories and categories in turn to individual items. Category and Items are often referred as Levels and their constituents as Members with their overall structure called as Hierarchy. Measures are rolled up as per dimensional hierarchy. These rolled up measures are called Aggregates. Now this may seem like an overwhelming vocabulary to deal with but don’t worry it will sink in as you start working with Cubes and others. Let’s see few other terms that we would run into while talking about data warehouses. ODS or an Operational Data Store is a frequently misused term. There would be few users in your organization that want to report on most current data and can’t afford to miss a single transaction for their report. Then there is another set of users that typically don’t care how current the data is. Mostly senior level executives who are interesting in trending, mining, forecasting, strategizing, etc. don’t care for that one specific transaction. This is where an ODS can come in handy. ODS can use the same star schema and the OLAP cubes we saw earlier. The only difference is that the data inside an ODS would be short lived, i.e. for few months and ODS would sync with OLTP system every few minutes. Data warehouse can periodically sync with ODS either daily or weekly depending on business drivers. Data marts are another frequently talked about topic in data warehousing. They are subject-specific data warehouse. Data warehouses that try to span over an enterprise are normally too big to scope, build, manage, track, etc. Hence they are often scaled down to something called Data mart that supports a specific segment of business like sales, marketing, or support. Data marts too, are often designed using star schema model discussed earlier. Industry is divided when it comes to use of data marts. Some experts prefer having data marts along with a central data warehouse. Data warehouse here acts as information staging and distribution hub with spokes being data marts connected via data feeds serving summarized data. Others eliminate the need for a centralized data warehouse citing that most users want to report on detailed data. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Best Practices, Business Intelligence, Data Warehousing, Database, Pinal Dave, PostADay, Readers Contribution, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
Problem setting up Master-Master Replication in MySQL

- by Andrew

I am attempting to setup Master-Master Replication on two MySQL database servers. I have followed the steps in this guide, but it fails in the middle of Step 4 with SHOW MASTER STATUS; It simply returns an empty set. I get the same 3 errors in both servers' logs. MySQL errors on SQL1: [ERROR] Failed to open the relay log './sql1-relay-bin.000001' (relay_log_pos 4) [ERROR] Could not find target log during relay log initialization [ERROR] Failed to initialize the master info structure MySQL Errors on SQL2: [ERROR] Failed to open the relay log './sql2-relay-bin.000001' (relay_log_pos 4) [ERROR] Could not find target log during relay log initialization [ERROR] Failed to initialize the master info structure The errors make no sense because I'm not referencing those files in any of my configurations. I'm using Ubuntu Server 10.04 x64 and my configuration files are copied below. I don't know where to go from here or how to troubleshoot this. Please help. Thanks. /etc/mysql/my.cnf on SQL1: # # The MySQL database server configuration file. # # You can copy this to one of: # - "/etc/mysql/my.cnf" to set global options, # - "~/.my.cnf" to set user-specific options. # # One can use all long options that the program supports. # Run program with --help to get a list of available options and with # --print-defaults to see which it would actually understand and use. # # For explanations see # http://dev.mysql.com/doc/mysql/en/server-system-variables.html # This will be passed to all mysql clients # It has been reported that passwords should be enclosed with ticks/quotes # escpecially if they contain "#" chars... # Remember to edit /etc/mysql/debian.cnf when changing the socket location. [client] port = 3306 socket = /var/run/mysqld/mysqld.sock # Here is entries for some specific programs # The following values assume you have at least 32M ram # This was formally known as [safe_mysqld]. Both versions are currently parsed. [mysqld_safe] socket = /var/run/mysqld/mysqld.sock nice = 0 [mysqld] # # * Basic Settings # # # * IMPORTANT # If you make changes to these settings and your system uses apparmor, you may # also need to also adjust /etc/apparmor.d/usr.sbin.mysqld. # user = mysql socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /var/lib/mysql tmpdir = /tmp skip-external-locking # # Instead of skip-networking the default is now to listen only on # localhost which is more compatible and is not less secure. bind-address = <SQL1's IP> # # * Fine Tuning # key_buffer = 16M max_allowed_packet = 16M thread_stack = 192K thread_cache_size = 8 # This replaces the startup script and checks MyISAM tables if needed # the first time they are touched myisam-recover = BACKUP #max_connections = 100 #table_cache = 64 #thread_concurrency = 10 # # * Query Cache Configuration # query_cache_limit = 1M query_cache_size = 16M # # * Logging and Replication # # Both location gets rotated by the cronjob. # Be aware that this log type is a performance killer. # As of 5.1 you can enable the log at runtime! #general_log_file = /var/log/mysql/mysql.log #general_log = 1 log_error = /var/log/mysql/error.log # Here you can see queries with especially long duration #log_slow_queries = /var/log/mysql/mysql-slow.log #long_query_time = 2 #log-queries-not-using-indexes # # The following can be used as easy to replay backup logs or for replication. # note: if you are setting up a replication slave, see README.Debian about # other settings you may need to change. server-id = 1 replicate-same-server-id = 0 auto-increment-increment = 2 auto-increment-offset = 1 master-host = <SQL2's IP> master-user = slave_user master-password = "slave_password" master-connect-retry = 60 replicate-do-db = db1 log-bin= /var/log/mysql/mysql-bin.log binlog-do-db = db1 binlog-ignore-db = mysql relay-log = /var/lib/mysql/slave-relay.log relay-log-index = /var/lib/mysql/slave-relay-log.index expire_logs_days = 10 max_binlog_size = 500M # # * InnoDB # # InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/. # Read the manual for more InnoDB related options. There are many! # # * Security Features # # Read the manual, too, if you want chroot! # chroot = /var/lib/mysql/ # # For generating SSL certificates I recommend the OpenSSL GUI "tinyca". # # ssl-ca=/etc/mysql/cacert.pem # ssl-cert=/etc/mysql/server-cert.pem # ssl-key=/etc/mysql/server-key.pem [mysqldump] quick quote-names max_allowed_packet = 16M [mysql] #no-auto-rehash # faster start of mysql but no tab completition [isamchk] key_buffer = 16M # # * IMPORTANT: Additional settings that can override those from this file! # The files must end with '.cnf', otherwise they'll be ignored. # !includedir /etc/mysql/conf.d/ /etc/mysql/my.cnf on SQL2: # # The MySQL database server configuration file. # # You can copy this to one of: # - "/etc/mysql/my.cnf" to set global options, # - "~/.my.cnf" to set user-specific options. # # One can use all long options that the program supports. # Run program with --help to get a list of available options and with # --print-defaults to see which it would actually understand and use. # # For explanations see # http://dev.mysql.com/doc/mysql/en/server-system-variables.html # This will be passed to all mysql clients # It has been reported that passwords should be enclosed with ticks/quotes # escpecially if they contain "#" chars... # Remember to edit /etc/mysql/debian.cnf when changing the socket location. [client] port = 3306 socket = /var/run/mysqld/mysqld.sock # Here is entries for some specific programs # The following values assume you have at least 32M ram # This was formally known as [safe_mysqld]. Both versions are currently parsed. [mysqld_safe] socket = /var/run/mysqld/mysqld.sock nice = 0 [mysqld] # # * Basic Settings # # # * IMPORTANT # If you make changes to these settings and your system uses apparmor, you may # also need to also adjust /etc/apparmor.d/usr.sbin.mysqld. # user = mysql socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /var/lib/mysql tmpdir = /tmp skip-external-locking # # Instead of skip-networking the default is now to listen only on # localhost which is more compatible and is not less secure. bind-address = <SQL2's IP> # # * Fine Tuning # key_buffer = 16M max_allowed_packet = 16M thread_stack = 192K thread_cache_size = 8 # This replaces the startup script and checks MyISAM tables if needed # the first time they are touched myisam-recover = BACKUP #max_connections = 100 #table_cache = 64 #thread_concurrency = 10 # # * Query Cache Configuration # query_cache_limit = 1M query_cache_size = 16M # # * Logging and Replication # # Both location gets rotated by the cronjob. # Be aware that this log type is a performance killer. # As of 5.1 you can enable the log at runtime! #general_log_file = /var/log/mysql/mysql.log #general_log = 1 log_error = /var/log/mysql/error.log # Here you can see queries with especially long duration #log_slow_queries = /var/log/mysql/mysql-slow.log #long_query_time = 2 #log-queries-not-using-indexes # # The following can be used as easy to replay backup logs or for replication. # note: if you are setting up a replication slave, see README.Debian about # other settings you may need to change. server-id = 2 replicate-same-server-id = 0 auto-increment-increment = 2 auto-increment-offset = 2 master-host = <SQL1's IP> master-user = slave_user master-password = "slave_password" master-connect-retry = 60 replicate-do-db = db1 log-bin= /var/log/mysql/mysql-bin.log binlog-do-db = db1 binlog-ignore-db = mysql relay-log = /var/lib/mysql/slave-relay.log relay-log-index = /var/lib/mysql/slave-relay-log.index expire_logs_days = 10 max_binlog_size = 500M # # * InnoDB # # InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/. # Read the manual for more InnoDB related options. There are many! # # * Security Features # # Read the manual, too, if you want chroot! # chroot = /var/lib/mysql/ # # For generating SSL certificates I recommend the OpenSSL GUI "tinyca". # # ssl-ca=/etc/mysql/cacert.pem # ssl-cert=/etc/mysql/server-cert.pem # ssl-key=/etc/mysql/server-key.pem [mysqldump] quick quote-names max_allowed_packet = 16M [mysql] #no-auto-rehash # faster start of mysql but no tab completition [isamchk] key_buffer = 16M # # * IMPORTANT: Additional settings that can override those from this file! # The files must end with '.cnf', otherwise they'll be ignored. # !includedir /etc/mysql/conf.d/

Read the article
Data Mining Resources

- by Dejan Sarka

There are many different types of analyses, each one with its own pros and cons. Relational reports have a predefined structure, and end users cannot change it. They are simple to use for end users. Reports can use real-time data and snapshots of data to show the state of a report at specific points in time. One of the drawbacks is that report authoring is limited to IT pros and advanced users. Any kind of dynamic restructuring is very limited. If real-time data is used for a report, the report has a negative impact on the performance of the source system. Processing of the reports might be slow because the data comes from relational database management systems, which are not optimized for reporting only. If you create a semantic model of your data, your end users can create ad-hoc report structures. However, the development is more complex because a developer is needed to create these semantic models. For OLAP, you typically use specialized database management systems. You get lightning speed of analyses. End users can use rich and thin clients to interactively change the structure of the report. Typically, they do it graphically. However, the development of an OLAP system is many times quite complex. It involves the preparation and maintenance of an enterprise data warehouse and OLAP cubes. In order to exploit the possibility of real-time restructuring of reports, the users must be both active and educated. The data is usually stale, as it is loaded into data warehouses and OLAP cubes with a scheduled process. With data mining, a structure is not selected in advance; it searches for the structure. As a result, data mining can give you the most valuable results because you can discover patterns you did not expect. A data mining model structure is limited only by the attributes that you use to train the model. One of the drawbacks is that a lot of knowledge is needed for a successful data mining project. End users have to understand the results. Subject matter experts and IT professionals need to understand business problem thoroughly. The development might be sometimes even more complex than the development of OLAP cubes. Each type of analysis has its own place in an enterprise system. SQL Server has tools for all kinds of analyses. However, data mining is the most advanced way of analyzing the data; this is the “I” in BI. In order to get the most out of it, you need to learn quite a lot. In this blog post, I am gathering together resources for learning, including forthcoming events. Books Multiple authors: SQL Server MVP Deep Dives – I wrote an introductory data mining chapter there. Erik Veerman, Teo Lachev and Dejan Sarka: MCTS Self-Paced Training Kit (Exam 70-448): Microsoft SQL Server 2008 - Business Intelligence Development and Maintenance – you can find a good overview of a complete BI solution, including data mining, in this book. Jamie MacLennan, ZhaoHui Tang, and Bogdan Crivat: Data Mining with Microsoft SQL Server 2008 – can’t miss this book if you want to mine your data with SQL Server tools. Michael Berry, Gordon Linoff: Mastering Data Mining: The Art and Science of Customer Relationship Management – data mining from both, business and technical perspective. Dorian Pyle: Data Preparation for Data Mining – an in-depth book about data preparation. Thomas and Ronald Wonnacott: Introductory Statistics – if you thought that you could get away without statistics, then you are not serious about data mining. Jiawei Han and Micheline Kamber: Data Mining Concepts and Techniques – in-depth explanation of the most popular data mining algorithms. Michael Berry and Gordon Linoff: Data Mining Techniques – another book that explains data mining algorithms, more fro a business perspective. Paolo Guidici: Applied Data Mining – very mathematical book, only if you enjoy statistics and mathematics in general. Forthcoming presentations I am presenting two data mining related sessions during the PASS Summit in Charlotte, NC: Wednesday, October 16th, 2013 - Fraud Detection: Notes from the Field – I am showing how to use data mining for a specific business problem. The presentation is based on real-life projects. Friday, October 18th: Excel 2013 Advanced Analytics – I am focusing on Excel Data Mining Add-ins, and how to use them together with Power Pivot and other add-ins. This is the most you can get out of Excel. Sinergija 2013, Belgrade, Serbia Tuesday, October 22nd: Excel 2013 Analytics to the Max – another presentation focusing on the most advanced analytics you can get in Excel. SQL Rally Amsterdam, Netherlands Thursday, November 7th: Advanced Analytics in Excel 2013 – and again I am presenting about data mining in Excel. Why three different titles for the same presentation? I don’t know, I guess I forgot the name I proposed every time right after I sent the proposal. Courses Data Mining with SQL Server 2012 – I wrote a 3-day course for SolidQ. If you are interested in this course, which I could also deliver in a shorter seminar way, you can contact your closes SolidQ subsidiary, or, of course, me directly on addresses [email protected] or [email protected]. This course could also complement the existing courseware portfolio of training providers, which are welcome to contact me as well. OK, now you know: no more excuses, start learning data mining, get the most out of your data

Read the article
Data Integration 12c Raising the Big Data Roof at Oracle OpenWorld

- by Tanu Sood

Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-family:"Times New Roman","serif"; mso-fareast-font-family:"MS Mincho";} Author: Dain Hansen, Director, Oracle It was an exciting OpenWorld 2013 for us in the Data Integration track. Our theme this year was all about ‘being future ready’ - previewing one of our biggest releases this year: Oracle Data Integration 12c. Just this week we followed up with this preview by announcing the general availability of 12c release for Oracle’s key data integration products: Oracle Data Integrator 12c and Oracle GoldenGate 12c. The new release delivers extreme performance, increase IT productivity, and simplify deployment, while helping IT organizations to keep pace with new data-oriented technology trends including cloud computing, big data analytics, real-time business intelligence. Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-family:"Times New Roman","serif"; mso-fareast-font-family:"MS Mincho";} Mark Hurd's keynote on day one set the tone for the Data Integration sessions. Mark focused on big data analytics and the changing consumer expectations. Especially real-time insight is a key theme for Oracle overall and data integration products. In Mark Hurd's keynote we heard from key customers, such as Airbus and Thomson Reuters, how real-time analysis of operational data including machine data creates value, in some cases even saves lives. Thomas Kurian gave a deeper look into Oracle's big data and fast data solutions. In the initial lead Data Integration track session - Brad Adelberg, VP of Development, presented Oracle’s Data Integration 12c product strategy based on key trends from the initial OpenWorld keynotes. Brad talked about how Oracle's data integration products address the new data integration requirements that evolved with cloud computing, big data, and changing consumer expectations and how they set the key themes in our products’ road map. Brad explained why and how fast-time to value, high-performance and future-ready solutions is the top focus areas for product development. If you were not able to attend OpenWorld or this session I recommend reading the white paper: Five New Data Integration Requirements and How to Meet them with Oracle Data Integration, which provides an in-depth look into how Oracle addresses the new trends in the DI market. Following Brad’s session, Nick Wagner provided in depth review of Oracle GoldenGate’s latest features and roadmap. Nick discussed how Oracle GoldenGate’s tight integration with Oracle Database sets the product apart from the competition. We also heard that heterogeneity of the product is still a major focus for GoldenGate’s development and there will be more news on that front when there is a major release. Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-family:"Times New Roman","serif"; mso-fareast-font-family:"MS Mincho";} After GoldenGate’s product strategy session, Denis Gray from the PM team presented Oracle Data Integrator’s product strategy session, talking about the latest and greatest on ODI. Another good session was delivered by long-time GoldenGate users, Comcast. Jason Hurd and Amit Patel of Comcast talked about the various use cases they deploy Oracle GoldenGate throughout their enterprise, from database upgrades, feeding reporting systems, to active-active database synchronization. The Comcast team shared many good tips on how to use GoldenGate for both zero downtime upgrades and active-active replication with conflict management requirement. One of our other important goals we had this year for the Data Integration track at OpenWorld was hearing from our customers. We ended day 1 on just that, with a wonderful award ceremony for Oracle Excellence Awards for Oracle Fusion Middleware Innovation. The ceremony was held in the Yerba Buena Center for the Arts. Congratulations to Royal Bank of Scotland and Yalumba Wine Company, the winners in the Data Integration category. You can find more information on the award and the winners in our previous blog post: 2013 Oracle Excellence Awards for Fusion Middleware Innovation… Selected for their innovation use of Oracle’s Data Integration products; the winners for the Data Integration Category are Royal Bank of Scotland and The Yalumba Wine Company. Congratulations!!! Royal Bank of Scotland’s Market and International Banking division provides clients across the globe with seamless trading and competitive pricing, underpinned by a deep knowledge of risk management across the full spectrum of financial products. They handle millions of transactions daily to keep the lifeblood of their clients’ businesses flowing – whether through payment management solutions or through bespoke trade finance solutions. Royal Bank of Scotland is leveraging Oracle GoldenGate and Oracle Data Integrator along with Oracle Business Intelligence Enterprise Edition and the Oracle Database for a variety of solutions. Mainly, Oracle GoldenGate and Oracle Data Integrator are used to feed their data warehouse – providing a real-time data integration solution that feeds transactional data to their analytics system in minutes to enable improved decision making with timely, accurate data for their business users. Oracle Data Integrator’s in-database transformation capabilities and its ability to integrate with Oracle GoldenGate for real-time data capture is the foundation of this implementation. This solution makes it such that changes happening in the analytics systems are available the same day they are deployed on the operational system with 100% data quality guaranteed. Additionally, the solution has helped to reduce their operational database size from 150GB to 10GB. Impressive! Now what if I told you this solution was built in 3 months and had a less than 6 month return on investment? That’s outstanding! The Yalumba Wine Company is situated in the Barossa Valley of Australia. It is the oldest family owned winery in Australia with a unique way of aging their wines in specially crafted 100 liter barrels. Did you know that “Yalumba” is Aboriginal for “all the land around”? The Yalumba Wine Company is growing rapidly, and was in need of introducing a more modern standard to the existing manufacturing processes to meet globalization demands, overall time-to-market, and better operational efficiency objectives of product development. The Yalumba Wine Company worked with a partner, Bristlecone to develop a unique solution whereby Oracle Data Integrator is leveraged to pull data from Salesforce.com and JD Edwards, in addition to their other pre-existing source systems, for consumption into their data warehouse. They have emphasized the overall ease of developing integration workflows with Oracle Data Integrator. The solution has brought better visibility for the business users, shorter data loading and transformation performance to their data warehouse with rapid incorporation of new data sources, and a solid future-proof foundation for their organization. Moving forward, they plan on leveraging more from Oracle’s Data Integration portfolio. Terrific! In addition to these two customers on Tuesday we featured many other important Oracle Data Integrator and Oracle GoldenGate customers. On Tuesday the GoldenGate panel included: Land O’Lakes, Smuckers, and Veolia Water. Besides giving us yummy nutrition and healthy water, these companies have another aspect in common. They all use GoldenGate to boost their ERP application. Please read the recap by Irem Radzik. On Wednesday, the ODI Panel included: Barry Ralston and Ryan Weber of Infinity Insurance, Paul Stracke of Paychex Inc., and Ian Wall of Vertex Pharmaceuticals for a session filled with interesting projects, use cases and approaches to leveraging Oracle Data Integrator. Please read the recap by Sandrine Riley for more. Thanks to everyone who joined with us and we hope to stay connected! To hear more about our Data Integration12c products join us in an upcoming webcast to learn more. Follow us www.twitter.com/ORCLGoldenGate or goto our website at www.oracle.com/goto/dataintegration

Read the article
Master Data Services Employees Sample Model

- by Davide Mauri

I’ve been playing with Master Data Services quite a lot in those last days and I’m also monitoring the web for all available resources on it. Today I’ve found this freshly released sample available on MSDN Code Gallery: SQL Server Master Data Services Employee Sample Model http://code.msdn.microsoft.com/SSMDSEmployeeSample This sample shows how Recursive Hierarchies can be modeled in order to represent a typical organizational chart scenario where a self-relationship exists on the Employee entity. Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Looking for Cutting-Edge Data Integration: 2010 Innovation Awards

- by dain.hansen

This year's Oracle Fusion Middleware Innovation Awards will honor customers and partners who are creatively using to various products across Oracle Fusion Middleware. Brand new to this year's awards is a category for Data Integration. Think you have something unique and innovative with one of our Oracle Data Integration products? We'd love to hear from you! Please submit today The deadline for the nomination is 5 p.m. PT Friday, August 6th 2010, and winning organizations will be notified by late August 2010. What you win! FREE pass to Oracle OpenWorld 2010 in San Francisco for select winners in each category. Honored by Oracle executives at awards ceremony held during Oracle OpenWorld 2010 in San Francisco. Oracle Middleware Innovation Award Winner Plaque 1-3 meetings with Oracle Executives during Oracle OpenWorld 2010 Feature article placement in Oracle Magazine and placement in Oracle Press Release Customer snapshot and video testimonial opportunity, to be hosted on oracle.com Podcast interview opportunity with Senior Oracle Executive

Read the article
SQL 2008 R2 3rd Party Peer-to-Peer Replication, Global Site Distribution

- by gombala

We are looking at hosting 3 globally distributed SQL Server installations at different data centers. The intent is that Site A will serve web traffic and data for a specific region, same with Site B and C. In the case that Site A data center goes down, looses connectivity, etc. the users of Site A users will fail over to Site B or C (depending which is up). Also, if a user from Site A travels to Site C they should be able to access their data as it was on Site A. My questions is what SQL replication technology (SQL Replication or 3rd party) can support this scenario? We are using SQL 2008 R2 Enterprise at each site, each site runs on top of VMWare with a Netapp filer. Would something like distributed caching help in this scenario as well? We have looked at and tested Peer-to-Peer replication but have encountered issues with conflicts during our testing. I imagine there are other global data centers that have encountered and solved this issue.

Read the article
How to take mysql replication backup

- by user53864

I have a MySQL master-master replication setup with a slave for each master(only one master used for read/writes at a time) on Ubuntu server. Wondering what would be the best way to schedule backup of replication databases with mysqldump. I have following clarifications because of which could not proceed further. Scheduling mysqldump backup on masters safe for replication? Connecting masters with GUI applications(workbench) for database manipulations(read, writes.. by developers) is safe? Any inputs are welcome.

Read the article
Oracle GoldenGate 12c - Leading Enterprise Replication

- by Doug Reid

Oracle GoldenGate 12c released on October 17th and includes several new cutting edge features that firmly establishes GoldenGate's leader position in the data replication space. In fact, this release more than doubles the performance of data delivery, supports Oracle's new multitenant database feature, it's more secure, has more options for high availability, and has made great strides to simplify the configuration and deployment of the product. Read through the press release if you haven't already and do not miss the quote from Cern's Eva Dafonte Perez, regarding Oracle GoldenGate 12c "….performs five times faster compared to previous GoldenGate versions and simplifies the management of a multi-tier environment" There are a variety of new and improved features in the Oracle GoldenGate 12c. Here are the highlights: Optimized for Oracle Database 12c - GoldenGate 12c is custom tailored to the unique capabilities of Oracle database 12c and out of the box GoldenGate 12c supports multitenant (pluggable database (PDB)) and non-consolidated deployments of Oracle Database 12c. The naming convention used by database 12c is now in three parts (PDB-name, schema-name, and object name). We have made changes to the GoldenGate capture process to support the new naming convention and streamlined the whole process so a single GoldenGate capture process is being used at the container level rather than at each individual PDB. By having the capture process at the container level resource usage and the number of processes are reduced. To view a conceptual architecture diagram click here. Integrated Delivery for the Oracle Database - Leveraging a lightweight streaming API built exclusively for Oracle GoldenGate 12c, this process distributes load, auto tunes the degree of parallelism, scales better, and delivers blinding rates of changed data delivery to the Oracle database. One of the goals for Oracle GoldenGate 12c was to reduce IT costs by simplifying the configuration and reduce the time to manage complex infrastructures. In previous versions of Oracle GoldenGate, customers would split transaction loads by grouping tables into multiple different delivery processes (click here to view the previous method). Each delivery process executed independently and without any interaction or knowledge of other delivery processes. This setup was complicated to configure and time consuming as the developer needed in-depth knowledge of the source and target schemas and the transaction profile. With GoldenGate 12c and Integrated Delivery we have made it easier to configure and faster to deploy. To view a conceptual architecture diagram of integrated delivery click here Coordinated Delivery for Non-Oracle Databases - Coordinated Delivery orchestrates high-speed apply processes and simplifies the configuration of GoldenGate for non-Oracle targets. In Oracle GoldenGate 12c a single delivery process is used with multiple threads (click here) and key events, such as primary key updates, event markers, DDL, etc, are coordinated between the various threads to insure that the transactions are applied in the same sequence as they were captured, all while delivery improved performance. Replication Between On-Premises and Cloud-Based systems. - The trend for business to utilize both on-premises and cloud-based systems is rising and businesses need to replicate data back and forth. GoldenGate 12c can be configured in a variety of ways to provide real-time replication when unrestricted or restricted (limited ports or HTTP tunneling) networks are between on-premises and cloud-based systems. Expanded Heterogeneity - It wouldn't be a GoldenGate release without new and improved platform support. Release 1 includes support for MySQL 5.6 and Sybase 15.7. Upcoming in the next release GoldenGate, support will be expanded for MS SQL Server, DB2, and Teradata. Tighter Security - Oracle GoldenGate 12c is integrated with the Oracle wallet to shield usernames and passwords using strong encryption and aliases. Customers accustomed to using the Oracle Wallet with other Oracle products will instantly be familiar with how to use this great new feature Expanded Oracle Application and Technology Support - GoldenGate can be used along with Oracle Coherence to enable real-time changed data feeds to the Coherence cache using Toplink and the Oracle GoldenGate JMS adapter. Plus, Oracle Advanced Customer Services (ACS) now offers a low downtime E-Business Suite platform and database migrations using GoldenGate as the enabling technology. Keep tuned for more blogs on the new features and the upcoming launch webcast where we will go into these new features in more detail. In the mean time make sure to read through our white paper "Oracle GoldenGate 12c Release 1 New Features Overview"

Read the article
Fast Data - Big Data's achilles heel

- by thegreeneman

At OOW 2013 in Mark Hurd and Thomas Kurian's keynote, they discussed Oracle's Fast Data software solution stack and discussed a number of customers deploying Oracle's Big Data / Fast Data solutions and in particular Oracle's NoSQL Database. Since that time, there have been a large number of request seeking clarification on how the Fast Data software stack works together to deliver on the promise of real-time Big Data solutions. Fast Data is a software solution stack that deals with one aspect of Big Data, high velocity. The software in the Fast Data solution stack involves 3 key pieces and their integration: Oracle Event Processing, Oracle Coherence, Oracle NoSQL Database. All three of these technologies address a high throughput, low latency data management requirement. Oracle Event Processing enables continuous query to filter the Big Data fire hose, enable intelligent chained events to real-time service invocation and augments the data stream to provide Big Data enrichment. Extended SQL syntax allows the definition of sliding windows of time to allow SQL statements to look for triggers on events like breach of weighted moving average on a real-time data stream. Oracle Coherence is a distributed, grid caching solution which is used to provide very low latency access to cached data when the data is too big to fit into a single process, so it is spread around in a grid architecture to provide memory latency speed access. It also has some special capabilities to deploy remote behavioral execution for "near data" processing. The Oracle NoSQL Database is designed to ingest simple key-value data at a controlled throughput rate while providing data redundancy in a cluster to facilitate highly concurrent low latency reads. For example, when large sensor networks are generating data that need to be captured while analysts are simultaneously extracting the data using range based queries for upstream analytics. Another example might be storing cookies from user web sessions for ultra low latency user profile management, also leveraging that data using holistic MapReduce operations with your Hadoop cluster to do segmented site analysis. Understand how NoSQL plays a critical role in Big Data capture and enrichment while simultaneously providing a low latency and scalable data management infrastructure thru clustered, always on, parallel processing in a shared nothing architecture. Learn how easily a NoSQL cluster can be deployed to provide essential services in industry specific Fast Data solutions. See these technologies work together in a demonstration highlighting the salient features of these Fast Data enabling technologies in a location based personalization service. The question then becomes how do these things work together to deliver an end to end Fast Data solution. The answer is that while different applications will exhibit unique requirements that may drive the need for one or the other of these technologies, often when it comes to Big Data you may need to use them together. You may have the need for the memory latencies of the Coherence cache, but just have too much data to cache, so you use a combination of Coherence and Oracle NoSQL to handle extreme speed cache overflow and retrieval. Here is a great reference to how these two technologies are integrated and work together. Coherence & Oracle NoSQL Database. On the stream processing side, it is similar as with the Coherence case. As your sliding windows get larger, holding all the data in the stream can become difficult and out of band data may need to be offloaded into persistent storage. OEP needs an extreme speed database like Oracle NoSQL Database to help it continue to perform for the real time loop while dealing with persistent spill in the data stream. Here is a great resource to learn more about how OEP and Oracle NoSQL Database are integrated and work together. OEP & Oracle NoSQL Database.

Read the article
Oracle Announces Oracle Big Data Appliance X3-2 and Enhanced Oracle Big Data Connectors

- by jgelhaus

Enables Customers to Easily Harness the Business Value of Big Data at Lower Cost Engineered System Simplifies Big Data for the Enterprise Oracle Big Data Appliance X3-2 hardware features the latest 8-core Intel® Xeon E5-2600 series of processors, and compared with previous generation, the 18 compute and storage servers with 648 TB raw storage now offer: 33 percent more processing power with 288 CPU cores; 33 percent more memory per node with 1.1 TB of main memory; and up to a 30 percent reduction in power and cooling Oracle Big Data Appliance X3-2 further simplifies implementation and management of big data by integrating all the hardware and software required to acquire, organize and analyze big data. It includes: Support for CDH4.1 including software upgrades developed collaboratively with Cloudera to simplify NameNode High Availability in Hadoop, eliminating the single point of failure in a Hadoop cluster; Oracle NoSQL Database Community Edition 2.0, the latest version that brings better Hadoop integration, elastic scaling and new APIs, including JSON and C support; The Oracle Enterprise Manager plug-in for Big Data Appliance that complements Cloudera Manager to enable users to more easily manage a Hadoop cluster; Updated distributions of Oracle Linux and Oracle Java Development Kit; An updated distribution of open source R, optimized to work with high performance multi-threaded math libraries Read More Data sheet: Oracle Big Data Appliance X3-2 Oracle Big Data Appliance: Datacenter Network Integration Big Data and Natural Language: Extracting Insight From Text Thomson Reuters Discusses Oracle's Big Data Platform Connectors Integrate Hadoop with Oracle Big Data Ecosystem Oracle Big Data Connectors is a suite of software built by Oracle to integrate Apache Hadoop with Oracle Database, Oracle Data Integrator, and Oracle R Distribution. Enhancements to Oracle Big Data Connectors extend these data integration capabilities. With updates to every connector, this release includes: Oracle SQL Connector for Hadoop Distributed File System, for high performance SQL queries on Hadoop data from Oracle Database, enhanced with increased automation and querying of Hive tables and now supported within the Oracle Data Integrator Application Adapter for Hadoop; Transparent access to the Hive Query language from R and introduction of new analytic techniques executing natively in Hadoop, enabling R developers to be more productive by increasing access to Hadoop in the R environment. Read More Data sheet: Oracle Big Data Connectors High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

Read the article
Best approach to accessing multiple data source in a web application

- by ced

I've a base web application developed with .net technologies (asp.net) used into our LAN by 30 users simultanousley. From this web application I've developed two verticalization used from online users. In future i expect hundreds users simultanousley. Our company has different locations. Each site use its own database. The web application needs to retrieve information from all existing databases. Currently there are 3 database, but it's not excluded in the future expansion of new offices. My question then is: What is the best strategy for a web application to retrieve information from different databases (which have the same schema) whereas the main objective performance data access and high fault tolerance? There are case studies in the literature that I can take as an example? Do you know some good documents to study? Do you have any tips to implement this task so efficient? Intuitively I would say that two possible strategy are: perform queries from different sources in real time and aggregate data on the fly; create a repository that contains the union of the entities of interest and perform queries directly on repository;

Read the article
SQL SERVER – Introduction to Big Data – Guest Post

- by pinaldave

BIG Data – such a big word – everybody talks about this now a days. It is the word in the database world. In one of the conversation I asked my friend Jasjeet Sigh the same question – what is Big Data? He instantly came up with a very effective write-up. Jasjeet is working as a Technical Manager with Koenig Solutions. He leads the SQL domain, and holds rich IT industry experience. Talking about Koenig, it is a 19 year old IT training company that offers several certification choices. Some of its courses include SharePoint Training, Project Management certifications, Microsoft Trainings, Business Intelligence programs, Web Design and Development courses etc. Big Data, as the name suggests, is about data that is BIG in nature. The data is BIG in terms of size, and it is difficult to manage such enormous data with relational database management systems that are quite popular these days. Big Data is not just about being large in size, it is also about the variety of the data that differs in form or type. Some examples of Big Data are given below : Scientific data related to weather and atmosphere, Genetics etc Data collected by various medical procedures, such as Radiology, CT scan, MRI etc Data related to Global Positioning System Pictures and Videos Radio Frequency Data Data that may vary very rapidly like stock exchange information Apart from difficulties in managing and storing such data, it is difficult to query, analyze and visualize it. The characteristics of Big Data can be defined by four Vs: Volume: It simply means a large volume of data that may span Petabyte, Exabyte and so on. However it also depends organization to organization that what volume of data they consider as Big Data. Variety: As discussed above, Big Data is not limited to relational information or structured Data. It can also include unstructured data like pictures, videos, text, audio etc. Velocity: Velocity means the speed by which data changes. The higher is the velocity, the more efficient should be the system to capture and analyze the data. Missing any important point may lead to wrong analysis or may even result in loss. Veracity: It has been recently added as the fourth V, and generally means truthfulness or adherence to the truth. In terms of Big Data, it is more of a challenge than a characteristic. It is difficult to ascertain the truth out of the enormous amount of data and the one that has high velocity. There are always chances of having un-precise and uncertain data. It is a challenging task to clean such data before it is analyzed. Big Data can be considered as the next big thing in the IT sector in terms of innovation and development. If appropriate technologies are developed to analyze and use the information, it can be the driving force for almost all industrial segments. These include Retail, Manufacturing, Service, Finance, Healthcare etc. This will help them to automate business decisions, increase productivity, and innovate and develop new products. Thanks Jasjeet Singh for an excellent write up. Jasjeet Sign is working as a Technical Manager with Koenig Solutions. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Database, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Big Data

Read the article
Creating a Corporate Data Hub

- by BuckWoody

The Windows Azure Marketplace has a rich assortment of data and software offerings for you to use – a type of Software as a Service (SaaS) for IT workers, not necessarily for end-users. Among those offerings is the “Data Hub” – a codename for a project that ironically actually does what the codename says. In many of our organizations, we have multiple data quality issues. Finding data is one problem, but finding it just once is often a bigger problem. Lots of departments and even individuals have stored the same data more than once, and in some cases, made changes to one of the copies. It’s difficult to know which location or version of the data is authoritative. Then there’s the problem of accessing the data. It’s fairly straightforward to publish a database, share or other location internally to store the data. But then you have to figure out who owns it, how it is controlled, and pass out the various connection strings to those who want to use it. And then you need to figure out how to let folks access the internal data externally – bringing up all kinds of security issues. Finally, in many cases our user community wants us to combine data from the internally sources with external data, bringing up the security, strings, and exploration features up all over again. Enter the Data Hub. This is an online offering, where you assign an administrator and data stewards. You import the data into the service, and it’s available to you - and only you and your organization if you wish. The basic steps for this service are to set up the portal for your company, assign administrators and permissions, and then you assign data areas and import data into them. From there you make them discoverable, and then you have multiple options that you or your users can access that data. You’re then able, if you wish, to combine that data with other data in one location. So how does all that work? What about security? Is it really that easy? And can you really move the data definition off to the Subject Matter Experts (SME’s) that know the particular data stack better than the IT team does? Well, nothing good is easy – but using the Data Hub is actually pretty simple. I’ll give you a link in a moment where you can sign up and try this yourself. Once you sign up, you assign an administrator. From there you’ll create data areas, and then use a simple interface to bring the data in. All of this is done in a portal interface – nothing to install, configure, update or manage. After the data is entered in, and you’ve assigned meta-data to describe it, your users have multiple options to access it. They can simply use the portal – which actually has powerful visualizations you can use on any platform, even mobile phones or tablets. Your users can also hit the data with Excel – which gives them ultimate flexibility for display, all while using an authoritative, single reference for the data. Since the service is online, they can do this wherever they are – given the proper authentication and permissions. You can also hit the service with simple API calls, like this one from C#: http://msdn.microsoft.com/en-us/library/hh921924 You can make HTTP calls instead of code, and the data can even be exposed as an OData Feed. As you can see, there are a lot of options. You can check out the offering here: http://www.microsoft.com/en-us/sqlazurelabs/labs/data-hub.aspx and you can read the documentation here: http://msdn.microsoft.com/en-us/library/hh921938

Read the article
Creating a Corporate Data Hub

- by BuckWoody

The Windows Azure Marketplace has a rich assortment of data and software offerings for you to use – a type of Software as a Service (SaaS) for IT workers, not necessarily for end-users. Among those offerings is the “Data Hub” – a codename for a project that ironically actually does what the codename says. In many of our organizations, we have multiple data quality issues. Finding data is one problem, but finding it just once is often a bigger problem. Lots of departments and even individuals have stored the same data more than once, and in some cases, made changes to one of the copies. It’s difficult to know which location or version of the data is authoritative. Then there’s the problem of accessing the data. It’s fairly straightforward to publish a database, share or other location internally to store the data. But then you have to figure out who owns it, how it is controlled, and pass out the various connection strings to those who want to use it. And then you need to figure out how to let folks access the internal data externally – bringing up all kinds of security issues. Finally, in many cases our user community wants us to combine data from the internally sources with external data, bringing up the security, strings, and exploration features up all over again. Enter the Data Hub. This is an online offering, where you assign an administrator and data stewards. You import the data into the service, and it’s available to you - and only you and your organization if you wish. The basic steps for this service are to set up the portal for your company, assign administrators and permissions, and then you assign data areas and import data into them. From there you make them discoverable, and then you have multiple options that you or your users can access that data. You’re then able, if you wish, to combine that data with other data in one location. So how does all that work? What about security? Is it really that easy? And can you really move the data definition off to the Subject Matter Experts (SME’s) that know the particular data stack better than the IT team does? Well, nothing good is easy – but using the Data Hub is actually pretty simple. I’ll give you a link in a moment where you can sign up and try this yourself. Once you sign up, you assign an administrator. From there you’ll create data areas, and then use a simple interface to bring the data in. All of this is done in a portal interface – nothing to install, configure, update or manage. After the data is entered in, and you’ve assigned meta-data to describe it, your users have multiple options to access it. They can simply use the portal – which actually has powerful visualizations you can use on any platform, even mobile phones or tablets. Your users can also hit the data with Excel – which gives them ultimate flexibility for display, all while using an authoritative, single reference for the data. Since the service is online, they can do this wherever they are – given the proper authentication and permissions. You can also hit the service with simple API calls, like this one from C#: http://msdn.microsoft.com/en-us/library/hh921924 You can make HTTP calls instead of code, and the data can even be exposed as an OData Feed. As you can see, there are a lot of options. You can check out the offering here: http://www.microsoft.com/en-us/sqlazurelabs/labs/data-hub.aspx and you can read the documentation here: http://msdn.microsoft.com/en-us/library/hh921938

Read the article
mysql master-slave setup with synchronous replication

- by imaginative

I have a very trivial mysql master-slave setup going on between two servers. The problem is, replication is asynchronous, and this can cause issues (even on a low latency link), if the master server was to crash after a COMMIT before the replication thread from the slave was able to fetch the last bin log. Is there anyway to force mysql to do synchronous commits so that data consistency is guaranteed in a mysql-slave relationship?

Read the article
Apache Derby master/slave replication and clustering

- by Luke

I'm interested in the possibilities of master/slave replication for Derby in client/server mode (if at all possible). However, I'm unable to find any material that either explains it in a decent fashion or is able to convince me that master/slave replication doesn't exist for Derby. Any pointers to decent reading material are very much appreciated.

Read the article
Installing Replication on SQL 2005 SP3

- by Steve Syfuhs

I think I screwed myself on the setup of SQL 2005. I installed Database Services, but forgot to install replication. I then installed SP3. Is it possible to install replication after sp3? Or do I need to start from scratch?

Read the article

Search Results

Search found 59041 results on 2362 pages for 'data replication'.

Page 5/2362 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by Mat Keep

- by user118136

- by dmch2

- by Ravi Kanneganti

- by ThaKidd

- by enashnash

- by pinaldave

- by Andrew

- by Dejan Sarka

- by Tanu Sood

- by Davide Mauri

- by dain.hansen

- by gombala

- by user53864

- by Doug Reid

- by thegreeneman

- by jgelhaus

- by ced

- by pinaldave

- by BuckWoody

- by BuckWoody

- by imaginative

- by Luke

- by Steve Syfuhs

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >