Search Results

Search found 84 results on 4 pages for 'sharding'.

Page 2/4 | < Previous Page | 1 2 3 4 | Next Page >

Efficient way to combine results of two database queries.

- by ensnare

I have two tables on different servers, and I'd like some help finding an efficient way to combine and match the datasets. Here's an example: From server 1, which holds our stories, I perform a query like: query = """SELECT author_id, title, text FROM stories ORDER BY timestamp_created DESC LIMIT 10 """ results = DB.getAll(query) for i in range(len(results)): #Build a string of author_ids, e.g. '1314,4134,2624,2342' But, I'd like to fetch some info about each author_id from server 2: query = """SELECT id, avatar_url FROM members WHERE id IN (%s) """ values = (uid_list) results = DB.getAll(query, values) Now I need some way to combine these two queries so I have a dict that has the story as well as avatar_url and member_id. If this data were on one server, it would be a simple join that would look like: SELECT * FROM members, stories WHERE members.id = stories.author_id But since we store the data on multiple servers, this is not possible. What is the most efficient way to do this? Thanks.

Read the article
Running a sharded DB from a single machine

- by ming yeow

This sounds kinda dumb, but I have a sharded DB that I no longer think I need to run on 2 machines, and would like to run on one single machine instead. Any ideas on how that can potentially be done? There are lots of resources on how i can achieve the converse, but very little on how this can be done

Read the article
We failed trying database per custom installation. Plan to recover?

- by Fedyashev Nikita

There is a web application which is in production mode for 3 years or so by now. Historically, because of different reasons there was made a decision to use database-per customer installation. Now we came across the fact that now deployments are very slow. Should we ever consider moving all the databases back to single one to reduce environment complexity? Or is it too risky idea? The problem I see now is that it's very hard to merge these databases with saving referential integrity(primary keys of different database' tables can not be obviously differentiated). Databases are not that much big, so we don't have much benefits of reduced load by having multiple databases.

Read the article
SQL SERVER – Shard No More – An Innovative Look at Distributed Peer-to-peer SQL Database

- by pinaldave

There is no doubt that SQL databases play an important role in modern applications. In an ideal world, a single database can handle hundreds of incoming connections from multiple clients and scale to accommodate the related transactions. However the world is not ideal and databases are often a cause of major headaches when applications need to scale to accommodate more connections, transactions, or both. In order to overcome scaling issues, application developers often resort to administrative acrobatics, also known as database sharding. Sharding helps to improve application performance and throughput by splitting the database into two or more shards. Unfortunately, this practice also requires application developers to code transactional consistency into their applications. Getting transactional consistency across multiple SQL database shards can prove to be very difficult. Sharding requires developers to think about things like rollbacks, constraints, and referential integrity across tables within their applications when these types of concerns are best handled by the database. It also makes other common operations such as joins, searches, and memory management very difficult. In short, the very solution implemented to overcome throughput issues becomes a bottleneck in and of itself. What if database sharding was no longer required to scale your application? Let me explain. For the past several months I have been following and writing about NuoDB, a hot new SQL database technology out of Cambridge, MA. NuoDB is officially out of beta and they have recently released their first release candidate so I decided to dig into the database in a little more detail. Their architecture is very interesting and exciting because it completely eliminates the need to shard a database to achieve higher throughput. Each NuoDB database consists of at least three or more processes that enable a single database to run across multiple hosts. These processes include a Broker, a Transaction Engine and a Storage Manager. Brokers are responsible for connecting client applications to Transaction Engines and maintain a global view of the network to keep track of the multiple Transaction Engines available at any time. Transaction Engines are in-memory processes that client applications connect to for processing SQL transactions. Storage Managers are responsible for persisting data to disk and serving up records to the Transaction Managers if they don’t exist in memory. The secret to NuoDB’s approach to solving the sharding problem is that it is a truly distributed, peer-to-peer, SQL database. Each of its processes can be deployed across multiple hosts. When client applications need to connect to a Transaction Engine, the Broker will automatically route the request to the most available process. Since multiple Transaction Engines and Storage Managers running across multiple host machines represent a single logical database, you never have to resort to sharding to get the throughput your application requires. NuoDB is a new pioneer in the SQL database world. They are making database scalability simple by eliminating the need for acrobatics such as sharding, and they are also making general administration of the database simpler as well. Their distributed database appears to you as a user like a single SQL Server database. With their RC1 release they have also provided a web based administrative console that they call NuoConsole. This tool makes it extremely easy to deploy and manage NuoDB processes across one or multiple hosts with the click of a mouse button. See for yourself by downloading NuoDB here. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: CodeProject, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology Tagged: NuoDB

Read the article
Slides of my HOL on MySQL Cluster

- by user13819847

Hi!Thanks everyone who attended my hands-on lab on MySQL Cluster at MySQL Connect last Saturday.The following are the links for the slides, the HOL instructions, and the code examples.I'll try to summarize my HOL below.Aim of the HOL was to help attendees to familiarize with MySQL Cluster. In particular, by learning: the basics of MySQL Cluster Architecture the basics of MySQL Cluster Configuration and Administration how to start a new Cluster for evaluation purposes and how to connect to it We started by introducing MySQL Cluster. MySQL Cluster is a proven technology that today is successfully servicing the most performance-intensive workloads. MySQL Cluster is deployed across telecom networks and is powering mission-critical web applications. Without trading off use of commodity hardware, transactional consistency and use of complex queries, MySQL Cluster provides: Web Scalability (web-scale performance on both reads and writes) Carrier Grade Availability (99.999%) Developer Agility (freedom to use SQL or NoSQL access methods) MySQL Cluster implements: an Auto-Sharding, Multi-Master, Shared-nothing Architecture, where independent nodes can scale horizontally on commodity hardware with no shared disks, no shared memory, no single point of failure In the architecture of MySQL Cluster it is possible to find three types of nodes: management nodes: responsible for reading the configuration files, maintaining logs, and providing an interface to the administration of the entire cluster data nodes: where data and indexes are stored api nodes: provide the external connectivity (e.g. the NDB engine of the MySQL Server, APIs, Connectors) MySQL Cluster is recommended in the situations where: it is crucial to reduce service downtime, because this produces a heavy impact on business sharding the database to scale write performance higly impacts development of application (in MySQL Cluster the sharding is automatic and transparent to the application) there are real time needs there are unpredictable scalability demands it is important to have data-access flexibility (SQL & NoSQL) MySQL Cluster is available in two Editions: Community Edition (Open Source, freely downloadable from mysql.com) Carrier Grade Edition (Commercial Edition, can be downloaded from eDelivery for evaluation purposes) MySQL Carrier Grade Edition adds on the top of the Community Edition: Commercial Extensions (MySQL Cluster Manager, MySQL Enterprise Monitor, MySQL Cluster Installer) Oracle's Premium Support Services (largest team of MySQL experts backed by MySQL developers, forward compatible hot fixes, multi-language support, and more) We concluded talking about the MySQL Cluster vision: MySQL Cluster is the default database for anyone deploying rapidly evolving, realtime transactional services at web-scale, where downtime is simply not an option. From a practical point of view the HOL's steps were: MySQL Cluster installation start & monitoring of the MySQL Cluster processes client connection to the Management Server and to an SQL Node connection using the NoSQL NDB API and the Connector J In the hope that this blog post can help you get started with MySQL Cluster, I take the opportunity to thank you for the questions you made both during the HOL and at the MySQL Cluster booth. Slides are also on SlideShares: Santo Leto - MySQL Connect 2012 - Getting Started with Mysql Cluster Happy Clustering!

Read the article
Very large database, very small portion most being retrieved in real time

- by mingyeow

Hi folks, I have an interesting database problem. I have a DB that is 150GB in size. My memory buffer is 8GB. Most of my data is rarely being retrieved, or mainly being retrieved by backend processes. I would very much prefer to keep them around because some features require them. Some of it (namely some tables, and some identifiable parts of certain tables) are used very often in a user facing manner How can I make sure that the latter is always being kept in memory? (there is more than enough space for these) More info: We are on Ruby on rails. The database is MYSQL, our tables are stored using INNODB. We are sharding the data across 2 partitions. Because we are sharding it, we store most of our data using JSON blobs, while indexing only the primary keys

Read the article
Very large database, very small portion most being retrieved in real time

- by ming yeow

Hi folks, I have an interesting database problem. I have a DB that is 150GB in size. My memory buffer is 8GB. Most of my data is rarely being retrieved, or mainly being retrieved by backend processes. I would very much prefer to keep them around because some features require them. Some of it (namely some tables, and some identifiable parts of certain tables) are used very often in a user facing manner How can I make sure that the latter is always being kept in memory? (there is more than enough space for these) More info: We are on Ruby on rails. The database is MYSQL, our tables are stored using INNODB. We are sharding the data across 2 partitions. Because we are sharding it, we store most of our data using JSON blobs, while indexing only the primary keys

Read the article
Mongodb using db.help() on a particular db command

- by user1325696

When I type db.help() It returns DB methods: db.addUser(username, password[, readOnly=false]) db.auth(username, password) ... ... db.printShardingStatus() ... ... db.fsyncLock() flush data to disk and lock server for backups db.fsyncUnock() unlocks server following a db.fsyncLock() I'd like to find out how to get more detailed help for the particular command. The problem was with the printShardingStatus as it returned "too many chunks to print, use verbose if you want to print" mongos> db.printShardingStatus() --- Sharding Status --- sharding version: { "_id" : 1, "version" : 3 } shards: { "_id" : "shard0000", "host" : "localhost:10001" } { "_id" : "shard0001", "host" : "localhost:10002" } databases: { "_id" : "admin", "partitioned" : false, "primary" : "config" } { "_id" : "dbTest", "partitioned" : true, "primary" : "shard0000" } dbTest.things chunks: shard0001 12 shard0000 19 too many chunks to print, use verbose if you want to for ce print I found that for that particular command I can specify boolean parameter db.printShardingStatus(true) which wasn't shown using db.help().

Read the article
Scripting Language Sessions at Oracle OpenWorld and MySQL Connect, 2012

- by cj

This posts highlights some great scripting language sessions coming up at the Oracle OpenWorld and MySQL Connect conferences. These events are happening in San Francisco from the end of September. You can search for other interesting conference sessions in the Content Catalog. Also check out what is happening at JavaOne in that event's Content Catalog (I haven't included sessions from it in this post.) To find the timeslots and locations of each session, click their respective link and check the "Session Schedule" box on the top right. GEN8431 - General Session: What’s New in Oracle Database Application Development This general session takes a look at what’s been new in the last year in Oracle Database application development tools using the latest generation of database technology. Topics range from Oracle SQL Developer and Oracle Application Express to Java and PHP. (Thomas Kyte - Architect, Oracle) BOF9858 - Meet the Developers of Database Access Services (OCI, ODBC, DRCP, PHP, Python) This session is your opportunity to meet in person the Oracle developers who have built Oracle Database access tools and products such as the Oracle Call Interface (OCI), Oracle C++ Call Interface (OCCI), and Open Database Connectivity (ODBC) drivers; Transparent Application Failover (TAF); Oracle Database Instant Client; Database Resident Connection Pool (DRCP); Oracle Net Services, and so on. The team also works with those who develop the PHP, Ruby, Python, and Perl adapters for Oracle Database. Come discuss with them the features you like, your pains, and new product enhancements in the latest database technology. CON8506 - Syndication and Consolidation: Oracle Database Driver for MySQL Applications This technical session presents a new Oracle Database driver that enables you to run MySQL applications (written in PHP, Perl, C, C++, and so on) against Oracle Database with almost no code change. Use cases for such a driver include application syndication such as interoperability across a relationship database management system, application migration, and database consolidation. In addition, the session covers enhancements in database technology that enable and simplify the migration of third-party databases and applications to and consolidation with Oracle Database. Attend this session to learn more and see a live demo. (Srinath Krishnaswamy - Director, Software Development, Oracle. Kuassi Mensah - Director Product Management, Oracle. Mohammad Lari - Principal Technical Staff, Oracle ) CON9167 - Current State of PHP and MySQL Together, PHP and MySQL power large parts of the Web. The developers of both technologies continue to enhance their software to ensure that developers can be satisfied despite all their changing and growing needs. This session presents an overview of changes in PHP 5.4, which was released earlier this year and shows you various new MySQL-related features available for PHP, from transparent client-side caching to direct support for scaling and high-availability needs. (Johannes Schlüter - SoftwareDeveloper, Oracle) CON8983 - Sharding with PHP and MySQL In deploying MySQL, scale-out techniques can be used to scale out reads, but for scaling out writes, other techniques have to be used. To distribute writes over a cluster, it is necessary to shard the database and store the shards on separate servers. This session provides a brief introduction to traditional MySQL scale-out techniques in preparation for a discussion on the different sharding techniques that can be used with MySQL server and how they can be implemented with PHP. You will learn about static and dynamic sharding schemes, their advantages and drawbacks, techniques for locating and moving shards, and techniques for resharding. (Mats Kindahl - Senior Principal Software Developer, Oracle) CON9268 - Developing Python Applications with MySQL Utilities and MySQL Connector/Python This session discusses MySQL Connector/Python and the MySQL Utilities component of MySQL Workbench and explains how to write MySQL applications in Python. It includes in-depth explanations of the features of MySQL Connector/Python and the MySQL Utilities library, along with example code to illustrate the concepts. Those interested in learning how to expand or build their own utilities and connector features will benefit from the tips and tricks from the experts. This session also provides an opportunity to meet directly with the engineers and provide feedback on your issues and priorities. You can learn what exists today and influence future developments. (Geert Vanderkelen - Software Developer, Oracle) BOF9141 - MySQL Utilities and MySQL Connector/Python: Python Developers, Unite! Come to this lively discussion of the MySQL Utilities component of MySQL Workbench and MySQL Connector/Python. It includes in-depth explanations of the features and dives into the code for those interested in learning how to expand or build their own utilities and connector features. This is an audience-driven session, so put on your best Python shirt and let’s talk about MySQL Utilities and MySQL Connector/Python. (Geert Vanderkelen - Software Developer, Oracle. Charles Bell - Senior Software Developer, Oracle) CON3290 - Integrating Oracle Database with a Social Network Facebook, Flickr, YouTube, Google Maps. There are many social network sites, each with their own APIs for sharing data with them. Most developers do not realize that Oracle Database has base tools for communicating with these sites, enabling all manner of information, including multimedia, to be passed back and forth between the sites. This technical presentation goes through the methods in PL/SQL for connecting to, and then sending and retrieving, all types of data between these sites. (Marcelle Kratochvil - CTO, Piction) CON3291 - Storing and Tuning Unstructured Data and Multimedia in Oracle Database Database administrators need to learn new skills and techniques when the decision is made in their organization to let Oracle Database manage its unstructured data. They will face new scalability challenges. A single row in a table can become larger than a whole database. This presentation covers the techniques a DBA needs for managing the large volume of data in a standard Oracle Database instance. (Marcelle Kratochvil - CTO, Piction) CON3292 - Using PHP, Perl, Visual Basic, Ruby, and Python for Multimedia in Oracle Database These five programming languages are just some of the most popular ones in use at the moment in the marketplace. This presentation details how you can use them to access and retrieve multimedia from Oracle Database. It covers programming techniques and methods for achieving faster development against Oracle Database. (Marcelle Kratochvil - CTO, Piction) UGF5181 - Building Real-World Oracle DBA Tools in Perl Perl is not normally associated with building mission-critical application or DBA tools. Learn why Perl could be a good choice for building your next killer DBA app. This session draws on real-world experience of building DBA tools in Perl, showing the framework and architecture needed to deal with portability, efficiency, and maintainability. Topics include Perl frameworks; Which Comprehensive Perl Archive Network (CPAN) modules are good to use; Perl and CPAN module licensing; Perl and Oracle connectivity; Compiling and deploying your app; An example of what is possible with Perl. (Arjen Visser - CEO & CTO, Dbvisit Software Limited) CON3153 - Perl: A DBA’s and Developer’s Best (Forgotten) Friend This session reintroduces Perl as a language of choice for many solutions for DBAs and developers. Discover what makes Perl so successful and why it is so versatile in our day-to-day lives. Perl can automate all those manual tasks and is truly platform-independent. Perl may not be in the limelight the way other languages are, but it is a remarkable language, it is still very current with ongoing development, and it has amazing online resources. Learn what makes Perl so great (including CPAN), get an introduction to Perl language syntax, find out what you can use Perl for, hear how Oracle uses Perl, discover the best way to learn Perl, and take away a small Perl project challenge. (Arjen Visser - CEO & CTO, Dbvisit Software Limited) CON10332 - Oracle RightNow CX Cloud Service’s Connect PHP API: Intro, What’s New, and Roadmap Connect PHP is a public API that enables developers to build solutions with the Oracle RightNow CX Cloud Service platform. This API is used primarily by developers working within the Oracle RightNow Customer Portal Cloud Service framework who are looking to gain access to data and services hosted by the Oracle RightNow CX Cloud Service platform through a backward-compatible API. Connect for PHP leverages the same data model and services as the Connect Web Services for SOAP API. Come to this session to get an introduction and learn what’s new and what’s coming up. (Mark Rhoads - Senior Principal Applications Engineer, Oracle. Mark Ericson - Sr. Principle Product Manager, Oracle) CON10330 - Oracle RightNow CX Cloud Service APIs and Frameworks Overview Oracle RightNow CX Cloud Service APIs are available in the following areas: desktop UI, Web services, customer portal, PHP, and knowledge. These frameworks provide access to Oracle RightNow CX Cloud Service’s Connect Common Object Model and custom objects. This session provides a broad overview of capabilities in all these areas. (Mark Ericson - Sr. Principle Product Manager, Oracle)

Read the article
MySQL User Camp - Bangalore, India

- by Lenka Kasparova

Another MySQL User Group meeting called "MySQL User Camp-Bangalore" is announced for Jun 20 in Bangalore, India!! Please find more details below: Date and time: June 20, 2014 at 3PM IST Place: Bangalore, Kalyani Magnum campus Registration: Registration is needed, please contact [email protected] URL Agenda: MySQL 5.7 New Features and NoSQL support in MySQL Sharding as implemented in MySQL Fabric Open discussion with MySQL developers We are looking forward to seeing you on Jun 20!!

Read the article
SQL Azure Federation – Partitioning

- by simonsabin

There has been so much news coming out of MS lately and one that does seem to have gone by with very little noise is Federation in SQL Azure http://player.microsoftpdc.com/Session/591d586f-3732-4bff-8ee2-857f27d74df4 This is a fascinating feature that enables you to spread a database across multiple nodes. Sharding is another term for this and is one of the main reasons people like the NOSQL movement. It will be fascinating to see whether this federation will start to appear in the main SQL Server...(read more)

Read the article
MySQL: Functional Partitioning

This article contains common different methods of functional partitioning and common considerations for database setup and capacity. Company DBAs, database developers, engineers and architects should consider the pros and cons of any method of sharding or partitioning since compromises will have to be made given the pros and cons of a system setup.

Read the article
MySQL: Functional Partitioning

This article contains common different methods of functional partitioning and common considerations for database setup and capacity. Company DBAs, database developers, engineers and architects should consider the pros and cons of any method of sharding or partitioning since compromises will have to be made given the pros and cons of a system setup.

Read the article
Outgrew MongoDB … now what?

- by samsmith

We dump debug and transaction logs into mongodb. We really like mongodb because: Blazing insert perf document oriented Ability to let the engine drop inserts when needed for performance But there is this big problem with mongodb: The index must fit in physical RAM. In practice, this limits us to 80-150gb of raw data (we currently run on a system with 16gb RAM). Sooooo, for us to have 500gb or a tb of data, we would need 50gb or 80gb of RAM. Yes, I know this is possible. We can add servers and use mongo sharding. We can buy a special server box that can take 100 or 200 gb of RAM, but this is the tail wagging the dog! We could spend boucoup $$$ on hardware to run FOSS, when SQL Server Express can handle WAY more data on WAY less hardware than Mongo (SQL Server does not meet our architectural desires, or we would use it!) We are not going to spend huge $ on hardware here, because it is necessary only because of the Mongo architecture, not because of the inherent processing/storage needs. (And sharding? Please! Cost aside, who needs the ongoing complexity of three, five, or more servers to manage a relatively small load?) Bottom line: MongoDB is FOSS, but we gotta spend $$$$$$$ on hardware to run it? We sould rather buy commercial SW! I am sure we are not the first to hit this issue, so we ask the community: Where do we go next? (We already run Mongo v2) Thanks!!

Read the article
What is the standard system architecture for MongoDB

- by learner

I know this question is too vague, so I would like to add some key numbers to give insights about what the scenario is Each Document size - 360KB Total Documents - 1.5 million Document created/day - 2k read intensive - YES Availability requirement - HIGH With these requirements in mind, here is what I believe should be the architecture, but not too sure, please share your experiences and point me to right directions 2 Linux Box(Ubuntu 11 would do)(on a different rack setup for availability) 64-bit Mongo Database 1-master(for read/wr1te) and 1-slave(read-only with replication ON) Sharding not needed at this point in time Thank you in advance

Read the article
NoSQL Java API for MySQL Cluster: Questions & Answers

- by Mat Keep

The MySQL Cluster engineering team recently ran a live webinar, available now on-demand demonstrating the ClusterJ and ClusterJPA NoSQL APIs for MySQL Cluster, and how these can be used in building real-time, high scale Java-based services that require continuous availability. Attendees asked a number of great questions during the webinar, and I thought it would be useful to share those here, so others are also able to learn more about the Java NoSQL APIs. First, a little bit about why we developed these APIs and why they are interesting to Java developers. ClusterJ and Cluster JPA ClusterJ is a Java interface to MySQL Cluster that provides either a static or dynamic domain object model, similar to the data model used by JDO, JPA, and Hibernate. A simple API gives users extremely high performance for common operations: insert, delete, update, and query. ClusterJPA works with ClusterJ to extend functionality, including - Persistent classes - Relationships - Joins in queries - Lazy loading - Table and index creation from object model By eliminating data transformations via SQL, users get lower data access latency and higher throughput. In addition, Java developers have a more natural programming method to directly manage their data, with a complete, feature-rich solution for Object/Relational Mapping. As a result, the development of Java applications is simplified with faster development cycles resulting in accelerated time to market for new services. MySQL Cluster offers multiple NoSQL APIs alongside Java: - Memcached for a persistent, high performance, write-scalable Key/Value store, - HTTP/REST via an Apache module - C++ via the NDB API for the lowest absolute latency. Developers can use SQL as well as NoSQL APIs for access to the same data set via multiple query patterns – from simple Primary Key lookups or inserts to complex cross-shard JOINs using Adaptive Query Localization Marrying NoSQL and SQL access to an ACID-compliant database offers developers a number of benefits. MySQL Cluster’s distributed, shared-nothing architecture with auto-sharding and real time performance makes it a great fit for workloads requiring high volume OLTP. Users also get the added flexibility of being able to run real-time analytics across the same OLTP data set for real-time business insight. OK – hopefully you now have a better idea of why ClusterJ and JPA are available. Now, for the Q&A. Q & A Q. Why would I use Connector/J vs. ClusterJ? A. Partly it's a question of whether you prefer to work with SQL (Connector/J) or objects (ClusterJ). Performance of ClusterJ will be better as there is no need to pass through the MySQL Server. A ClusterJ operation can only act on a single table (e.g. no joins) - ClusterJPA extends that capability Q. Can I mix different APIs (ie ClusterJ, Connector/J) in our application for different query types? A. Yes. You can mix and match all of the API types, SQL, JDBC, ODBC, ClusterJ, Memcached, REST, C++. They all access the exact same data in the data nodes. Update through one API and new data is instantly visible to all of the others. Q. How many TCP connections would a SessionFactory instance create for a cluster of 8 data nodes? A. SessionFactory has a connection to the mgmd (management node) but otherwise is just a vehicle to create Sessions. Without using connection pooling, a SessionFactory will have one connection open with each data node. Using optional connection pooling allows multiple connections from the SessionFactory to increase throughput. Q. Can you give details of how Cluster J optimizes sharding to enhance performance of distributed query processing? A. Each data node in a cluster runs a Transaction Coordinator (TC), which begins and ends the transaction, but also serves as a resource to operate on the result rows. While an API node (such as a ClusterJ process) can send queries to any TC/data node, there are performance gains if the TC is where most of the result data is stored. ClusterJ computes the shard (partition) key to choose the data node where the row resides as the TC. Q. What happens if we perform two primary key lookups within the same transaction? Are they sent to the data node in one transaction? A. ClusterJ will send identical PK lookups to the same data node. Q. How is distributed query processing handled by MySQL Cluster ? A. If the data is split between data nodes then all of the information will be transparently combined and passed back to the application. The session will connect to a data node - typically by hashing the primary key - which then interacts with its neighboring nodes to collect the data needed to fulfil the query. Q. Can I use Foreign Keys with MySQL Cluster A. Support for Foreign Keys is included in the MySQL Cluster 7.3 Early Access release Summary The NoSQL Java APIs are packaged with MySQL Cluster, available for download here so feel free to take them for a spin today! Key Resources MySQL Cluster on-line demo MySQL ClusterJ and JPA On-demand webinar MySQL ClusterJ and JPA documentation MySQL ClusterJ and JPA whitepaper and tutorial

Read the article
Best way for a technical manager to stay up to date on technology

- by JoelFan

My manager asked for a list of technical blogs he should follow to stay current on technology. His problem is he keeps hearing terms that he hasn't heard of (i.e. NoSql, sharding, agure, sevice bus, etc.) and he would prefer to at least have a fighting chance of knowing something about them without having to be reactive and looking them up. Also I think he wants to have a big picture of all the emerging technologies and where they fit in together instead of just learning about each thing in isolation. He asked about blogs but I'm thinking print magazines may also help.

Read the article
Best way for a technical manager to stay up to date on technology

- by JoelFan

My manager asked for a list of technical blogs he should follow to stay current on technology. His problem is he keeps hearing terms that he hasn't heard of (i.e. NoSql, sharding, agure, sevice bus, etc.) and he would prefer to at least have a fighting chance of knowing something about them without having to be reactive and looking them up. Also I think he wants to have a big picture of all the emerging technologies and where they fit in together instead of just learning about each thing in isolation. He asked about blogs but I'm thinking print magazines may also help. What should I answer him?

Read the article
Interview question: How would you implement Google Search?

- by ripper234

Supposed you were asked in an interview "How would you implement Google Search?" How would you answer such a question? There might be resources out there that explain how some pieces in Google are implemented (BigTable, MapReduce, PageRank, ...), but that doesn't exactly fit in an interview. What overall architecture would you use, and how would you explain this in a 15-30 minute time span? I would start with explaining how to build a search engine that handles ~ 100k documents, then expand this via sharding to around 50M docs, then perhaps another architectural/technical leap. This is the 20,000 feet view. What I'd like is the details - how you would actually answer that in an interview. Which data structures would you use. What services/machines is your architecture composed of. What would a typical query latency be? What about failover / split brain issues? Etc...

Read the article
Example of moving from MySQL to NoSQL?

- by OverTheRainbow

Hello, For a Facebook-like site, ie. which is write-intensive and delivers user-customized pages, I'd like to build a prototype to investigate whether the document-centric NoSQL architecture would be a good alternative to sharding and reduce the load on the single master (+ multiple slaves) that we currently use and is the bottleneck. Does someone know of a good article that would give actual, simple examples of going from a relational layout in MySQL to a NoSQL layout? Thank you.

Read the article
MySQL – Scalability on Amazon RDS: Scale out to multiple RDS instances

- by Pinal Dave

Today, I’d like to discuss getting better MySQL scalability on Amazon RDS. The question of the day: “What can you do when a MySQL database needs to scale write-intensive workloads beyond the capabilities of the largest available machine on Amazon RDS?” Let’s take a look. In a typical EC2/RDS set-up, users connect to app servers from their mobile devices and tablets, computers, browsers, etc. Then app servers connect to an RDS instance (web/cloud services) and in some cases they might leverage some read-only replicas. Figure 1. A typical RDS instance is a single-instance database, with read replicas. This is not very good at handling high write-based throughput. As your application becomes more popular you can expect an increasing number of users, more transactions, and more accumulated data. User interactions can become more challenging as the application adds more sophisticated capabilities. The result of all this positive activity: your MySQL database will inevitably begin to experience scalability pressures. What can you do? Broadly speaking, there are four options available to improve MySQL scalability on RDS. 1. Larger RDS Instances – If you’re not already using the maximum available RDS instance, you can always scale up – to larger hardware. Bigger CPUs, more compute power, more memory et cetera. But the largest available RDS instance is still limited. And they get expensive. “High-Memory Quadruple Extra Large DB Instance”: 68 GB of memory 26 ECUs (8 virtual cores with 3.25 ECUs each) 64-bit platform High I/O Capacity Provisioned IOPS Optimized: 1000Mbps 2. Provisioned IOPs – You can get provisioned IOPs and higher throughput on the I/O level. However, there is a hard limit with a maximum instance size and maximum number of provisioned IOPs you can buy from Amazon and you simply cannot scale beyond these hardware specifications. 3. Leverage Read Replicas – If your application permits, you can leverage read replicas to offload some reads from the master databases. But there are a limited number of replicas you can utilize and Amazon generally requires some modifications to your existing application. And read-replicas don’t help with write-intensive applications. 4. Multiple Database Instances – Amazon offers a fourth option: “You can implement partitioning,thereby spreading your data across multiple database Instances” (Link) However, Amazon does not offer any guidance or facilities to help you with this. “Multiple database instances” is not an RDS feature. And Amazon doesn’t explain how to implement this idea. In fact, when asked, this is the response on an Amazon forum: Q: Is there any documents that describe the partition DB across multiple RDS? I need to use DB with more 1TB but exist a limitation during the create process, but I read in the any FAQ that you need to partition database, but I don’t find any documents that describe it. A: “DB partitioning/sharding is not an official feature of Amazon RDS or MySQL, but a technique to scale out database by using multiple database instances. The appropriate way to split data depends on the characteristics of the application or data set. Therefore, there is no concrete and specific guidance.” So now what? The answer is to scale out with ScaleBase. Amazon RDS with ScaleBase: What you get – MySQL Scalability! ScaleBase is specifically designed to scale out a single MySQL RDS instance into multiple MySQL instances. Critically, this is accomplished with no changes to your application code. Your application continues to “see” one database. ScaleBase does all the work of managing and enforcing an optimized data distribution policy to create multiple MySQL instances. With ScaleBase, data distribution, transactions, concurrency control, and two-phase commit are all 100% transparent and 100% ACID-compliant, so applications, services and tooling continue to interact with your distributed RDS as if it were a single MySQL instance. The result: now you can cost-effectively leverage multiple MySQL RDS instance to scale out write-intensive workloads to an unlimited number of users, transactions, and data. Amazon RDS with ScaleBase: What you keep – Everything! And how does this change your Amazon environment? 1. Keep your application, unchanged – There is no change your application development life-cycle at all. You still use your existing development tools, frameworks and libraries. Application quality assurance and testing cycles stay the same. And, critically, you stay with an ACID-compliant MySQL environment. 2. Keep your RDS value-added services – The value-added services that you rely on are all still available. Amazon will continue to handle database maintenance and updates for you. You can still leverage High Availability via Multi A-Z. And, if it benefits youra application throughput, you can still use read replicas. 3. Keep your RDS administration – Finally the RDS monitoring and provisioning tools you rely on still work as they did before. With your one large MySQL instance, now split into multiple instances, you can actually use less expensive, smallersmaller available RDS hardware and continue to see better database performance. Conclusion Amazon RDS is a tremendous service, but it doesn’t offer solutions to scale beyond a single MySQL instance. Larger RDS instances get more expensive. And when you max-out on the available hardware, you’re stuck. Amazon recommends scaling out your single instance into multiple instances for transaction-intensive apps, but offers no services or guidance to help you. This is where ScaleBase comes in to save the day. It gives you a simple and effective way to create multiple MySQL RDS instances, while removing all the complexities typically caused by “DIY” sharding andwith no changes to your applications . With ScaleBase you continue to leverage the AWS/RDS ecosystem: commodity hardware and value added services like read replicas, multi A-Z, maintenance/updates and administration with monitoring tools and provisioning. SCALEBASE ON AMAZON If you’re curious to try ScaleBase on Amazon, it can be found here – Download NOW. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: MySQL, PostADay, SQL, SQL Authority, SQL Optimization, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Cloud Computing - Multiple Physical Computers, One Logical Computer

- by Koobz

I know that you can set up multiple virtual machines per physical computer. I'm wondering if it's possible to make multiple physical computers behave as one logical unit? Fundamentally the way I imagine it working is that you can throw 10 computers into a facility one day. You've got one client that requires the equivalent of two computers worth, and 100 others that eat up the remaining 8. As demands change you're just reallocating logical resources, maybe the 2 computer client now requires a third physical system. You just add it to the cloud, and don't worry about sharding the database, or migrating data over to a new server. Can it work this way? If yes, why would anyone ever do things like partition their database servers anymore? Just add more computing resources. You scale horizontally with the hardware, but your server appears to scale vertically. There's no need to modify your application's infrastructure to support multiple databases etc.

Read the article
Cloud Computing - Multiple Physical Computers, One Logical Computer

- by bundini

I know that you can set up multiple virtual machines per physical computer. I'm wondering if it's possible to make multiple physical computers behave as one logical unit? Fundamentally the way I imagine it working is that you can throw 10 computers into a facility one day. You've got one client that requires the equivalent of two computers worth, and 100 others that eat up the remaining 8. As demands change you're just reallocating logical resources, maybe the 2 computer client now requires a third physical system. You just add it to the cloud, and don't worry about sharding the database, or migrating data over to a new server. Can it work this way? If yes, why would anyone ever do things like hand partition their database servers anymore? Just add more computing resources. You scale horizontally with the hardware, but your server appears to scale vertically. There's no need to modify your application's supporting infrastructure to support multiple databases etc.

Read the article
SQL Azure Federation - how much data before performance benefits?

- by Donald Hughes

To avoid premature optimization, I don't want to implement SQL Azure's Federation too early. Is there a rule of thumb for how much data a table would need to have before seeing performance benefits from sharding? I know there won't be a precise answer as there are too many variables to consider, especially with much of SQL Azure's resources being hidden/unknown. To put it into several, more concrete examples, would Federation improve performance in any of the below table scenarios: 100,000 rows (~ 200 MB) 1,000,000 rows (~ 2 GB) 10,000,000 rows (~ 20 GB) 100,000,000 rows (~ 200 GB) For the sake of elaboration, we can assume this is the largest table that would be federated, which consists of order details, which is joined to an orders table with a 'customer_id' foreign key, which would be the distribution key. This is a fairly standard multi-tenant, CRUD order entry system, with a typical assortment of reporting needs (customer order totals by day/month/year, etc).

Read the article
Best practice for scaling a single application source to multiple nodes

- by Andrew Waters

I have an application which needs to scale horizontally to cover web and service nodes (at the moment they're all on one) but interact with the same set of databases and source files (both application code and custom assets). Database is no problem, it's handled already with replication in MongoDB. Also, the configuration of the servers are the same (100% linux). This question is literally about sharing a filesystem between machines so that its content is always correct, regardless of the node accessing it. My two thoughts have so far been NFS and SAN - SAN being prohibitively expensive and NFS seeing some performance issues on the second node with regards to glob()ing in PHP. Does anyone have recommended strategies or other techniques that don't involved sharding data across nodes or any potential gotchas in NFS that may cause slow disk seek times? To give you an idea of the scale, the main node initialises it's application modules in ~ 0.01 seconds. The secondary is taking ~2.2 seconds. They're VM's inside a local virtual network in ESXi and ping time between them is ~0.3ms

Read the article

Search Results

Search found 84 results on 4 pages for 'sharding'.

Page 2/4 | < Previous Page | 1 2 3 4 | Next Page >

- by ensnare

- by ming yeow

- by Fedyashev Nikita

- by pinaldave

- by user13819847

- by mingyeow

- by ming yeow

- by user1325696

- by cj

- by Lenka Kasparova

- by simonsabin

- by samsmith

- by learner

- by Mat Keep

- by JoelFan

- by JoelFan

- by ripper234

- by OverTheRainbow

- by Pinal Dave

- by Koobz

- by bundini

- by Donald Hughes

- by Andrew Waters

< Previous Page | 1 2 3 4 | Next Page >