Search Results

Search found 28 results on 2 pages for 'riak'.

Page 1/2 | 1 2 | Next Page >

Riak "error":"insufficient_vnodes_available"

- by Wolfiem

We have 4 nodes Riak installation. They are running on Ubuntu 12.04 LTS Precise installed servers. We have installed 1.1.4 at August 1st 2012 and upgraded 1.2.0 when its available. Server names are: f1 - 10.10.0.12 - This is the first installed server. We have joined other ones to this server. This also serves Riak control. s2 - 10.10.0.22 - s3 - 10.10.0.23 - s4 - 10.10.0.24 - This server also serves Riak control. This morning we've seen "insufficient nodes available" error at our applications log and restarted all nodes. 3 of them became available except "f1" UPDATE : while I prepare this message live 3 nodes became unavailable and need restart Riak. wolfiem@f01:~$ sudo /etc/init.d/riak start Riak failed to start within 15 seconds, see the output of 'riak console' for more information. If you want to wait longer, set the environment variable WAIT_FOR_ERLANG to the number of seconds to wait. I've tried to set WAIT_FOR_ERLANG value to 60 seconds but I can't. adding this line in vm.args didn't work: -env WAIT_FOR_ERLANG 60 I also tried to set this from terminal but it didn't work either. wolfiem@f01:~$ export WAIT_FOR_ERLANG=60 It still says "Riak failed to start within 15 seconds" This is the console.log output: 2012-09-11 10:58:02.532 [info] <0.7.0> Application lager started on node '[email protected]' 2012-09-11 10:58:02.560 [warning] <0.148.0>@riak_core_ring_manager:reload_ring:231 No ring file available. 2012-09-11 10:58:02.585 [error] <0.164.0> CRASH REPORT Process <0.164.0> with 0 neighbours exited with reason: eaddrnotavail in gen_server:init_it/6 line 320 This is the error.log output 2012-09-11 10:58:02.585 [error] <0.164.0> CRASH REPORT Process <0.164.0> with 0 neighbours exited with reason: eaddrnotavail in gen_server:init_it/6 line 320 This is the crash.log output: 2012-09-11 10:58:02 =CRASH REPORT==== crasher: initial call: mochiweb_socket_server:init/1 pid: <0.164.0> registered_name: [] exception exit: {eaddrnotavail,[{gen_server,init_it,6,[{file,"gen_server.erl"},{line,320}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]} ancestors: [riak_core_sup,<0.135.0>] messages: [] links: [<0.136.0>] dictionary: [] trap_exit: true status: running heap_size: 377 stack_size: 24 reductions: 403 neighbours: You can find the riak console output below: wolfiem@f01:~$ riak console Attempting to restart script through sudo -H -u riak Exec: /usr/lib/riak/erts-5.9.1/bin/erlexec -boot /usr/lib/riak/releases/1.2.0/riak -embedded -config /etc/riak/app.config -pa /usr/lib/riak/basho-patches -args_file /etc/riak/vm.args -- console Root: /usr/lib/riak Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:8:8] [async-threads:64] [kernel-poll:true] =INFO REPORT==== 11-Sep-2012::10:44:18 === alarm_handler: {set,{system_memory_high_watermark,[]}} ** /usr/lib/riak/lib/observer-1.1/ebin/etop_txt.beam hides /usr/lib/riak/lib/basho-patches/etop_txt.beam ** Found 1 name clashes in code paths 10:44:19.099 [info] Application lager started on node '[email protected]' 10:44:19.130 [warning] No ring file available. 10:44:19.158 [error] CRASH REPORT Process <0.164.0> with 0 neighbours exited with reason: eaddrnotavail in gen_server:init_it/6 line 320 /usr/lib/riak/lib/os_mon-2.2.9/priv/bin/memsup: Erlang has closed. =INFO REPORT==== 11-Sep-2012::10:44:19 === alarm_handler: {clear,system_memory_high_watermark} Erlang has closed {"Kernel pid terminated",application_controller,"{application_start_failure,riak_core,{shutdown,{riak_core_app,start,[normal,[]]}}}"} Crash dump was written to: /var/log/riak/erl_crash.dump Kernel pid terminated (application_controller) ({application_start_failure,riak_core,{shutdown,{riak_core_app,start,[normal,[]]}}})

Read the article
How to store an object in Riak with the Java client?

- by Jonas

I have setup Riak on a Ubuntu machine, and it seam to work if I do riak ping. Now I would like to use the Riak Java client to store an object, but it doesn't work. I get com.basho.riak.client.response.RiakIORuntimeException when I try to store an object. What am I doing wrong? Is there a way to test if I can access riak from my java client? Do I have to create a Bucket first? how? import com.basho.riak.client.RiakClient; import com.basho.riak.client.RiakObject; import com.basho.riak.client.response.FetchResponse; public class RiakTest { public static void main(String[] args) { // connect RiakClient riak = new RiakClient("http://192.168.1.107:8098/riak"); // create object RiakObject o = new RiakObject("mybucket", "mykey", "myvalue"); // store riak.store(o); } }

Read the article
PHP Riak in place update

- by WojonsTech

From what I can see, when using Riak to update an object, I first need to load the object into PHP, then edit the object, then store the object back to the Riak database. I was wondering if there is a way to update a bucket without pulling it into PHP first. That way, it would save on the network I/O and latency of pulling it into the PHP script. Can objects be edited directly on the Riak side of things? Edit: Is there away to push data to the end of a raik object, so if i have an object that is numeric array can i make a push to add subject that i know its not there or no in place updates what so ever

Read the article
Mapreduce with Riak

- by Zubair

Does anyone have example code for mapreduce for Riak that can be run on a single Riak node.

Read the article
has anyone got Riak working on Solaris or OpenSolaris?

- by Zubair

has anyone got Riak working on Solaris or OpenSolaris? When I try to compile it I get: user@opensolaris:~/riak# gmake all rel ./rebar compile /usr/bin/env: No such file or directory gmake: *** [compile] Error 127 user@opensolaris:~/riak# mkdir /usr/bin/env mkdir: Failed to make directory "/usr/bin/env"; File exists user@opensolaris:~/riak#

Read the article
Where are Riak Post-Commit Hooks run?

- by pixelcort

I'm trying to evaluate using Riak's Post-Commit Hooks to build a distributed, incremental MapReduce-based index, but was wondering which Riak nodes the Post-Commit Hooks actually run on. Are they run on the nodes the client used to put the commits, or on the primary nodes where the data is persisted? If it's the latter, I'm thinking I can from there efficiently do a map or reduce and put additional records from the output.

Read the article
Riak link-walking like a join?

- by tesmar

Hi guys! I am looking to store pictures in a NoSQL database (<5MB) and link them to articles in a different bucket. What kind of speed does Riak's link walking feature offer? Is it like a RDBMS join at all?

Read the article
Does anyone know how to install a single node instance of Riak?

- by Zubair

Does anyone know how to install a single node instance of Riak as the documentation is plain wrong and I have been trying for over a day now?

Read the article
If I have a Riak DB, should it store to Amazon S3 or to the EC/2 instance?

- by tesmar

I am setting up an Amazon EC/2 instance, and am putting Riak on it. I am wondering if I should store the data locally (and not ever delete the instance), or set up Riak to store to S3 and bring up/take down instances as I need?

Read the article
How fast are EC/2 nodes between each other?

- by tesmar

Hi, I am looking to setup Amazon EC/2 nodes on rails with Riak. I am looking to be able to sync the riak DBs and if the cluster gets a query, to be able to tell where the data lies and retrieve it quickly. In your opinion(s), is EC/2 fast enough between nodes to query a Riak DB, return the results, and get them back to the client in a timely manner? I am new to all of this, so please be kind :)

Read the article
Raik vs Amazon SimpleDB

- by Fedrick

Hi, I am looking for an eventually consistent key value data store and i decided to choose between Amazon SimpleDB and Riak ,so can anyone share their valuable experiences comparing both . Thanks in advance Fedrick

Read the article
Distributed datastore

- by Julien Genestoux

We're trying to add some kind of persistence in our app. The app generates about 250 entries per second. Each of these entries belong to one of 2M files. For each file, we want to keep the last 10 entries, so we can look them up later. The way our client application works : it gets a stream of all the data it fetches the right file (GET) it adds the new content it saves the file back (PUT) We're looking for an efficient way to store this data that can scale horizontally as the amount of data we're getting is doubling every few weeks. We initially looked at S3. It works fine, but becomes very expensive very fast ($1000 monthly just in PUT operations!) We then gave a shot at Riak. But it seems we can't get more than 60 write/sec on each node, which is very very slow. Any other solution out there?

Read the article
bigtable vs cassandra vs simpledb vs dynamo vs couchdb vs hypertable vs riak vs hbase, what do they

- by Mickey Shine

Sorry if this question is somewhat subjective. I am new to 'could store', 'distributed store' or some concepts like this. I really wonder what do they have in common and want to get an overview on all of them. What do I need to prepare if I want to write a product similar to this?

Read the article
Is Berkeley DB a NoSQL solution?

- by Gregory Burd

Berkeley DB is a library. To use it to store data you must link the library into your application. You can use most programming languages to access the API, the calls across these APIs generally mimic the Berkeley DB C-API which makes perfect sense because Berkeley DB is written in C. The inspiration for Berkeley DB was the DBM library, a part of the earliest versions of UNIX written by AT&T's Ken Thompson in 1979. DBM was a simple key/value hashtable-based storage library. In the early 1990s as BSD UNIX was transitioning from version 4.3 to 4.4 and retrofitting commercial code owned by AT&T with unencumbered code, it was the future founders of Sleepycat Software who wrote libdb (aka Berkeley DB) as the replacement for DBM. The problem it addressed was fast, reliable local key/value storage. At that time databases almost always lived on a single node, even the most sophisticated databases only had simple fail-over two node solutions. If you had a lot of data to store you would choose between the few commercial RDBMS solutions or to write your own custom solution. Berkeley DB took the headache out of the custom approach. These basic market forces inspired other DBM implementations. There was the "New DBM" (ndbm) and the "GNU DBM" (GDBM) and a few others, but the theme was the same. Even today TokyoCabinet calls itself "a modern implementation of DBM" mimicking, and improving on, something first created over thirty years ago. In the mid-1990s, DBM was the name for what you needed if you were looking for fast, reliable local storage. Fast forward to today. What's changed? Systems are connected over fast, very reliable networks. Disks are cheep, fast, and capable of storing huge amounts of data. CPUs continued to follow Moore's Law, processing power that filled a room in 1990 now fits in your pocket. PCs, servers, and other computers proliferated both in business and the personal markets. In addition to the new hardware entire markets, social systems, and new modes of interpersonal communication moved onto the web and started evolving rapidly. These changes cause a massive explosion of data and a need to analyze and understand that data. Taken together this resulted in an entirely different landscape for database storage, new solutions were needed. A number of novel solutions stepped up and eventually a category called NoSQL emerged. The new market forces inspired the CAP theorem and the heated debate of BASE vs. ACID. But in essence this was simply the market looking at what to trade off to meet these new demands. These new database systems shared many qualities in common. There were designed to address massive amounts of data, millions of requests per second, and scale out across multiple systems. The first large-scale and successful solution was Dynamo, Amazon's distributed key/value database. Dynamo essentially took the next logical step and added a twist. Dynamo was to be the database of record, it would be distributed, data would be partitioned across many nodes, and it would tolerate failure by avoiding single points of failure. Amazon did this because they recognized that the majority of the dynamic content they provided to customers visiting their web store front didn't require the services of an RDBMS. The queries were simple, key/value look-ups or simple range queries with only a few queries that required more complex joins. They set about to use relational technology only in places where it was the best solution for the task, places like accounting and order fulfillment, but not in the myriad of other situations. The success of Dynamo, and it's design, inspired the next generation of Non-SQL, distributed database solutions including Cassandra, Riak and Voldemort. The problem their designers set out to solve was, "reliability at massive scale" so the first focal point was distributed database algorithms. Underneath Dynamo there is a local transactional database; either Berkeley DB, Berkeley DB Java Edition, MySQL or an in-memory key/value data structure. Dynamo was an evolution of local key/value storage onto networks. Cassandra, Riak, and Voldemort all faced similar design decisions and one, Voldemort, choose Berkeley DB Java Edition for it's node-local storage. Riak at first was entirely in-memory, but has recently added write-once, append-only log-based on-disk storage similar type of storage as Berkeley DB except that it is based on a hash table which must reside entirely in-memory rather than a btree which can live in-memory or on disk. Berkeley DB evolved too, we added high availability (HA) and a replication manager that makes it easy to setup replica groups. Berkeley DB's replication doesn't partitioned the data, every node keeps an entire copy of the database. For consistency, there is a single node where writes are committed first - a master - then those changes are delivered to the replica nodes as log records. Applications can choose to wait until all nodes are consistent, or fire and forget allowing Berkeley DB to eventually become consistent. Berkeley DB's HA scales-out quite well for read-intensive applications and also effectively eliminates the central point of failure by allowing replica nodes to be elected (using a PAXOS algorithm) to mastership if the master should fail. This implementation covers a wide variety of use cases. MemcacheDB is a server that implements the Memcache network protocol but uses Berkeley DB for storage and HA to replicate the cache state across all the nodes in the cache group. Google Accounts, the user authentication layer for all Google properties, was until recently running Berkeley DB HA. That scaled to a globally distributed system. That said, most NoSQL solutions try to partition (shard) data across nodes in the replication group and some allow writes as well as reads at any node, Berkeley DB HA does not. So, is Berkeley DB a "NoSQL" solution? Not really, but it certainly is a component of many of the existing NoSQL solutions out there. Forgetting all the noise about how NoSQL solutions are complex distributed databases when you boil them down to a single node you still have to store the data to some form of stable local storage. DBMs solved that problem a long time ago. NoSQL has more to do with the layers on top of the DBM; the distributed, sometimes-consistent, partitioned, scale-out storage that manage key/value or document sets and generally have some form of simple HTTP/REST-style network API. Does Berkeley DB do that? Not really. Is Berkeley DB a "NoSQL" solution today? Nope, but it's the most robust solution on which to build such a system. Re-inventing the node-local data storage isn't easy. A lot of people are starting to come to appreciate the sophisticated features found in Berkeley DB, even mimic them in some cases. Could Berkeley DB grow into a NoSQL solution? Absolutely. Our key/value API could be extended over the net using any of a number of existing network protocols such as memcache or HTTP/REST. We could adapt our node-local data partitioning out over replicated nodes. We even have a nice query language and cost-based query optimizer in our BDB XML product that we could reuse were we to build out a document-based NoSQL-style product. XML and JSON are not so different that we couldn't adapt one to work with the other interchangeably. Without too much effort we could add what's missing, we could jump into this No SQL market withing a single product development cycle. Why isn't Berkeley DB already a NoSQL solution? Why aren't we working on it? Why indeed...

Read the article
Ideas for an Erlang Application [closed]

- by user1640228

I'm just about to finish an Erlang book and I've done plenty of hacking on trivial things outside of reading the book. Now I want to crank thinks up and build an app that really makes use of many of Erlang and OTP's big features. I've got a few sketches of a highly-available music delivery system backed up by a riak cluster. Would love some help to inspire my project and help me into designing the system the way a professional Erlanger would.

Read the article
Big Data – Operational Databases Supporting Big Data – Key-Value Pair Databases and Document Databases – Day 13 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the Relational Database and NoSQL database in the Big Data Story. In this article we will understand the role of Key-Value Pair Databases and Document Databases Supporting Big Data Story. Now we will see a few of the examples of the operational databases. Relational Databases (Yesterday’s post) NoSQL Databases (Yesterday’s post) Key-Value Pair Databases (This post) Document Databases (This post) Columnar Databases (Tomorrow’s post) Graph Databases (Tomorrow’s post) Spatial Databases (Tomorrow’s post) Key Value Pair Databases Key Value Pair Databases are also known as KVP databases. A key is a field name and attribute, an identifier. The content of that field is its value, the data that is being identified and stored. They have a very simple implementation of NoSQL database concepts. They do not have schema hence they are very flexible as well as scalable. The disadvantages of Key Value Pair (KVP) database are that they do not follow ACID (Atomicity, Consistency, Isolation, Durability) properties. Additionally, it will require data architects to plan for data placement, replication as well as high availability. In KVP databases the data is stored as strings. Here is a simple example of how Key Value Database will look like: Key Value Name Pinal Dave Color Blue Twitter @pinaldave Name Nupur Dave Movie The Hero As the number of users grow in Key Value Pair databases it starts getting difficult to manage the entire database. As there is no specific schema or rules associated with the database, there are chances that database grows exponentially as well. It is very crucial to select the right Key Value Pair Database which offers an additional set of tools to manage the data and provides finer control over various business aspects of the same. Riak Rick is one of the most popular Key Value Database. It is known for its scalability and performance in high volume and velocity database. Additionally, it implements a mechanism for collection key and values which further helps to build manageable system. We will further discuss Riak in future blog posts. Key Value Databases are a good choice for social media, communities, caching layers for connecting other databases. In simpler words, whenever we required flexibility of the data storage keeping scalability in mind – KVP databases are good options to consider. Document Database There are two different kinds of document databases. 1) Full document Content (web pages, word docs etc) and 2) Storing Document Components for storage. The second types of the document database we are talking about over here. They use Javascript Object Notation (JSON) and Binary JSON for the structure of the documents. JSON is very easy to understand language and it is very easy to write for applications. There are two major structures of JSON used for Document Database – 1) Name Value Pairs and 2) Ordered List. MongoDB and CouchDB are two of the most popular Open Source NonRelational Document Database. MongoDB MongoDB databases are called collections. Each collection is build of documents and each document is composed of fields. MongoDB collections can be indexed for optimal performance. MongoDB ecosystem is highly available, supports query services as well as MapReduce. It is often used in high volume content management system. CouchDB CouchDB databases are composed of documents which consists fields and attachments (known as description). It supports ACID properties. The main attraction points of CouchDB are that it will continue to operate even though network connectivity is sketchy. Due to this nature CouchDB prefers local data storage. Document Database is a good choice of the database when users have to generate dynamic reports from elements which are changing very frequently. A good example of document usages is in real time analytics in social networking or content management system. Tomorrow In tomorrow’s blog post we will discuss about various other Operational Databases supporting Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Which key value store is the most promising/stable?

- by Mike Trpcic

I'm looking to start using a key/value store for some side projects (mostly as a learning experience), but so many have popped up in the recent past that I've got no idea where to begin. Just listing from memory, I can think of: CouchDB MongoDB Riak Redis Tokyo Cabinet Berkeley DB Cassandra MemcacheDB And I'm sure that there are more out there that have slipped through my search efforts. With all the information out there, it's hard to find solid comparisons between all of the competitors. My criteria and questions are: (Most Important) Which do you recommend, and why? Which one is the fastest? Which one is the most stable? Which one is the easiest to set up and install? Which ones have bindings for Python and/or Ruby? Edit: So far it looks like Redis is the best solution, but that's only because I've gotten one solid response (from ardsrk). I'm looking for more answers like his, because they point me in the direction of useful, quantitative information. Which Key-Value store do you use, and why? Edit 2: If anyone has experience with CouchDB, Riak, or MongoDB, I'd love to hear your experiences with them (and even more so if you can offer a comparative analysis of several of them)

Read the article
Is there an open source database/key value for windows that allows me to dynamically add/remove node

- by Zubair

I'm looking for something like Riak, but which can run as a native windows application, and can handle failures, addition or removal of nodes, etc.

Read the article
How can I implement a list over an eventually consistent database?

- by Zubair

I want to implement lists and queues over Cassandra, Riak, or any other eventually consistent store. Is this possible and how could I do it?

Read the article
Why isn't DSM for unstructured memory done today?

- by sinned

Ages ago, Djikstra invented IPC through mutexes which then somehow led to shared memory (SHM) in multics (which afaik had the necessary mmap first). Then computer networks came up and DSM (distributed SHM) was invented for IPC between computers. So DSM is basically a not prestructured memory region (like a SHM) that magically get's synchronized between computers without the applications programmer taking action. Implementations include Treadmarks (inofficially dead now) and CRL. But then someone thought this is not the right way to do it and invented Linda & tuplespaces. Current implementations include JavaSpaces and GigaSpaces. Here, you have to structure your data into tuples. Other ways to achieve similar effects may be the use of a relational database or a key-value-store like RIAK. Although someone might argue, I don't consider them as DSM since there is no coherent memory region where you can put data structures in as you like but have to structure your data which can be hard if it is continuous and administration like locking can not be done for hard coded parts (=tuples, ...). Why is there no DSM implementation today or am I just unable to find one?

Read the article
How should I host our scalable worker processes?

- by Pieter Breed

We are designing a new architecture for an enterprise business. The principles we've followed so far is not to develop what you can (possible buy and) deploy, ie, don't reinvent any wheels. In this way we've decided on CQRS, RabbitMQ, Riak and a bunch of other things. We still need to write /some/ business code though and these will be in the form of worker processes, which will consume commands from a message queue and after any side-effects, produce events onto another message queue. The idea behind this is that via the competing-consumers design we will have a scalable design right out of the box. One option is of writing a management infrastructure that will know how to: deploy code instantiate processes kill processes update configuration etc IE provide fault tolerance and scalability. Also, this is exactly what something like GAE and Heroku does for you, but in a public setting and in our organization, public is bad. My question is, is there an out-of-the-box solution that we can use to host our consumers in? Like a private cloud or private platform-as-a-service. Private Heroku or GAE. Is there some kind of software or software product with which we can do all of these things and thereby get scalability and fault tolerance over our consumers?

Read the article
Big Data – Buzz Words: What is NoSQL – Day 5 of 21

- by Pinal Dave

In yesterday’s blog post we explored the basic architecture of Big Data . In this article we will take a quick look at one of the four most important buzz words which goes around Big Data – NoSQL. What is NoSQL? NoSQL stands for Not Relational SQL or Not Only SQL. Lots of people think that NoSQL means there is No SQL, which is not true – they both sound same but the meaning is totally different. NoSQL does use SQL but it uses more than SQL to achieve its goal. As per Wikipedia’s NoSQL Database Definition – “A NoSQL database provides a mechanism for storage and retrieval of data that uses looser consistency models than traditional relational databases.“ Why use NoSQL? A traditional relation database usually deals with predictable structured data. Whereas as the world has moved forward with unstructured data we often see the limitations of the traditional relational database in dealing with them. For example, nowadays we have data in format of SMS, wave files, photos and video format. It is a bit difficult to manage them by using a traditional relational database. I often see people using BLOB filed to store such a data. BLOB can store the data but when we have to retrieve them or even process them the same BLOB is extremely slow in processing the unstructured data. A NoSQL database is the type of database that can handle unstructured, unorganized and unpredictable data that our business needs it. Along with the support to unstructured data, the other advantage of NoSQL Database is high performance and high availability. Eventual Consistency Additionally to note that NoSQL Database may not provided 100% ACID (Atomicity, Consistency, Isolation, Durability) compliance. Though, NoSQL Database does not support ACID they provide eventual consistency. That means over the long period of time all updates can be expected to propagate eventually through the system and data will be consistent. Taxonomy Taxonomy is the practice of classification of things or concepts and the principles. The NoSQL taxonomy supports column store, document store, key-value stores, and graph databases. We will discuss the taxonomy in detail in later blog posts. Here are few of the examples of the each of the No SQL Category. Column: Hbase, Cassandra, Accumulo Document: MongoDB, Couchbase, Raven Key-value : Dynamo, Riak, Azure, Redis, Cache, GT.m Graph: Neo4J, Allegro, Virtuoso, Bigdata As of now there are over 150 NoSQL Database and you can read everything about them in this single link. Tomorrow In tomorrow’s blog post we will discuss Buzz Word – Hadoop. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Hadoop, NOSQL, and the Relational Model

- by Phil Factor

(Guest Editorial for the IT Pro/SysAdmin Newsletter)Whereas Relational Databases fit the world of commerce like a glove, it is useless to pretend that they are a perfect fit for all human endeavours. Although, with SQL Server, we’ve made great strides with indexing text, in processing spatial data and processing markup, there is still a problem in dealing efficiently with large volumes of ephemeral semi-structured data. Key-value stores such as Cassandra, Project Voldemort, and Riak are of great value for ephemeral data, and seem of equal value as a data-feed that provides aggregations to an RDBMS. However, the Document databases such as MongoDB and CouchDB are ideal for semi-structured data for which no fixed schema exists; analytics and logging are obvious examples. NoSQL products, such as MongoDB, tackle the semi-structured data problem with panache. MongoDB is designed with a simple document-oriented data model that scales horizontally across multiple servers. It doesn’t impose a schema, and relies on the application to enforce the data structure. This is another take on the old ‘EAV’ problem (where you don’t know in advance all the attributes of a particular entity) It uses a clever replica set design that allows automatic failover, and uses journaling for data durability. It allows indexing and ad-hoc querying. However, for SQL Server users, the obvious choice for handling semi-structured data is Apache Hadoop. There will soon be an ODBC Driver for Apache Hive .and an Add-in for Excel. Additionally, there are now two Hadoop-based connectors for SQL Server; the Apache Hadoop connector for SQL Server 2008 R2, and the SQL Server Parallel Data Warehouse (PDW) connector. We can connect to Hadoop process the semi-structured data and then store it in SQL Server. For one steeped in the culture of Relational SQL Databases, I might be expected to throw up my hands in the air in a gesture of contempt for a technology that was, judging by the overblown journalism on the subject, about to make my own profession as archaic as the Saggar makers bottom knocker (a potter’s assistant who helped the saggar maker to make the bottom of the saggar by placing clay in a metal hoop and bashing it). However, on the contrary, I find that I'm delighted with the advances made by the NoSQL databases in the past few years. Having the flow of ideas from the NoSQL providers will knock any trace of complacency out of the providers of Relational Databases and inspire them into back-fitting some features, such as horizontal scaling, with sharding and automatic failover into SQL-based RDBMSs. It will do the breed a power of good to benefit from all this lateral thinking.

Read the article
Large, high performance object or key/value store for HTTP serving on Linux

- by Tommy

I have a service that serves images to end users at a very high rate using plain HTTP. The images vary between 4 and 64kbytes, and there are 1.300.000.000 of them in total. The dataset is about 30TiB in size and changes (new objects, updates, deletes) make out less than 1% of the requests. The number of requests pr. second vary from 240 to 9000 and is dispersed pretty much all over, with few objects being especially "hot". As of now, these images are files on a ext3 filesystem distributed read only across a large amount of mid range servers. This poses several problems: Using a fileysystem is very inefficient since the metadata size is large, the inode/dentry cache is volatile on linux and some daemons tend to stat()/readdir() it's way through the directory structure, which in my case becomes very expensive. Updating the dataset is very time consuming and requires remounting between set A and B. The only reasonable handling is operating on the block device for backup, copying, etc. What I would like is a deamon that: speaks HTTP (get, put, delete and perhaps update) stores data it in an efficient structure. The index should remain in memory, and considering the amount of objects, the overhead must be small. The software should be able to handle massive connections with slow (if any) time needed to ramp up. Index should be read in memory at startup. Statistics would be nice, but not mandatory. I have experimented a bit with riak, redis, mongodb, kyoto and varnish with persistent storage, but I haven't had the chance to dig in really deep yet.

Read the article
What scalability problems have you solved using a NoSQL data store?

- by knorv

NoSQL refers to non-relational data stores that break with the history of relational databases and ACID guarantees. Popular open source NoSQL data stores include: Cassandra (tabular, written in Java, used by Facebook, Twitter, Digg, Rackspace, Mahalo and Reddit) CouchDB (document, written in Erlang, used by Engine Yard and BBC) Dynomite (key-value, written in C++, used by Powerset) HBase (key-value, written in Java, used by Bing) Hypertable (tabular, written in C++, used by Baidu) Kai (key-value, written in Erlang) MemcacheDB (key-value, written in C, used by Reddit) MongoDB (document, written in C++, used by Sourceforge, Github, Electronic Arts and NY Times) Neo4j (graph, written in Java, used by Swedish Universities) Project Voldemort (key-value, written in Java, used by LinkedIn) Redis (key-value, written in C, used by Engine Yard, Github and Craigslist) Riak (key-value, written in Erlang, used by Comcast and Mochi Media) Ringo (key-value, written in Erlang, used by Nokia) Scalaris (key-value, written in Erlang, used by OnScale) ThruDB (document, written in C++, used by JunkDepot.com) Tokyo Cabinet/Tokyo Tyrant (key-value, written in C, used by Mixi.jp (Japanese social networking site)) I'd like to know about specific problems you - the SO reader - have solved using data stores and what NoSQL data store you used. Questions: What scalability problems have you used NoSQL data stores to solve? What NoSQL data store did you use? What database did you use before switching to a NoSQL data store? I'm looking for first-hand experiences, so please do not answer unless you have that.

Read the article

Search Results

Search found 28 results on 2 pages for 'riak'.

Page 1/2 | 1 2 | Next Page >

- by Wolfiem

- by Jonas

- by WojonsTech

- by Zubair

- by Zubair

- by pixelcort

- by tesmar

- by Zubair

- by tesmar

- by tesmar

- by Fedrick

- by Julien Genestoux

- by Mickey Shine

- by Gregory Burd

- by user1640228

- by Pinal Dave

- by Mike Trpcic

- by Zubair

- by Zubair

- by sinned

- by Pieter Breed

- by Pinal Dave

- by Phil Factor

- by Tommy

- by knorv

1 2 | Next Page >