Search Results

Search found 43 results on 2 pages for 'bigtable'.

  • Designing persistence schema for BigTable on AppEngine

    - by Vitalij Zadneprovskij
    I have tried to design the datastore schema for a very small application. That schema would have been very simple, if not trivial, using a relational database with foreign keys, many-to-many relations, joins, etc. But the problem was that my application was targeted at Google App Engine and I had to design for a database that was not relational. In the end I gave up. Is there a book or an article that describes design principles for applications that are meant for such databases? The books that I have found are about programming for App Engine and they don't spend many words on database design principles.

  • Difference between Document-oriented-DB and Bigtable clones

    - by chen
    We are looking for a suitable storage engine for our weblog history data. We read the Bigtable paper and understand that it would suit us well. However, I also understand that a document-oriented DB such as MongoDB seems to offer somewhat more powerful schema support, i.e., it can model our data as well. I wonder how people choose a scalable NoSQL DB nowadays. I have read enough articles like "we looked at A, B and C, and we decided to use C", but I'd like to see some benchmark numbers. What I am saying is: if MongoDB and the like can provide the same level of performance as Bigtable clones, why don't web companies choose them (to be prepared for various, potentially more complex data problems)? Thanks. By the way, I read an article (which convinced me at the time) saying that Cassandra does not fit M/R operations. Any comments?

  • Scalability comparison between different DBMSs

    - by Björn Lindfors
    By what factor does the performance (read queries/sec) increase when a machine is added to a cluster of machines running either a Bigtable-like database or MySQL? Google's research paper on Bigtable suggests that "near-linear" scaling can be achieved with Bigtable. This page here featuring MySQL's marketing jargon suggests that MySQL is capable of scaling linearly. Where is the truth?

  • Hidden limitations of Google App Engine?

    - by Kyle Cronin
    I've been looking into writing a web app that will run on Google App Engine, but before I commit myself to the platform I'd like to know what, if any, limitations there are. I'm aware of the basic CPU/bandwidth restrictions that Google places on the free service, but I'm wondering more about development restrictions like how BigTable compares to a standard relational database and what Python libraries aren't available on the GAE platform (and what alternatives Google provides). Basically I'm looking for any hidden roadblocks before I commit to the platform. Thanks for your help!

  • Alternative databases to use when putting IIS Logs into a database using LogParser

    - by Robin Day
    We have run some scripts that use LogParser to dump our IIS logs into a SQL Server database. We can then query this to get simple stats on hits, usage, etc. It's also good when linking it to error log databases and performance counter databases to compare usage with errors, etc. Having implemented this for just one system and for the last 2-3 weeks, we already have a 5GB database with around 10 million records. This is making any queries to this database quite slow and will no doubt cause storage issues if we continue to log as we are. Can anyone suggest any alternative databases that we could use for this data that would be more efficient for such logs? I'd be particularly interested in any experience of Google's BigTable or Amazon's SimpleDB. Are either of these suitable for reporting queries? COUNTs, GROUP BYs, PIVOTs?

  • mysql stored routine vs. mysql-alternative?

    - by user522962
    We are using a mysql database w/ about 150,000 records (names) total. Our searches on the 'names' field are done through an autocomplete function in php. We have the table indexed but still feel that the searching is a bit sluggish (a few full seconds vs. something like Google Finance w/ near-instant response). We came up w/ 2 possibilities, but wanted to get more insight:

      - Can we create a bunch (many thousands or more) of stored procedures to speed up searches, or will creating that many stored procedures bog down the db?
      - Is there a faster alternative to mysql for "select" statements? (Speed on inserting & updating rows isn't too important, so we can sacrifice that if necessary.)

    I've vaguely heard of BigTable & others that don't support JOIN statements... we need JOIN statements for some of our other queries. thx

  • Google App-Engine Java Batch Update

    - by Manjoor
    I need to upload a .csv file and save the records in Bigtable. My application successfully parses 200 records from the CSV file and saves them to the table. Here is my code to save the data:

        for (int i = 0; i < lines.length - 1; i++) // lines holds all records in the CSV file
        {
            String line = lines[i];
            // Each record has 3 columns: integer, integer, Text
            if (line.length() > 15)
            {
                int n = line.indexOf(",");
                if (n > 0)
                {
                    int ID = Integer.parseInt(line.substring(0, n));
                    int n1 = line.indexOf(",", n + 2);
                    if (n1 > n)
                    {
                        int Col1 = Integer.parseInt(line.substring(n + 1, n1));
                        String Col2 = line.substring(n1 + 1);
                        myTable uu = new myTable();
                        uu.setId(ID);
                        uu.setCol1(Col1);
                        Text t = new Text(Col2);
                        uu.setCol2(t);
                        PersistenceManager pm = PMF.get().getPersistenceManager();
                        pm.makePersistent(uu);
                        pm.close();
                    }
                }
            }
        }

    But when the number of records grows, it gives a timeout error. The CSV file may have up to 800 records. Is it possible to do that in App Engine? (Something like a batch update.)
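
    The "something like batch update" idea in the question can be sketched as follows. This is a minimal illustration in Python, not the asker's Java code and not an App Engine API: parse_record, batches, save_in_batches, the save_batch callback and the batch size of 100 are all hypothetical names and values chosen for the example.

```python
def parse_record(line):
    """Parse 'id,col1,text' into a tuple, or return None if the line is malformed."""
    parts = line.split(",", 2)  # the text column may itself contain commas
    if len(parts) != 3:
        return None
    try:
        return int(parts[0]), int(parts[1]), parts[2]
    except ValueError:
        return None


def batches(records, size):
    """Yield successive lists of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def save_in_batches(lines, save_batch, size=100):
    """Parse all lines first, then hand each batch to `save_batch` in one call.

    `save_batch` is a hypothetical callback standing in for whatever bulk
    write the datastore API offers; it is not a real App Engine function.
    """
    records = [r for r in map(parse_record, lines) if r is not None]
    for batch in batches(records, size):
        save_batch(batch)
    return len(records)
```

    In the JDO code from the question, the analogous move might be to collect the objects in a list and pass them to a single makePersistentAll call on one PersistenceManager, rather than opening and closing a manager per record. Whether that alone avoids the timeout is not something this sketch can show.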

  • mysql query performance help

    - by Stefano
    Hi, I have a quite large table storing the words contained in email messages:

        mysql> explain t_message_words;
        +----------------+---------+------+-----+---------+----------------+
        | Field          | Type    | Null | Key | Default | Extra          |
        +----------------+---------+------+-----+---------+----------------+
        | mwr_key        | int(11) | NO   | PRI | NULL    | auto_increment |
        | mwr_message_id | int(11) | NO   | MUL | NULL    |                |
        | mwr_word_id    | int(11) | NO   | MUL | NULL    |                |
        | mwr_count      | int(11) | NO   |     | 0       |                |
        +----------------+---------+------+-----+---------+----------------+

    The table contains about 100M rows.

      - mwr_message_id is a FK to the messages table
      - mwr_word_id is a FK to the words table
      - mwr_count is the number of occurrences of word mwr_word_id in message mwr_message_id

    To calculate the most used words, I use the following query:

        SELECT SUM(mwr_count) AS word_count, mwr_word_id
        FROM t_message_words
        GROUP BY mwr_word_id
        ORDER BY word_count DESC
        LIMIT 100;

    It runs almost forever (more than half an hour on the test server):

        mysql> show processlist;
        +----+------+----------------+--------+---------+------+----------------------+-----------------------------------------------------
        | Id | User | Host           | db     | Command | Time | State                | Info
        +----+------+----------------+--------+---------+------+----------------------+-----------------------------------------------------
        | 41 | root | localhost:3148 | tst_db | Query   | 1955 | Copying to tmp table | SELECT SUM(mwr_count) AS word_count, mwr_word_id FROM t_message_words GROUP BY mwr_word_id |
        +----+------+----------------+--------+---------+------+----------------------+-----------------------------------------------------
        3 rows in set (0.00 sec)

    Is there anything I can do to "speed up" the query (apart from adding more RAM, more CPU, faster disks)? Thank you in advance, Stefano

  • Hadoop Map/Reduce - simple use example to do the following...

    - by alexeypro
    I have a MySQL database, where I store the following: a BLOB (which contains a JSON object) and an ID (for this JSON object). The JSON object contains a lot of different information. Say, "city:Los Angeles" and "state:California". There are about 500k such records for now, but they are growing. And each JSON object is quite big. My goal is to do searches (real-time) in the MySQL database. Say, I want to search for all JSON objects which have "state" set to "California" and "city" set to "San Francisco". I want to utilize Hadoop for the task. My idea is that there will be a "job" which takes chunks of, say, 100 records (rows) from MySQL, verifies them against the given search criteria, and returns those (IDs) which qualify. Pros/cons? I understand that one might think that I should utilize simple SQL power for that, but the thing is that the JSON object structure is pretty "heavy": if I put it into SQL schemas, there will be at least 3-5 table joins, which (I tried, really) creates quite a headache, and building all the right indexes eats RAM faster than one can think. ;-) And even then, every SQL query has to be analyzed to make sure it uses the indexes, otherwise with a full scan it literally is a pain. And with such a structure, the only way "up" is vertical scaling. But I am not sure it's the best option for me, as I see how the JSON objects will grow (the data structure), and I see that the number of them will grow too. :-) Help? Can somebody point me to simple examples of how this can be done? Does it make sense at all? Am I missing something important? Thank you.

  • Database that consumes less disk space

    - by Hugo Palma
    I'm looking at solutions to store a massive quantity of information while consuming the least possible disk space. The information structure is very simple and the queries will also be very simple. I've looked at solutions like Apache Cassandra and relational databases but couldn't find a comparison where disk usage is mentioned. Any ideas on this would be great.

  • App engine downtime

    - by DutrowLLC
    I've noticed that Google App Engine seems to have a fair amount of downtime where they place the datastore into read-only mode. Frequently this downtime is in the middle of the day. Is this something that is happening only during early development, or is this something that I can expect to always be occurring? I'm developing an application that helps small businesses handle their operations. One thing that it does is take appointments, another is route phone calls. I'd like some suggestions on how to handle times when the datastore is read-only, such as:

      - What if our client is on the phone with the customer and is taking down an appointment and the datastore is read-only? It would not be acceptable to ask the client to come back later to save, especially if it's in the middle of the day.
      - What if there is an incoming call and the application cannot store the record or properly route the call due to database writes being unavailable?

    How are these types of issues normally handled?

  • how to model a follower stream in appengine?

    - by molicule
    I am trying to design tables to build out a follower relationship. Say I have a stream of 140-character records that have a user, a hashtag and other text. Users follow other users, and can also follow hashtags. I am outlining the way I've designed this below, but there are two limitations in my design. I was wondering if others had smarter ways to accomplish the same goal. The issues with this are:

      - The list of followers is copied in for each record.
      - If a new follower is added or one removed, 'all' the records have to be updated.

    The code:

        class HashtagFollowers(db.Model):
            """This table contains the followers for each hashtag"""
            hashtag = db.StringProperty()
            followers = db.StringListProperty()

        class UserFollowers(db.Model):
            """This table contains the followers for each user"""
            username = db.StringProperty()
            followers = db.StringListProperty()

        class stream(db.Model):
            """This table contains the data stream"""
            username = db.StringProperty()
            hashtag = db.StringProperty()
            text = db.TextProperty()

            def save(self):
                """
                On each save all the followers for each hashtag and user are
                added into another table with this record as the parent
                """
                super(stream, self).save()
                hfs = HashtagFollowers.all().filter("hashtag =", self.hashtag).fetch(10)
                for hf in hfs:
                    sh = streamHashtags(parent=self, followers=hf.followers)
                    sh.save()
                ufs = UserFollowers.all().filter("username =", self.username).fetch(10)
                for uf in ufs:
                    uh = streamUsers(parent=self, followers=uf.followers)
                    uh.save()

        class streamHashtags(db.Model):
            """The stream record is the parent of this record"""
            followers = db.StringListProperty()

        class streamUsers(db.Model):
            """The stream record is the parent of this record"""
            followers = db.StringListProperty()

    Now, to get the stream of followed hashtags:

        indexes = db.GqlQuery("""SELECT __key__ from streamHashtags where followers = 'myusername'""")
        keys = [k.parent() for k in indexes[offset:numresults]]
        return db.get(keys)

    Is there a smarter way to do this?
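
    The copy-on-save design in this question, and the second limitation the asker names, can be reproduced with a small in-memory model. This is only a sketch in plain Python: StreamStore and its attributes are hypothetical stand-ins for the datastore models, and no App Engine API is used.

```python
from collections import defaultdict


class StreamStore:
    """In-memory stand-in for the datastore models described in the question."""

    def __init__(self):
        self.hashtag_followers = defaultdict(list)  # hashtag -> follower names
        self.user_followers = defaultdict(list)     # username -> follower names
        self.records = {}                           # record id -> (username, hashtag, text)
        self.hashtag_index = []                     # (record id, copied follower list)
        self.user_index = []                        # (record id, copied follower list)

    def save(self, username, hashtag, text):
        """Store a record and copy the current follower lists next to it."""
        record_id = len(self.records) + 1
        self.records[record_id] = (username, hashtag, text)
        self.hashtag_index.append((record_id, list(self.hashtag_followers[hashtag])))
        self.user_index.append((record_id, list(self.user_followers[username])))
        return record_id

    def followed_hashtag_stream(self, follower):
        """Return records whose copied hashtag-follower list contains `follower`."""
        return [self.records[rid] for rid, names in self.hashtag_index if follower in names]
```

    Because each record carries a snapshot of the follower list taken at save time, a follower added later does not see older records unless every index entry is rewritten, which is the update cost the question describes.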

  • What is the best way to create a running integer id on the AppEngine data storage?

    - by Freed
    For various reasons, I need a unique running integer id for my entities stored on Google App Engine. The automatically generated key sort of has this behaviour, but it doesn't start from 1 (or 0) and doesn't guarantee that the generated integer part will come from a continuous sequence. What would be the best way to efficiently implement this on App Engine? Is there any support from the storage system? To add to the complexity, I might need to do this over entities from different entity groups, meaning I can't just get the highest id right now and save an entity with the next id in a transaction. Might memcache be the way to go..? Edit: I haven't implemented this yet, but to clarify the memcache idea: I know memcache is unreliable, but in practice it probably won't lose data "too often" to hurt performance. Basically, I would have a memcache entry for the last used id, update it (somehow atomically) whenever I create a new entity, and use that id. In the case of memcache not having a value for this entry, I'd get the highest id so far by doing a query on my entities sorted by the id and update memcache (unless someone else had already done so). The only problem I can see with this right now would be the atomicity of the operation as a whole if the save of my new entity was also part of a transaction. Thoughts..?
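
    The memcache idea from the edit above can be sketched roughly as follows. FakeCache is a tiny in-process stand-in written for this example, not the App Engine memcache client, and highest_stored_id is a hypothetical callable standing in for the "query my entities sorted by id" step.

```python
import threading


class FakeCache:
    """Tiny in-process stand-in for a cache with an atomic increment."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def add(self, key, value):
        """Set key only if it is absent; return True if this call set it."""
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def incr(self, key):
        """Atomically increment and return the new value, or None if the key is missing."""
        with self._lock:
            if key not in self._data:
                return None
            self._data[key] += 1
            return self._data[key]

    def flush(self):
        """Drop everything, simulating the cache losing its data."""
        with self._lock:
            self._data.clear()


def next_id(cache, highest_stored_id, key="last_id"):
    """Return the next running id, reseeding the cache when the entry is gone."""
    value = cache.incr(key)
    while value is None:
        # add() succeeds for only one caller, so concurrent reseeds do not clobber each other
        cache.add(key, highest_stored_id())
        value = cache.incr(key)
    return value
```

    One gap this sketch leaves open: an id that was handed out but not yet saved when the cache entry disappears can be issued again after the reseed, so the atomicity concern raised in the question still applies.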

  • Is it possible to return a list of numbers from a Sybase function?

    - by ps_rs4
    I'm trying to overcome a very serious performance issue in which Sybase refuses to use the primary key index on a large table because one of the required fields is specified indirectly through another table. In other words,

        SELECT ... FROM BIGTABLE WHERE KFIELD = 123

    runs in ms, but

        SELECT ... FROM BIGTABLE, LTLTBL WHERE KFIELD = LTLTBL.LOOKUP AND LTLTBL.UNIQUEID = 'STRINGREPOF123'

    takes 30 - 40 seconds. I've managed to work around this first problem by using a function that basically lets me do this:

        SELECT ... FROM BIGTABLE WHERE KFIELD = MYFUNC('STRINGREPOF123')

    which also runs in ms. The problem, however, is that this approach only works when there is a single value returned by MYFUNC, but I have some cases where it may return 2 or 3 values. I know that the SQL

        SELECT ... FROM BIGTABLE WHERE KFIELD IN (123,456,789)

    also returns in millis, so I'd like to have a function that returns a list of possible values rather than just a single one. Is this possible? Sadly the application is running on Sybase ASA 9. Yes, I know it is old and is scheduled to be refreshed, but there's nothing I can do about it now, so I need logic that will work with this version of the DB. Thanks in advance for any assistance on this matter.

  • SQL JOIN with two or more tables as output - most efficient way?

    - by littlegreen
    I have an SQL query that executes a LEFT JOIN on another table, then outputs all results that could be coupled into a designated table. I then have a second SQL query that executes the LEFT JOIN again, then outputs the results that could not be coupled to a designated table. In code, this is something like:

        INSERT INTO coupledrecords
        SELECT b.col1, b.col2... s.col1, s.col2...
        FROM bigtable AS b
        LEFT JOIN smallertable AS s ON criterium
        WHERE s.col1 IS NOT NULL

        INSERT INTO notcoupledrecords
        SELECT b.col1, b.col2...
        FROM bigtable AS b
        LEFT JOIN smallertable AS s ON criterium
        WHERE s.col1 IS NULL

    My question: I now have to execute the JOIN two times in order to achieve what I want. I have a feeling that this is twice as slow as it could be. Is this true, and if yes, is there a way to do it more efficiently?
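
    One possible way to run the join only once is to fetch the full LEFT JOIN result and split it on the client by whether the right-hand side is NULL. The sketch below uses Python's standard sqlite3 module purely for illustration; the table contents, the id column and the join condition on id are assumptions made up for the example, and whether this is faster than two server-side INSERT ... SELECT statements depends on the database and the data volume.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bigtable (id INTEGER, col1 TEXT);
    CREATE TABLE smallertable (id INTEGER, col1 TEXT);
    INSERT INTO bigtable VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO smallertable VALUES (1, 'x'), (3, 'z');
""")

# Run the LEFT JOIN once and keep every row, matched or not.
rows = conn.execute("""
    SELECT b.id, b.col1, s.col1
    FROM bigtable AS b
    LEFT JOIN smallertable AS s ON s.id = b.id
    ORDER BY b.id
""").fetchall()

# Split the single result set on whether the right-hand side is NULL.
coupled = [row for row in rows if row[2] is not None]
not_coupled = [row[:2] for row in rows if row[2] is None]
```

    Each list could then be written to its own target table with a single bulk insert.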

  • Faster Subversion Hosting

    When we launched our first Subversion-on-Bigtable service in 2006 our goal was to scale to support hundreds of thousands of projects, with the idea that we could...
