Search Results

Search found 43 results on 2 pages for 'bigtable'.

Page 2/2 | < Previous Page | 1 2 

  • Is there any technical info about the youtube network?

    - by Allen
    I'm an IT graduate student trying to get my head around distributed content distribution systems, much like what I assume Youtube uses. I have read Google Research stuff like Bigtable and Google File System academic papers. Is there any such thing for Youtube? Can anyone point me at stuff to learn about the Youtube network and the underlying technology? thanks dbaman

    Read the article

  • How should calculations be handled in a document database

    - by Morten
    Ok, so I have a program that basically logs errors into a nosql database. Right now there is just a single model for an error and its stored as a document in the nosql database. Basically I want to summarize across different errors and produce a summary of the "types" of errors that occured. Traditionally in a SQL database the this normalization would work with groupings, sums and averages but in a NoSQL database I assume I need to use mapreduce. My current model seems unfit for the task, how should I change the way I store "models" in order to make statistical analysis easy? Would a NoSQL database even be the right tool for this type of problem? I'm storing things in Google AppEngine's BigTable, so there are some limitations to think of as well.

    Read the article

  • Interview question: How would you implement Google Search?

    - by ripper234
    Supposed you were asked in an interview "How would you implement Google Search?" How would you answer such a question? There might be resources out there that explain how some pieces in Google are implemented (BigTable, MapReduce, PageRank, ...), but that doesn't exactly fit in an interview. What overall architecture would you use, and how would you explain this in a 15-30 minute time span? I would start with explaining how to build a search engine that handles ~ 100k documents, then expand this via sharding to around 50M docs, then perhaps another architectural/technical leap. This is the 20,000 feet view. What I'd like is the details - how you would actually answer that in an interview. Which data structures would you use. What services/machines is your architecture composed of. What would a typical query latency be? What about failover / split brain issues? Etc...

    Read the article

  • storage service with google app engine.

    - by Bunny Rabbit
    i've been assigned a project to create a filestorage service on google applet engine.but i am really doubtful wheather it's possible or not given the 30 sec limit to process the response and moreover its the bigtable is just a database system not a storage server .am i correct ?

    Read the article

  • How to think in data stores instead of databases?

    - by Jim
    As an example, Google App Engine uses data stores, not a database, to store data. Does anybody have any tips for using data stores instead of databases? It seems I've trained my mind to think 100% in object relationships that map directly to table structures, and now it's hard to see anything differently. I can understand some of the benefits of data stores (e.g. performance and the ability to distribute data), but some good database functionality is sacrificed (e.g. joins). Does anybody who has worked with data stores like BigTable have any good advice to working with them?

    Read the article

  • Why is only the suffix of work_index hashed?

    - by Jaroslav Záruba
    I'm reading through the PDF that Brett Slatkin has published for Google I/O 2010: "Data pipelines with Google App Engine": http://tinyurl.com/3523mej In the video (the Fan-in part) Brett says that the work_index has to be a hash, so that 'you distribute the load across the BigTable': http://www.youtube.com/watch?v=zSDC_TU7rtc#t=48m44 ...and this is how work_index is created: work_index = '%s-%d' % (sum_name, knuth_hash(index)) ...which I guess creates something like 'mySum-54657651321987' I do understand the basic idea, but is why only one half of work_index is hashed? Is it important to hash only part of it leaving the suffix out? Would it be wrong to do md5('%s-%d' % (sum_name, index)) so that the hash would be like '6gw8....hq6' ? I'm Java guy so I would use md5 to hash, which means I get id like 'mySum' + 32 characters. (Obviously I want my ids/keys to be as short as possible here.) If I could hash the whole string my id would be just 32 chars. Or would you suggest to use something else to do the hashing with?

    Read the article

  • Best data store for billions of rows

    - by Jody Powlette
    I need to be able to store small bits of data (approximately 50-75 bytes) for billions of records (~3 billion/month for a year). The only requirement is fast inserts and fast lookups for all records with the same GUID and the ability to access the data store from .net. I'm a SQL server guy and I think SQL Server can do this, but with all the talk about BigTable, CouchDB, and other nosql solutions, it's sounding more and more like an alternative to a traditional RDBS may be best due to optimizations for distributed queries and scaling. I tried cassandra and the .net libraries don't currently compile or are all subject to change (along with cassandra itself). I've looked into many nosql data stores available, but can't find one that meets my needs as a robust production-ready platform. If you had to store 36 billion small, flat records so that they're accessible from .net, what would choose and why?

    Read the article

  • How the existing data to be if entity structure modified or deleted on GAE?

    - by Eonil
    GAE recommends using JDO/JPA. But I have serious question about using OODB like them. JDO based on user's class structure. And data structure should be modified continually as service advances. So, If data(entity) class property being removed, what happened to existing data on the property? If data(entity) class renamed for refactoring reason, how the JDO know those renaming? Or all data loss? Major point is "How JDO/GAE/BigTable applies modification of class into existing entity structure and data?".

    Read the article

  • Is it possible to use Google Docs Viewer to view files already in Google Docs?

    - by john2x
    The title is a little confusing. I'll elaborate. As far as I can tell, the Google Docs Viewer tool accepts a link to a raw document file (e.g. .doc, .pdf, et. al.), and renders its contents in the browser. For example, this url to a pdf http://research.google.com/archive/bigtable-osdi06.pdf when passed to Viewer, returns this link: http://docs.google.com/viewer?url=http%3A%2F%2Fresearch.google.com%2Farchive%2Fbigtable-osdi06.pdf What I'm trying to achieve is, use the Viewer to view a document already hosted in Google Docs (i.e. no longer a raw document file). When passing a link to a Google Docs document to the Viewer, the result is not as expected. It renders the link's HTML source instead of the document's contents. The reason I want to do this is that I want to be able to use the "embed" feature of Viewer to view Google Docs documents. Does Google Docs have a "link to embeddable view" feature? P.S. Here is a sample snippet to an embedded document. This is what I want, but pointing to an existing Google Docs document. <iframe src="http://docs.google.com/viewer?url=http%3A%2F%2Fresearch.google.com%2Farchive%2Fbigtable-osdi06.pdf&embedded=true" width="600" height="780" style="border: none;"></iframe>

    Read the article

  • Server Hosting + AWS

    - by ledy
    Since my dedicated servers are hosted at a "normal" hosting service, I wonder if there is a really cheap way to extend the server farm with AWS instances. E.g. it seems to be a effient and flexible solution with data storage and ressources for ocassional data processing, too. However, it might be very in-efficient to mix two data centres and transfering data from current webhoster to amazon and vice-versa. In my case, the traffic for this continuous data exchange seems to be expensive and the delay for moving the data back to the hoster leads into a lack or delay. How are best practises for mixing non-aws and aws systems? E.g.: How to move the hosters data to aws as log file storage to run urchin analysis and/or port the log file data into a bigtable for exhausting analysis there. After working with the data: how to bring it back to the hoster and use the data with the webservers there? I am not going to move all the server farm to amazon, only "separate" parts or tasks if the transfer/exchange does not lead to increased cost.

    Read the article

  • "cloud architecture" concepts in a system architecture diagrams

    - by markus
    If you design a distributed application for easy scale-out, or you just want to make use of any of the new “cloud computing” offerings by Amazon, Google or Microsoft, there are some typical concepts or components you usually end up using: distributed blob storage (aka S3) asynchronous, durable message queues (aka SQS) non-Relational-/non-transactional databases (like SimpleDB, Google BigTable, Azure SQL Services) distributed background worker pool load-balanced, edge-service processes handling user requests (often virtualized) distributed caches (like memcached) CDN (content delivery network like Akamai) Now when it comes to design and sketch an architecture that makes use of such patterns, are there any commonly used symbols I could use? Or even a download with some cool Visio stencils? :) It doesn’t have to be a formal system like UML but I think it would be great if there were symbols that everyone knows and understands, like we have commonly used shapes for databases or a documents, for example. I think it would be important to not mix it up with traditional concepts like a normal file system (local or network server/SAN), or a relational database. Simply speaking, I want to be able to draw some conclusions about an application’s scalability or data consistency issues by just looking at the system architecture overview diagram. Update: Thank you very much for your answers. I like the idea of putting a small "cloud symbol" on the traditional symbols. However I leave this thread open just in case someone will find specific symbols (maybe in a book or so) - or uploaded some pimped up Visio stencils ;)

    Read the article

  • Pros & Cons of Google App Engine

    - by Rishi
    Pros & Cons of Google App Engine [An Updated List 21st Aug 09] Help me Compile a List of all the Advantages & Disadvantages of Building an Application on the Google App Engine Pros: 1) No Need to buy Servers or Server Space (no maintenance). 2) Makes solving the problem of scaling much easier. Cons: 1) Locked into Google App Engine ?? 2)Developers have read-only access to the filesystem on App Engine. 3)App Engine can only execute code called from an HTTP request (except for scheduled background tasks). 4)Users may upload arbitrary Python modules, but only if they are pure-Python; C and Pyrex modules are not supported. 5)App Engine limits the maximum rows returned from an entity get to 1000 rows per Datastore call. 6)Java applications may only use a subset (The JRE Class White List) of the classes from the JRE standard edition. 7)Java applications cannot create new threads. Known Issues!! http://code.google.com/p/googleappengine/issues/list Hard limits Apps per developer - 10 Time per request - 30 sec Files per app - 3,000 HTTP response size - 10 MB Datastore item size - 1 MB Application code size - 150 MB Pro or Con? App Engine's infrastructure removes many of the system administration and development challenges of building applications to scale to millions of hits. Google handles deploying code to a cluster, monitoring, failover, and launching application instances as necessary. While other services let users install and configure nearly any *NIX compatible software, App Engine requires developers to use Python or Java as the programming language and a limited set of APIs. Current APIs allow storing and retrieving data from a BigTable non-relational database; making HTTP requests; sending e-mail; manipulating images; and caching. Most existing Web applications can't run on App Engine without modification, because they require a relational database.

    Read the article

  • Practical size limitations for RDBMS

    - by grenade
    I am working on a project that must store very large datasets and associated reference data. I have never come across a project that required tables quite this large. I have proved that at least one development environment cannot cope at the database tier with the processing required by the complex queries against views that the application layer generates (views with multiple inner and outer joins, grouping, summing and averaging against tables with 90 million rows). The RDBMS that I have tested against is DB2 on AIX. The dev environment that failed was loaded with 1/20th of the volume that will be processed in production. I am assured that the production hardware is superior to the dev and staging hardware but I just don't believe that it will cope with the sheer volume of data and complexity of queries. Before the dev environment failed, it was taking in excess of 5 minutes to return a small dataset (several hundred rows) that was produced by a complex query (many joins, lots of grouping, summing and averaging) against the large tables. My gut feeling is that the db architecture must change so that the aggregations currently provided by the views are performed as part of an off-peak batch process. Now for my question. I am assured by people who claim to have experience of this sort of thing (which I do not) that my fears are unfounded. Are they? Can a modern RDBMS (SQL Server 2008, Oracle, DB2) cope with the volume and complexity I have described (given an appropriate amount of hardware) or are we in the realm of technologies like Google's BigTable? I'm hoping for answers from folks who have actually had to work with this sort of volume at a non-theoretical level.

    Read the article

  • Why use Django on Google App Engine?

    - by Travis Bradshaw
    When researching Google App Engine (GAE), it's clear that using Django is wildly popular for developing in Python on GAE. I've been scouring the web to find information on the costs and benefits of using Django, to find out why it's so popular. While I've been able to find a wide variety of sources on how to run Django on GAE and the various methods of doing so, I haven't found any comparative analysis on why Django is preferable to using the webapp framework provided by Google. To be clear, it's immediately apparent why using Django on GAE is useful for developers with an existing skillset in Django (a majority of Python web developers, no doubt) or existing code in Django (where using GAE is more of a porting exercise). My team, however, is evaluating GAE for use on an all-new project and our existing experience is with TurboGears, not Django. It's been quite difficult to determine why Django is beneficial to a development team when the BigTable libraries have replaced Django's ORM, sessions and authentication are necessarily changed, and Django's templating (if desirable) is available without using the entire Django stack. Finally, it's clear that using Django does have the advantage of providing an "exit strategy" if we later wanted to move away from GAE and need a platform to target for the exodus. I'd be extremely appreciative for help in pointing out why using Django is better than using webapp on GAE. I'm also completely inexperienced with Django, so elaboration on smaller features and/or conveniences that work on GAE are also valuable to me. Thanks in advance for your time!

    Read the article

  • CakePHP sending multiple lines of same cookie information in response header. why??

    - by Vicer
    Hi all, I am having this problem with one of my cakePHP applications. My application displays a big html table on the page. I noticed that when the table goes beyond a certain size limit, IE cannot display that page. While trying to figure out why this happens, I noticed that my html response header contains a HUGE number of lines repeating the same thing. Response Headers ------------------------------ Date Thu, 20 May 2010 04:18:10 GMT Server Apache/2.0.63 (Win32) mod_ssl/2.0.63 OpenSSL/0.9.7m PHP/5.2.10 X-Powered-By PHP/5.2.10 P3P CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM" Set-Cookie CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:12 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:12 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:12 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:12 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:12 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:12 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:12 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:12 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:12 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:12 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:12 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:12 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:13 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:13 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:13 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:13 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:13 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:13 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:13 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:13 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:13 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:13 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:13 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:13 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:13 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:13 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:13 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:13 GMT; path=/mywebapp CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1; expires=Sun, 20-May-2035 10:18:13 GMT; path=/mywebapp Keep-Alive timeout=15, max=100 Connection Keep-Alive Transfer-Encoding chunked Content-Type text/html ========================================================================== Request Headers -------------------------- Host localhost:8080 User-Agent Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 ( .NET CLR 3.5.30729) Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 Accept-Language en-us,en;q=0.5 Accept-Encoding gzip,deflate Accept-Charset ISO-8859-1,utf-8;q=0.7,*;q=0.7 Keep-Alive 115 Connection keep-alive Referer http://localhost:8080/mywebapp/section2/bigtable/1 Cookie CAKEPHP=q4tp37tn9gkhcpgmf1ftr1i6c1 Authorization Basic cGt1bWFyYTpwa3VtYXJh Notice the huge set of repeated lines at 'Set-Cookie' in the Response Header. This only happens when I try to display large tables. Does anyone have any clue to what might be causing this? Any help to find the issue is appreciated. I am using CakePHP 1.2.5. As far as I know, I am not messing with any set cookie functions. But yes I am using session variables.

    Read the article

< Previous Page | 1 2