Search Results

Search found 1908 results on 77 pages for 'relational operators'.

Page 38/77 | < Previous Page | 34 35 36 37 38 39 40 41 42 43 44 45  | Next Page >

  • Big Data – Data Mining with Hive – What is Hive? – What is HiveQL (HQL)? – Day 15 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the operational database in Big Data Story. In this article we will understand what is Hive and HQL in Big Data Story. Yahoo started working on PIG (we will understand that in the next blog post) for their application deployment on Hadoop. The goal of Yahoo to manage their unstructured data. Similarly Facebook started deploying their warehouse solutions on Hadoop which has resulted in HIVE. The reason for going with HIVE is because the traditional warehousing solutions are getting very expensive. What is HIVE? Hive is a datawarehouseing infrastructure for Hadoop. The primary responsibility is to provide data summarization, query and analysis. It  supports analysis of large datasets stored in Hadoop’s HDFS as well as on the Amazon S3 filesystem. The best part of HIVE is that it supports SQL-Like access to structured data which is known as HiveQL (or HQL) as well as big data analysis with the help of MapReduce. Hive is not built to get a quick response to queries but it it is built for data mining applications. Data mining applications can take from several minutes to several hours to analysis the data and HIVE is primarily used there. HIVE Organization The data are organized in three different formats in HIVE. Tables: They are very similar to RDBMS tables and contains rows and tables. Hive is just layered over the Hadoop File System (HDFS), hence tables are directly mapped to directories of the filesystems. It also supports tables stored in other native file systems. Partitions: Hive tables can have more than one partition. They are mapped to subdirectories and file systems as well. Buckets: In Hive data may be divided into buckets. Buckets are stored as files in partition in the underlying file system. Hive also has metastore which stores all the metadata. It is a relational database containing various information related to Hive Schema (column types, owners, key-value data, statistics etc.). We can use MySQL database over here. What is HiveSQL (HQL)? Hive query language provides the basic SQL like operations. Here are few of the tasks which HQL can do easily. Create and manage tables and partitions Support various Relational, Arithmetic and Logical Operators Evaluate functions Download the contents of a table to a local directory or result of queries to HDFS directory Here is the example of the HQL Query: SELECT upper(name), salesprice FROM sales; SELECT category, count(1) FROM products GROUP BY category; When you look at the above query, you can see they are very similar to SQL like queries. Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Pig. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Big Data – Operational Databases Supporting Big Data – Key-Value Pair Databases and Document Databases – Day 13 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the Relational Database and NoSQL database in the Big Data Story. In this article we will understand the role of Key-Value Pair Databases and Document Databases Supporting Big Data Story. Now we will see a few of the examples of the operational databases. Relational Databases (Yesterday’s post) NoSQL Databases (Yesterday’s post) Key-Value Pair Databases (This post) Document Databases (This post) Columnar Databases (Tomorrow’s post) Graph Databases (Tomorrow’s post) Spatial Databases (Tomorrow’s post) Key Value Pair Databases Key Value Pair Databases are also known as KVP databases. A key is a field name and attribute, an identifier. The content of that field is its value, the data that is being identified and stored. They have a very simple implementation of NoSQL database concepts. They do not have schema hence they are very flexible as well as scalable. The disadvantages of Key Value Pair (KVP) database are that they do not follow ACID (Atomicity, Consistency, Isolation, Durability) properties. Additionally, it will require data architects to plan for data placement, replication as well as high availability. In KVP databases the data is stored as strings. Here is a simple example of how Key Value Database will look like: Key Value Name Pinal Dave Color Blue Twitter @pinaldave Name Nupur Dave Movie The Hero As the number of users grow in Key Value Pair databases it starts getting difficult to manage the entire database. As there is no specific schema or rules associated with the database, there are chances that database grows exponentially as well. It is very crucial to select the right Key Value Pair Database which offers an additional set of tools to manage the data and provides finer control over various business aspects of the same. Riak Rick is one of the most popular Key Value Database. It is known for its scalability and performance in high volume and velocity database. Additionally, it implements a mechanism for collection key and values which further helps to build manageable system. We will further discuss Riak in future blog posts. Key Value Databases are a good choice for social media, communities, caching layers for connecting other databases. In simpler words, whenever we required flexibility of the data storage keeping scalability in mind – KVP databases are good options to consider. Document Database There are two different kinds of document databases. 1) Full document Content (web pages, word docs etc) and 2) Storing Document Components for storage. The second types of the document database we are talking about over here. They use Javascript Object Notation (JSON) and Binary JSON for the structure of the documents. JSON is very easy to understand language and it is very easy to write for applications. There are two major structures of JSON used for Document Database – 1) Name Value Pairs and 2) Ordered List. MongoDB and CouchDB are two of the most popular Open Source NonRelational Document Database. MongoDB MongoDB databases are called collections. Each collection is build of documents and each document is composed of fields. MongoDB collections can be indexed for optimal performance. MongoDB ecosystem is highly available, supports query services as well as MapReduce. It is often used in high volume content management system. CouchDB CouchDB databases are composed of documents which consists fields and attachments (known as description). It supports ACID properties. The main attraction points of CouchDB are that it will continue to operate even though network connectivity is sketchy. Due to this nature CouchDB prefers local data storage. Document Database is a good choice of the database when users have to generate dynamic reports from elements which are changing very frequently. A good example of document usages is in real time analytics in social networking or content management system. Tomorrow In tomorrow’s blog post we will discuss about various other Operational Databases supporting Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Top 10 Linked Blogs of 2010

    - by Bill Graziano
    Each week I send out a SQL Server newsletter and include links to interesting blog posts.  I’ve linked to over 500 blog posts so far in 2010.  Late last year I started storing those links in a database so I could do a little reporting.  I tend to link to posts related to the OLTP engine.  I also try to link to the individual blogger in the group blogs.  Unfortunately that wasn’t possible for the SQLCAT and CSS blogs.  I also have a real weakness for posts related to PASS. These are the top 10 blogs that I linked to during the year ordered by the number of posts I linked to. Paul Randal – Paul writes extensively on the internals of the relational engine.  Lots of great posts around transactions, transaction log, disaster recovery, corruption, indexes and DBCC.  I also linked to many of his SQL Server myths posts. Glenn Berry – Glenn writes very interesting posts on how hardware affects SQL Server.  I especially like his posts on the various CPU platforms.  These aren’t necessarily topics that I’m searching for but I really enjoy reading them. The SQLCAT Team – This Microsoft team focuses on the largest and most interesting SQL Server installations.  The regularly publish white papers and best practices. SQL Server CSS Team – These are the top engineers from the Microsoft Customer Service and Support group.  These are the folks you finally talk to after your case has been escalated about 20 times.  They write about the interesting problems they find. Brent Ozar – The posts I linked to mostly focused on the relational engine: CPU, NUMA, SSD drives, performance monitoring, etc.  But Brent writes about a real variety of topics including blogging, social networking, speaking, the MCM, SQL Azure and anything else that seems to strike his fancy.  His posts are always well written and though provoking. Jeremiah Peschka – A number of Jeremiah’s posts weren’t about SQL Server.  He’s very active in the “NoSQL” area and I linked to a number of those posts.  I think it’s important for people to know what other technologies are out there. Brad McGehee – Brad writes about being a DBA including maintenance plans, DBA checklists, compression and audit. Thomas LaRock – I linked to a variety of posts from PBM to networking to 24 Hours of PASS to TDE.  Just a real variety of topics.  Tom always writes with an interesting style usually mixing in a movie theme and/or bacon. Aaron Bertrand – Many of my links this year were Denali features.  He also had a great series on bad habits to kick. Michael J. Swart – This last one surprised me.  There are some well known SQL Server bloggers below Michael on this list.  I linked to posts on indexes, hierarchies, transactions and I/O performance and a variety of other engine related posts.  All are interesting and well thought out.  Many of his non-SQL posts are also very good.  He seems to have an interest in puzzles and other brain teasers.  Michael, I won’t be surprised again!

    Read the article

  • SQL SERVER – Introduction to Big Data – Guest Post

    - by pinaldave
    BIG Data – such a big word – everybody talks about this now a days. It is the word in the database world. In one of the conversation I asked my friend Jasjeet Sigh the same question – what is Big Data? He instantly came up with a very effective write-up.  Jasjeet is working as a Technical Manager with Koenig Solutions. He leads the SQL domain, and holds rich IT industry experience. Talking about Koenig, it is a 19 year old IT training company that offers several certification choices. Some of its courses include SharePoint Training, Project Management certifications, Microsoft Trainings, Business Intelligence programs, Web Design and Development courses etc. Big Data, as the name suggests, is about data that is BIG in nature. The data is BIG in terms of size, and it is difficult to manage such enormous data with relational database management systems that are quite popular these days. Big Data is not just about being large in size, it is also about the variety of the data that differs in form or type. Some examples of Big Data are given below : Scientific data related to weather and atmosphere, Genetics etc Data collected by various medical procedures, such as Radiology, CT scan, MRI etc Data related to Global Positioning System Pictures and Videos Radio Frequency Data Data that may vary very rapidly like stock exchange information Apart from difficulties in managing and storing such data, it is difficult to query, analyze and visualize it. The characteristics of Big Data can be defined by four Vs: Volume: It simply means a large volume of data that may span Petabyte, Exabyte and so on. However it also depends organization to organization that what volume of data they consider as Big Data. Variety: As discussed above, Big Data is not limited to relational information or structured Data. It can also include unstructured data like pictures, videos, text, audio etc. Velocity:  Velocity means the speed by which data changes. The higher is the velocity, the more efficient should be the system to capture and analyze the data. Missing any important point may lead to wrong analysis or may even result in loss. Veracity: It has been recently added as the fourth V, and generally means truthfulness or adherence to the truth. In terms of Big Data, it is more of a challenge than a characteristic. It is difficult to ascertain the truth out of the enormous amount of data and the one that has high velocity. There are always chances of having un-precise and uncertain data. It is a challenging task to clean such data before it is analyzed. Big Data can be considered as the next big thing in the IT sector in terms of innovation and development. If appropriate technologies are developed to analyze and use the information, it can be the driving force for almost all industrial segments. These include Retail, Manufacturing, Service, Finance, Healthcare etc. This will help them to automate business decisions, increase productivity, and innovate and develop new products. Thanks Jasjeet Singh for an excellent write up.  Jasjeet Sign is working as a Technical Manager with Koenig Solutions. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Database, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Big Data

    Read the article

  • IPgallery banks on Solaris SPARC

    - by Frederic Pariente
    IPgallery is a global supplier of converged legacy and Next Generation Networks (NGN) products and solutions, including: core network components and cloud-based Value Added Services (VAS) for voice, video and data sessions. IPgallery enables network operators and service providers to offer advanced converged voice, chat, video/content services and rich unified social communications in a combined legacy (fixed/mobile), Over-the-Top (OTT) and Social Community (SC) environments for home and business customers. Technically speaking, this offer is a scalable and robust telco solution enabling operators to offer new services while controlling operating expenses (OPEX). In its solutions, IPgallery leverages the following Oracle components: Oracle Solaris, Netra T4 and SPARC T4 in order to provide a competitive and scalable solution without the price tag often associated with high-end systems. Oracle Solaris Binary Application Guarantee A unique feature of Oracle Solaris is the guaranteed binary compatibility between releases of the Solaris OS. That means, if a binary application runs on Solaris 2.6 or later, it will run on the latest release of Oracle Solaris.  IPgallery developed their application on Solaris 9 and Solaris 10 then runs it on Solaris 11, without any code modification or rebuild. The Solaris Binary Application Guarantee helps IPgallery protect their long-term investment in the development, training and maintenance of their applications. Oracle Solaris Image Packaging System (IPS) IPS is a new repository-based package management system that comes with Oracle Solaris 11. It provides a framework for complete software life-cycle management such as installation, upgrade and removal of software packages. IPgallery leverages this new packaging system in order to speed up and simplify software installation for the R&D and production environments. Notably, they use IPS to deliver Solaris Studio 12.3 packages as part of the rapid installation process of R&D environments, and during the production software deployment phase, they ensure software package integrity using the built-in verification feature. Solaris IPS thus improves IPgallery's time-to-market with a faster, more reliable software installation and deployment in production environments. Extreme Network Performance IPgallery saw a huge improvement in application performance both in CPU and I/O, when running on SPARC T4 architecture in compared to UltraSPARC T2 servers.  The same application (with the same activation environment) running on T2 consumes 40%-50% CPU, while it consumes only 10% of the CPU on T4. The testing environment comprised of: Softswitch (Call management), TappS (Telecom Application Server) and Billing Server running on same machine and initiating various services in capacity of 1000 CAPS (Call Attempts Per Second). In addition, tests showed a huge improvement in the performance of the TCP/IP stack, which reduces network layer processing and in the end Call Attempts latency. Finally, there is a huge improvement within the file system and disk I/O operations; they ran all tests with maximum logging capability and it didn't influence any benchmark values. "Due to the huge improvements in performance and capacity using the T4-1 architecture, IPgallery has engineered the solution with less hardware.  This means instead of deploying the solution on six T2-based machines, we will deploy on 2 redundant machines while utilizing Oracle Solaris Zones and Oracle VM for higher availability and virtualization" Shimon Lichter, VP R&D, IPgallery In conclusion, using the unique combination of Oracle Solaris and SPARC technologies, IPgallery is able to offer solutions with much lower TCO, while providing a higher level of service capacity, scalability and resiliency. This low-OPEX solution enables the operator, the end-customer, to deliver a high quality service while maintaining high profitability.

    Read the article

  • SQL SERVER – Partition Parallelism Support in expressor 3.6

    - by pinaldave
    I am very excited to learn that there is a new version of expressor’s data integration platform coming out in March of this year.  It will be version 3.6, and I look forward to using it and telling everyone about it.  Let me describe a little bit more about what will be so great in expressor 3.6: Greatly enhanced user interface Parallel Processing Bulk Artifact Upgrading The User Interface First let me cover the most obvious enhancements. The expressor Studio user interface (UI) has had some significant work done. Kudos to the expressor Engineering team; the entire UI is a visual masterpiece that is very responsive and intuitive. The improvements are more than just eye candy; they provide significant productivity gains when developing expressor Dataflows. Operator shape icons now include a description that identifies the function of each operator, instead of having to guess at the function by the icon. Operator shapes and highlighting depict the current function and status: Disabled, enabled, complete, incomplete, and error. Each status displays an appropriate message in the message panel with correction suggestions. Floating or docking property panels provide descriptive tool tips for each property as well as auto resize when adjusting the canvas, without having to search Help or the need to scroll around to get access to the property. Progress and status indicators let you know when an operation is working. “No limit” canvas with snap-to-grid allows automatic sizing and accurate positioning when you have numerous operators in the Dataflow. The inline tool bar offers quick access to pan, zoom, fit and overview functions. Selecting multiple artifacts with a right click context allows you to easily manage your workspace more efficiently. Partitioning and Parallel Processing Partitioning allows each operator to process multiple subsets of records in parallel as opposed to processing all records that flow through that operator in a single sequential set. This capability allows the user to configure the expressor Dataflow to run in a way that most efficiently utilizes the resources of the hardware where the Dataflow is running. Partitions can exist in most individual operators. Using partitions increases the speed of an expressor data integration application, therefore improving performance and load times. With the expressor 3.6 Enterprise Edition, expressor simplifies enabling parallel processing by adding intuitive partition settings that are easy to configure. Bulk Artifact Upgrading Bulk Artifact Upgrading sounds a bit intimidating, but it actually is not and it is a welcome addition to expressor Studio. In past releases, users were prompted to confirm that they wanted to upgrade their individual artifacts only when opened. This was a cumbersome and repetitive process. Now with bulk artifact upgrading, a user can easily select what artifact or group of artifacts to upgrade all at once. As you can see, there are many new features and upgrade options that will prove to make expressor Studio quicker and more efficient.  I hope I’m not the only one who is excited about all these new upgrades, and that I you try expressor and share your experience with me. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology

    Read the article

  • Big Data – Operational Databases Supporting Big Data – Columnar, Graph and Spatial Database – Day 14 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the Key-Value Pair Databases and Document Databases in the Big Data Story. In this article we will understand the role of Columnar, Graph and Spatial Database supporting Big Data Story. Now we will see a few of the examples of the operational databases. Relational Databases (The day before yesterday’s post) NoSQL Databases (The day before yesterday’s post) Key-Value Pair Databases (Yesterday’s post) Document Databases (Yesterday’s post) Columnar Databases (Tomorrow’s post) Graph Databases (Today’s post) Spatial Databases (Today’s post) Columnar Databases  Relational Database is a row store database or a row oriented database. Columnar databases are column oriented or column store databases. As we discussed earlier in Big Data we have different kinds of data and we need to store different kinds of data in the database. When we have columnar database it is very easy to do so as we can just add a new column to the columnar database. HBase is one of the most popular columnar databases. It uses Hadoop file system and MapReduce for its core data storage. However, remember this is not a good solution for every application. This is particularly good for the database where there is high volume incremental data is gathered and processed. Graph Databases For a highly interconnected data it is suitable to use Graph Database. This database has node relationship structure. Nodes and relationships contain a Key Value Pair where data is stored. The major advantage of this database is that it supports faster navigation among various relationships. For example, Facebook uses a graph database to list and demonstrate various relationships between users. Neo4J is one of the most popular open source graph database. One of the major dis-advantage of the Graph Database is that it is not possible to self-reference (self joins in the RDBMS terms) and there might be real world scenarios where this might be required and graph database does not support it. Spatial Databases  We all use Foursquare, Google+ as well Facebook Check-ins for location aware check-ins. All the location aware applications figure out the position of the phone with the help of Global Positioning System (GPS). Think about it, so many different users at different location in the world and checking-in all together. Additionally, the applications now feature reach and users are demanding more and more information from them, for example like movies, coffee shop or places see. They are all running with the help of Spatial Databases. Spatial data are standardize by the Open Geospatial Consortium known as OGC. Spatial data helps answering many interesting questions like “Distance between two locations, area of interesting places etc.” When we think of it, it is very clear that handing spatial data and returning meaningful result is one big task when there are millions of users moving dynamically from one place to another place & requesting various spatial information. PostGIS/OpenGIS suite is very popular spatial database. It runs as a layer implementation on the RDBMS PostgreSQL. This makes it totally unique as it offers best from both the worlds. Courtesy: mushroom network Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Hive. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • SQL SERVER – Last Two Days to Get FREE Book – Joes 2 Pros Certification 70-433

    - by pinaldave
    Earlier this week we announced that we will be giving away FREE SQL Wait Stats book to everybody who will get SQL Server Joes 2 Pros Combo Kit. We had a fantastic response to the contest. We got an overwhelming response to the offer. We knew there would be a great response but we want to honestly say thank you to all of you for making it happen. Rick and I want to make sure that we express our special thanks to all of you who are reading our books. The offer is still on and there are two more days to avail this offer. We want to make sure that everybody who buys our most selling combo kits, we will send our other most popular SQL Wait Stats book. Please read all the details of the offer here. The books are great resources for anyone who wants to learn SQL Server from fundamentals and eventually go on the certification path of 70-433. Exam 70-433 contains following important subject and the book covers the subject of fundamental. If you are taking the exam or not taking the exam – this book is for every SQL Developer to learn the subject from fundamentals.  Create and alter tables. Create and alter views. Create and alter indexes. Create and modify constraints. Implement data types. Implement partitioning solutions. Create and alter stored procedures. Create and alter user-defined functions (UDFs). Create and alter DML triggers. Create and alter DDL triggers. Create and deploy CLR-based objects. Implement error handling. Manage transactions. Query data by using SELECT statements. Modify data by using INSERT, UPDATE, and DELETE statements. Return data by using the OUTPUT clause. Modify data by using MERGE statements. Implement aggregate queries. Combine datasets. INTERSECT, EXCEPT Implement subqueries. Implement CTE (common table expression) queries. Apply ranking functions. Control execution plans. Manage international considerations. Integrate Database Mail. Implement full-text search. Implement scripts by using Windows PowerShell and SQL Server Management Objects (SMOs). Implement Service Broker solutions. Track data changes. Data capture Retrieve relational data as XML. Transform XML data into relational data. Manage XML data. Capture execution plans. Collect output from the Database Engine Tuning Advisor. Collect information from system metadata. Availability of Book USA - Amazon | India - Flipkart | Indiaplaza Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Joes 2 Pros, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • We have our standards, and we need them

    - by Tony Davis
    The presenter suddenly broke off. He was midway through his section on how to apply to the relational database the Continuous Delivery techniques that allowed for rapid-fire rounds of development and refactoring, while always retaining a “production-ready” state. He sighed deeply and then launched into an astonishing diatribe against Database Administrators, much of his frustration directed toward Oracle DBAs, in particular. In broad strokes, he painted the picture of a brave new deployment philosophy being frustratingly shackled by the relational database, and by especially by the attitudes of the guardians of these databases. DBAs, he said, shunned change and “still favored tools I’d have been embarrassed to use in the ’80′s“. DBAs, Oracle DBAs especially, were more attached to their vendor than to their employer, since the former was the primary source of their career longevity and spectacular remuneration. He contended that someone could produce the best IDE or tool in the world for Oracle DBAs and yet none of them would give a stuff, unless it happened to come from the “mother ship”. I sat blinking in astonishment at the speaker’s vehemence, and glanced around nervously. Nobody in the audience disagreed, and a few nodded in assent. Although the primary target of the outburst was the Oracle DBA, it made me wonder. Are we who work with SQL Server, database professionals or merely SQL Server fanbois? Do DBAs, in general, have an image problem? Is it a good career-move to be seen to be holding onto a particular product by the whites of our knuckles, to the exclusion of all else? If we seek a broad, open-minded, knowledge of our chosen technology, the database, and are blessed with merely mortal powers of learning, then we like standards. Vendors of RDBMSs generally don’t conform to standards by instinct, but by customer demand. Microsoft has made great strides to adopt the international SQL Standards, where possible, thanks to considerable lobbying by the community. The implementation of Window functions is a great example. There is still work to do, though. SQL Server, for example, has an unusable version of the Information Schema. One cast-iron rule of any RDBMS is that we must be able to query the metadata using the same language that we use to query the data, i.e. SQL, and we do this by running queries against the INFORMATION_SCHEMA views. Developers who’ve attempted to apply a standard query that works on MySQL, or some other database, but doesn’t produce the expected results on SQL Server are advised to shun the Standards-based approach in favor of the vendor-specific one, using the catalog views. The argument behind this is sound and well-documented, and of course we all use those catalog views, out of necessity. And yet, as database professionals, committed to supporting the best databases for the business, whatever they are now and in the future, surely our heart should sink somewhat when we advocate a vendor specific approach, to a developer struggling with something as simple as writing a guard clause. And when we read messages on the Microsoft documentation informing us that we shouldn’t rely on INFORMATION_SCHEMA to identify reliably the schema of an object, in SQL Server!

    Read the article

  • "cloud architecture" concepts in a system architecture diagrams

    - by markus
    If you design a distributed application for easy scale-out, or you just want to make use of any of the new “cloud computing” offerings by Amazon, Google or Microsoft, there are some typical concepts or components you usually end up using: distributed blob storage (aka S3) asynchronous, durable message queues (aka SQS) non-Relational-/non-transactional databases (like SimpleDB, Google BigTable, Azure SQL Services) distributed background worker pool load-balanced, edge-service processes handling user requests (often virtualized) distributed caches (like memcached) CDN (content delivery network like Akamai) Now when it comes to design and sketch an architecture that makes use of such patterns, are there any commonly used symbols I could use? Or even a download with some cool Visio stencils? :) It doesn’t have to be a formal system like UML but I think it would be great if there were symbols that everyone knows and understands, like we have commonly used shapes for databases or a documents, for example. I think it would be important to not mix it up with traditional concepts like a normal file system (local or network server/SAN), or a relational database. Simply speaking, I want to be able to draw some conclusions about an application’s scalability or data consistency issues by just looking at the system architecture overview diagram. Update: Thank you very much for your answers. I like the idea of putting a small "cloud symbol" on the traditional symbols. However I leave this thread open just in case someone will find specific symbols (maybe in a book or so) - or uploaded some pimped up Visio stencils ;)

    Read the article

  • Why use short-circuit code?

    - by Tim Lytle
    Related Questions: Benefits of using short-circuit evaluation, Why would a language NOT use Short-circuit evaluation?, Can someone explain this line of code please? (Logic & Assignment operators) There are questions about the benefits of a language using short-circuit code, but I'm wondering what are the benefits for a programmer? Is it just that it can make code a little more concise? Or are there performance reasons? I'm not asking about situations where two entities need to be evaluated anyway, for example: if($user->auth() AND $model->valid()){ $model->save(); } To me the reasoning there is clear - since both need to be true, you can skip the more costly model validation if the user can't save the data. This also has a (to me) obvious purpose: if(is_string($userid) AND strlen($userid) > 10){ //do something }; Because it wouldn't be wise to call strlen() with a non-string value. What I'm wondering about is the use of short-circuit code when it doesn't effect any other statements. For example, from the Zend Application default index page: defined('APPLICATION_PATH') || define('APPLICATION_PATH', realpath(dirname(__FILE__) . '/../application')); This could have been: if(!defined('APPLICATION_PATH')){ define('APPLICATION_PATH', realpath(dirname(__FILE__) . '/../application')); } Or even as a single statement: if(!defined('APPLICATION_PATH')) define('APPLICATION_PATH', realpath(dirname(__FILE__) . '/../application')); So why use the short-circuit code? Just for the 'coolness' factor of using logic operators in place of control structures? To consolidate nested if statements? Because it's faster?

    Read the article

  • Hidden features of Perl?

    - by Adam Bellaire
    What are some really useful but esoteric language features in Perl that you've actually been able to employ to do useful work? Guidelines: Try to limit answers to the Perl core and not CPAN Please give an example and a short description Hidden Features also found in other languages' Hidden Features: (These are all from Corion's answer) C# Duff's Device Portability and Standardness Quotes for whitespace delimited lists and strings Aliasable namespaces Java Static Initalizers JavaScript Functions are First Class citizens Block scope and closure Calling methods and accessors indirectly through a variable Ruby Defining methods through code PHP Pervasive online documentation Magic methods Symbolic references Python One line value swapping Ability to replace even core functions with your own functionality Other Hidden Features: Operators: The bool quasi-operator The flip-flop operator Also used for list construction The ++ and unary - operators work on strings The repetition operator The spaceship operator The || operator (and // operator) to select from a set of choices The diamond operator Special cases of the m// operator The tilde-tilde "operator" Quoting constructs: The qw operator Letters can be used as quote delimiters in q{}-like constructs Quoting mechanisms Syntax and Names: There can be a space after a sigil You can give subs numeric names with symbolic references Legal trailing commas Grouped Integer Literals hash slices Populating keys of a hash from an array Modules, Pragmas, and command-line options: use strict and use warnings Taint checking Esoteric use of -n and -p CPAN overload::constant IO::Handle module Safe compartments Attributes Variables: Autovivification The $[ variable tie Dynamic Scoping Variable swapping with a single statement Loops and flow control: Magic goto for on a single variable continue clause Desperation mode Regular expressions: The \G anchor (?{}) and '(??{})` in regexes Other features: The debugger Special code blocks such as BEGIN, CHECK, and END The DATA block New Block Operations Source Filters Signal Hooks map (twice) Wrapping built-in functions The eof function The dbmopen function Turning warnings into errors Other tricks, and meta-answers: cat files, decompressing gzips if needed Perl Tips See Also: Hidden features of C Hidden features of C# Hidden features of C++ Hidden features of Java Hidden features of JavaScript Hidden features of Ruby Hidden features of PHP Hidden features of Python

    Read the article

  • What scalability problems have you solved using a NoSQL data store?

    - by knorv
    NoSQL refers to non-relational data stores that break with the history of relational databases and ACID guarantees. Popular open source NoSQL data stores include: Cassandra (tabular, written in Java, used by Facebook, Twitter, Digg, Rackspace, Mahalo and Reddit) CouchDB (document, written in Erlang, used by Engine Yard and BBC) Dynomite (key-value, written in C++, used by Powerset) HBase (key-value, written in Java, used by Bing) Hypertable (tabular, written in C++, used by Baidu) Kai (key-value, written in Erlang) MemcacheDB (key-value, written in C, used by Reddit) MongoDB (document, written in C++, used by Sourceforge, Github, Electronic Arts and NY Times) Neo4j (graph, written in Java, used by Swedish Universities) Project Voldemort (key-value, written in Java, used by LinkedIn) Redis (key-value, written in C, used by Engine Yard, Github and Craigslist) Riak (key-value, written in Erlang, used by Comcast and Mochi Media) Ringo (key-value, written in Erlang, used by Nokia) Scalaris (key-value, written in Erlang, used by OnScale) ThruDB (document, written in C++, used by JunkDepot.com) Tokyo Cabinet/Tokyo Tyrant (key-value, written in C, used by Mixi.jp (Japanese social networking site)) I'd like to know about specific problems you - the SO reader - have solved using data stores and what NoSQL data store you used. Questions: What scalability problems have you used NoSQL data stores to solve? What NoSQL data store did you use? What database did you use before switching to a NoSQL data store? I'm looking for first-hand experiences, so please do not answer unless you have that.

    Read the article

  • Extracting images from a PDF

    - by sagar
    My Query I want to extract only images from a PDF document, using Objective-C in an iPhone Application. My Efforts I have gone through the info on this link, which has details regarding different operators on PDF documents. I also studied this document from Apple about PDF parsing with Quartz. I also went through the entire PDF reference document from the Adobe site. According to that document, for each image there are the following operators: q Q BI EI I have created a table to get the image: myTable = CGPDFOperatorTableCreate(); CGPDFOperatorTableSetCallback(myTable, "q", arrayCallback2); CGPDFOperatorTableSetCallback(myTable, "TJ", arrayCallback); CGPDFOperatorTableSetCallback(myTable, "Tj", stringCallback); I use this method to get the image: void arrayCallback2(CGPDFScannerRef inScanner, void *userInfo) { // THIS DOESN'T WORK // CGPDFStreamRef stream; // represents a sequence of bytes // if (CGPDFDictionaryGetStream (d, "BI", &stream)){ // CGPDFDataFormat t=CGPDFDataFormatJPEG2000; // CFDataRef data = CGPDFStreamCopyData (stream, &t); // } } This method is called for the operator "q", but I don't know how to extract an image from it. What should be the solution for extracting the images from the PDF documents? Thanks in advance for your kind help.

    Read the article

  • Pros & Cons of Google App Engine

    - by Rishi
    Pros & Cons of Google App Engine [An Updated List 21st Aug 09] Help me Compile a List of all the Advantages & Disadvantages of Building an Application on the Google App Engine Pros: 1) No Need to buy Servers or Server Space (no maintenance). 2) Makes solving the problem of scaling much easier. Cons: 1) Locked into Google App Engine ?? 2)Developers have read-only access to the filesystem on App Engine. 3)App Engine can only execute code called from an HTTP request (except for scheduled background tasks). 4)Users may upload arbitrary Python modules, but only if they are pure-Python; C and Pyrex modules are not supported. 5)App Engine limits the maximum rows returned from an entity get to 1000 rows per Datastore call. 6)Java applications may only use a subset (The JRE Class White List) of the classes from the JRE standard edition. 7)Java applications cannot create new threads. Known Issues!! http://code.google.com/p/googleappengine/issues/list Hard limits Apps per developer - 10 Time per request - 30 sec Files per app - 3,000 HTTP response size - 10 MB Datastore item size - 1 MB Application code size - 150 MB Pro or Con? App Engine's infrastructure removes many of the system administration and development challenges of building applications to scale to millions of hits. Google handles deploying code to a cluster, monitoring, failover, and launching application instances as necessary. While other services let users install and configure nearly any *NIX compatible software, App Engine requires developers to use Python or Java as the programming language and a limited set of APIs. Current APIs allow storing and retrieving data from a BigTable non-relational database; making HTTP requests; sending e-mail; manipulating images; and caching. Most existing Web applications can't run on App Engine without modification, because they require a relational database.

    Read the article

  • Regex problem in Java in code sample

    - by JaneNY
    I have job with regex in my expressions: example !(FA1_A.i & FA1_M.i) I have operators ! ( ) & | operands have names [a-zA-Z_]*.[a-zA-Z_] I wrote in Java to split on tokens, but it doesn't split on operators and operands If should be !, (, FA1_A.i, &, FA1_m.i, ) . Can anybody tell me what is wrong ? String stringOpеrator = "([!|&()])"; String stringOperand = "(([a-zA-Z_]*)\\.([a-zA-Z_]*))"; String reg=stringOpеrator+"|"+stringOperand; Pattern pattern = Pattern.compile(reg); Matcher m = pattern.matcher(expression); // System.out.println("func: " + function + " item: " + item); while (m.find()) { int a=m.start(); int b=m.end(); String test=expression.substring(m.start(), m.end()); String g=test; tokens.add(new Token(expression.substring(m.start() , m.end()))); //m = pattern.matcher(expression); }

    Read the article

  • Where are the real risks in network security?

    - by Barry Brown
    Anytime a username/password authentication is used, the common wisdom is to protect the transport of that data using encryption (SSL, HTTPS, etc). But that leaves the end points potentially vulnerable. Realistically, which is at greater risk of intrusion? Transport layer: Compromised via wireless packet sniffing, malicious wiretapping, etc. Transport devices: Risks include ISPs and Internet backbone operators sniffing data. End-user device: Vulnerable to spyware, key loggers, shoulder surfing, and so forth. Remote server: Many uncontrollable vulnerabilities including malicious operators, break-ins resulting in stolen data, physically heisting servers, backups kept in insecure places, and much more. My gut reaction is that although the transport layer is relatively easy to protect via SSL, the risks in the other areas are much, much greater, especially at the end points. For example, at home my computer connects directly to my router; from there it goes straight to my ISPs routers and onto the Internet. I would estimate the risks at the transport level (both software and hardware) at low to non-existant. But what security does the server I'm connected to have? Have they been hacked into? Is the operator collecting usernames and passwords, knowing that most people use the same information at other websites? Likewise, has my computer been compromised by malware? Those seem like much greater risks. What do you think?

    Read the article

  • Can I compare a template variable to an integer in App Engine templates?

    - by matt b
    Using Django templates in Google App Engine (on Python), is it possible to compare a template variable to an integer in an {% if %} block? views.py: class MyHandler(webapp.RequestHandler): def get(self): foo_list = db.GqlQuery(...) ... template_values['foos'] = foo_list template_values['foo_count'] = len(foo_list) handler.response.out.write(template.render(...)) My template: {% if foo_count == 1 %} There is one foo. {% endif %} This blows up with 'if' statement improperly formatted. What I was attempting to do in my template was build a simple if/elif/else tree to be grammatically correct to be able to state #foo_count == 0: There are no foos. #foo_count == 1: There is one foo. #else: There are {{ foos|length }} foos. Browsing the Django template documents (this link provided in the GAE documentation appears to be for versions of Django far newer than what is supported on GAE), it appears as if I can only actually use boolean operators (if in fact boolean operators are supported in this older version of Django) with strings or other template variables. Is it not possible to compare variables to integers or non-strings with Django templates? I'm sure there is an easy way to workaround this - built up the message string on the Python side rather than within the template - but this seems like such a simple operation you ought to be able to handle in a template. It sounds like I should be switching to a more advanced templating engine, but as I am new to Django (templates or any part of it), I'd just like some confirmation first.

    Read the article

  • Why put a DAO layer over a persistence layer (like JDO or Hibernate)

    - by Todd Owen
    Data Access Objects (DAOs) are a common design pattern, and recommended by Sun. But the earliest examples of Java DAOs interacted directly with relational databases -- they were, in essence, doing object-relational mapping (ORM). Nowadays, I see DAOs on top of mature ORM frameworks like JDO and Hibernate, and I wonder if that is really a good idea. I am developing a web service using JDO as the persistence layer, and am considering whether or not to introduce DAOs. I foresee a problem when dealing with a particular class which contains a map of other objects: public class Book { // Book description in various languages, indexed by ISO language codes private Map<String,BookDescription> descriptions; } JDO is clever enough to map this to a foreign key constraint between the "BOOKS" and "BOOKDESCRIPTIONS" tables. It transparently loads the BookDescription objects (using lazy loading, I believe), and persists them when the Book object is persisted. If I was to introduce a "data access layer" and write a class like BookDao, and encapsulate all the JDO code within this, then wouldn't this JDO's transparent loading of the child objects be circumventing the data access layer? For consistency, shouldn't all the BookDescription objects be loaded and persisted via some BookDescriptionDao object (or BookDao.loadDescription method)? Yet refactoring in that way would make manipulating the model needlessly complicated. So my question is, what's wrong with calling JDO (or Hibernate, or whatever ORM you fancy) directly in the business layer? Its syntax is already quite concise, and it is datastore-agnostic. What is the advantage, if any, of encapsulating it in Data Access Objects?

    Read the article

  • What's the false operator in C# good for?

    - by Jakub Šturc
    There are two weird operators in C#: the true operator the false operator If I understand this right these operators can be used in types which I want to use instead of a boolean expression and where I don't want to provide an implicit conversion to bool. Let's say I have a following class: public class MyType { public readonly int Value; public MyType(int value) { Value = value; } public static bool operator true (MyType mt) { return mt.Value > 0; } public static bool operator false (MyType mt) { return mt.Value < 0; } } So I can write the following code: MyType mTrue = new MyType(100); MyType mFalse = new MyType(-100); MyType mDontKnow = new MyType(0); if (mTrue) { // Do something. } while (mFalse) { // Do something else. } do { // Another code comes here. } while (mDontKnow) However for all the examples above only the true operator is executed. So what's the false operator in C# good for? Note: More examples can be found here, here and here.

    Read the article

  • Is there an XML XQuery interface to existing XML files?

    - by xsaero00
    My company is in education industry and we use XML to store course content. We also store some course related information (mostly metainfo) in relational database. Right now we are in the process of switching from our proprietary XML Schema to DocBook 5. Along with the switch we want to move course related information from database to XML files. The reason for this is to have all course data in one place and to put it under Subversion. However, we would like to keep flexibility of the relational database and be able to easily extract specific information about a course from an XML document. XQuery seems to be up to the task so I was researching databases that supports it but so far could not find what I needed. What I basically want, is to have my XML files in a certain directory structure and then on top of this I would like to have a system that would index my files and let me select anything out of any file using XQuery. This way I can have "my cake and eat it too": I will have XQuery interface and still keep my files in plain text and versioned. Is there anything out there at least remotely resembling to what I want? If you think that what I an asking for is nonsense please make an alternative suggestion. On the related note: What XML Databases (preferably native and open source) do you have experience with and what would you recommend?

    Read the article

  • Can I compare a template variable to an integer in Django/App Engine templates?

    - by matt b
    Using Django templates in Google App Engine (on Python), is it possible to compare a template variable to an integer in an {% if %} block? views.py: class MyHandler(webapp.RequestHandler): def get(self): foo_list = db.GqlQuery(...) ... template_values['foos'] = foo_list template_values['foo_count'] = len(foo_list) handler.response.out.write(template.render(...)) My template: {% if foo_count == 1 %} There is one foo. {% endif %} This blows up with 'if' statement improperly formatted. What I was attempting to do in my template was build a simple if/elif/else tree to be grammatically correct to be able to state #foo_count == 0: There are no foos. #foo_count == 1: There is one foo. #else: There are {{ foos|length }} foos. Browsing the Django template documents (this link provided in the GAE documentation appears to be for versions of Django far newer than what is supported on GAE), it appears as if I can only actually use boolean operators (if in fact boolean operators are supported in this older version of Django) with strings or other template variables. Is it not possible to compare variables to integers or non-strings with Django templates? I'm sure there is an easy way to workaround this - built up the message string on the Python side rather than within the template - but this seems like such a simple operation you ought to be able to handle in a template. It sounds like I should be switching to a more advanced templating engine, but as I am new to Django (templates or any part of it), I'd just like some confirmation first.

    Read the article

  • Simple Database normalization question...

    - by user365531
    Hi all, I have a quick question regarding a database that I am designing and making sure it is normalized... I have a customer table, with a primary key of customerId. It has a StatusCode column that has a code which reflects the customers account status ie. 1 = Open, 2 = Closed, 3 = Suspended etc... Now I would like to have another field in the customer table that flags whether the account is allowed to be suspended or not... certain customers will be automatically suspended if they break there trading terms... others not... so the relevant table fields will be as so: Customers (CustomerId(PK):StatusCode:IsSuspensionAllowed) Now both fields are dependent on the primary key as you can not determine the status or whether suspensions are allowed on a particular customer unless you know the specific customer, except of course when the IsSuspensionAllowed field is set to YES, the the customer should never have a StatusCode of 3 (Suspended). It seems from the above table design it is possible for this to happen unless a check contraint is added to my table. I can't see how another table could be added to the relational design to enforce this though as it's only in the case where IsSuspensionAllowed is set to YES and StatusCode is set to 3 when the two have a dependence on each other. So after my long winded explanation my question is this: Is this a normalization problem and I'm not seeing a relational design that will enforce this... or is it actually just a business rule that should be enforced with a check contraint and the table is in fact still normalized. Cheers, Steve

    Read the article

  • implementing cryptographic algorithms, specifically the key expansion part

    - by masseyc
    Hey, recently I picked up a copy of Applied Cryptography by Bruce Schneier and it's been a good read. I now understand how several algorithms outlined in the book work, and I'd like to start implementing a few of them in C. One thing that many of the algorithms have in common is dividing an x-bit key, into several smaller y-bit keys. For example, blowfish's key, X, is 64-bits, but you are required to break it up into two 32-bit halves; Xl and Xr. This is where I'm getting stuck. I'm fairly decent with C, but I'm not the strongest when it comes to bitwise operators and the like. After some help on IRC, I managed to come up with these two macros: #define splitup(a, b, c) {b = a >> 32; c = a & 0xffffffff; } #define combine(a, b, c) {a = (c << 32) | a;} Where a is 64 bits and b and c are 32 bits. However, the compiler warns me about the fact that I'm shifting a 32 bit variable by 32 bits. My questions are these: what's bad about shifting a 32-bit variable 32 bits? I'm guessing it's undefined, but these macros do seem to be working. Also, would you suggest I go about this another way? As I said, I'm fairly familiar with C, but bitwise operators and the like still give me a headache.

    Read the article

  • What database systems should an startup company consider?

    - by Am
    Right now I'm developing the prototype of a web application that aggregates large number of text entries from a large number of users. This data must be frequently displayed back and often updated. At the moment I store the content inside a MySQL database and use NHibernate ORM layer to interact with the DB. I've got a table defined for users, roles, submissions, tags, notifications and etc. I like this solution because it works well and my code looks nice and sane, but I'm also worried about how MySQL will perform once the size of our database reaches a significant number. I feel that it may struggle performing join operations fast enough. This has made me think about non-relational database system such as MongoDB, CouchDB, Cassandra or Hadoop. Unfortunately I have no experience with either. I've read some good reviews on MongoDB and it looks interesting. I'm happy to spend the time and learn if one turns out to be the way to go. I'd much appreciate any one offering points or issues to consider when going with none relational dbms?

    Read the article

< Previous Page | 34 35 36 37 38 39 40 41 42 43 44 45  | Next Page >