Search Results

Search found 66013 results on 2641 pages for 'big data analytics'.

Page 17/2641 | < Previous Page | 13 14 15 16 17 18 19 20 21 22 23 24  | Next Page >

  • Question about server usage, big community platform

    - by Json
    I’m working on a community platform writen in PHP, MySQL. I have some questions about the server usage maybe someone can help me out. The community is based on JQuery with many ajax requests to update content. It makes 5 - 10 AJAX(Json, GET, POST) requests every 5 seconds, the requests fetch user data like user notifications and messages by doing mySQL queries. I wonder how a server will handle this when there are for more than 5000 users online. Then it will be 50.000 requests every 5 seconds, what kind of server you need to handle this? Or maybe even more, when there are 15.000 users online, 150.000 requests every 5 seconds. My webserver have the following specs. Xeon Quad 2048MB 5000GB traffic Will it be good enough, and for how many users? Anyone can help me out or know where to find such information, like make a calculation?

    Read the article

  • Replicate a big, dense Windows volume over a WAN -- too big for DFS-R

    - by Jesse
    I've got a server with a LOT of small files -- many millions files, and over 1.5 TB of data. I need a decent backup strategy. Any filesystem-based backup takes too long -- just enumerating which files need to be copied takes a day. Acronis can do a disk image in 24 hours, but fails when it tries to do a differential backup the next day. DFS-R won't replicate a volume with this many files. I'm starting to look at Double Take, which seems to be able to do continuous replication. Are there other solutions that can do continuous replication at a block or sector level -- not file-by-file over a WAN?

    Read the article

  • Oracle Data Integration 12c: Perspectives of Industry Experts, Customers and Partners

    - by Irem Radzik
    Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 As you may have seen from our recent blog posts on Oracle Data Integrator 12c and Oracle GoldenGate 12c, we are very excited to share with you the great new features the 12c release brings to Oracle’s data integration solutions. And, fortunately we are not alone in this sentiment. Since the press announcement October 17th, which incorporates our customers' and experts' testimonials, we have seen positive comments in leading technology publications and social media as well. Here are some examples: In CIO and PCWorld you can find Joab Jackson’s article, Oracle Data Integrator 12c ready for real-time analysis, where wrote about the tight integration between Oracle Data Integrator and Oracle GoldenGate . He noted “Heeding the call from enterprise customers who clamor for more immediacy in their data-driven reports, Oracle has updated its data-integration software portfolio so that it can more rapidly deliver data to data warehouses and analysis applications.” Integration Developer News’ Vance McCarthy wrote the article Oracle Ships ‘Future Proofs’ Integration Tools for Traditional, Cloud, Big Data, Real-Time Projects and mentioned that “Oracle Data Integrator 12c and Oracle GoldenGate 12c sport a wide range of improvements to let devs more easily deliver data integration for cloud, analytics, big data and other new projects that leverage multiple datasets for business.“ InformationWeek’s Doug Henschen gave a great overview to several key features including the new flow-based UI in Oracle Data Integrator. Doug said “Oracle Data Integrator 12c introduces a complete makeover of the job-building experience, while real-time oriented GoldenGate 12c introduces performance gains “. In Database Trends and Applications’ article Oracle Strengthens Data Integration with Release of Oracle Data Integrator 12c and Oracle GoldenGate 12c highlighted the productivity aspect of the new solution with his remarks: “tight integration between Oracle Data Integrator 12c and Oracle GoldenGate 12c enables developers to leverage Oracle GoldenGate’s low overhead, real-time change data capture completely within the Oracle Data Integrator Studio without additional training”. We are also thrilled about what our customers and partners have to say about our products and the new release. And we are equally excited to share those perspectives with you in our upcoming launch video webcast on November 12th. SolarWorld Industries America’s Senior Database Manager, Russ Toyama will join our executives in our studio in Redwood Shores to discuss GoldenGate’s core benefits and the new release, while Surren Partharb, CTO of Strategic Technology Services for BT, and Mark Rittman, CTO of Rittman Mead, will provide their comments via the interviews conducted in the UK. This interactive panel discussion in the video webcast will unveil the new release with the expertise of our development executives and the great insight from our customers and partners. In addition, our product experts will be available online to answer chat questions. This is really a great opportunity to learn how Oracle's data integration offering has changed the integration and replication technology space with the new release, and established itself as the new leader. If you have not registered for this free event yet, you can do so via this link. We will run the live event at 8am PT/4pm GMT, followed by a replay of the event with live chat for Q&A  at 10am PT/6pm GMT. The replay will be available on-demand for those who register but cannot attend either session on November 12th. /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Times New Roman","serif"; mso-fareast-font-family:"Times New Roman";}

    Read the article

  • Data Structure for Small Number of Agents in a Relatively Big 2D World

    - by Seçkin Savasçi
    I'm working on a project where we will implement a kind of world simulation where there is a square 2D world. Agents live on this world and make decisions like moving or replicating themselves based on their neighbor cells(world=grid) and some extra parameters(which are not based on the state of the world). I'm looking for a data structure to implement such a project. My concerns are : I will implement this 3 times: sequential, using OpenMP, using MPI. So if I can use the same structure that will be quite good. The first thing comes up is keeping a 2D array for the world and storing agent references in it. And simulate the world for each time slice by checking every cell in each iteration and further processing if an agents is found in the cell. The downside is what if I have 1000x1000 world and only 5 agents in it. It will be an overkill for both sequential and parallel versions to check each cell and look for possible agents in them. I can use quadtree and store agents in it, but then how can I get the information about neighbor cells then? Please let me know if I should elaborate more.

    Read the article

  • Filtering your offices IPs from Google Analytics when each has a dynamic IP?

    - by leeand00
    I found the documentation for filtering IPs from Google Analytics, but the address of the several locations of our company all have dynamic IP addresses that change every 30 days from what I'm told. I know from working with Dynamic DNS that the provider usually gives you a script that you configure your router to run when it's IP address changes or when it is restarted, which passes the new IP address to the DDNS server. I'm wondering if there might be a way to write or use a preexisting script to do the same thing with the Google Analytics API.

    Read the article

  • How a .NET Programmer learn Big Data/Hadoop? [on hold]

    - by Smith Pascal Jr.
    I have been ASP.NET developer for sometime now and I have been reading a lot about Big Data- Hadoop and its future as to how it is the next technology in IT and how it would be useful to create million of jobs in US and elsewhere in the world. Now since Hadoop is an open source big data tool which is managed by Apache Server Foundation Group, I'm assuming I have to be well aware of JAVA - Correct me if I'm wrong. Moreover, How a .NET programmer can learn Big Data and its related technologies and can work professionally full time into this technology? What challenges and opportunities does a .NET professional face while changing the technology platform? Please advice. Thanks

    Read the article

  • Using Google Webmaster & Analytics, what data to look at to improve website performance?

    - by Rob
    Using data from Google Analytics and Webmaster tools, what data should I be looking at to improve my websites performance? I want to improve the SEO, usability and just general performance of my website. EDIT: It's a portfolio website that we've done the initial SEO for, also optimised all images etc and made the site as fast as possible. What kind of things should I be looking out for in the analytics and webmaster data to improve performance for both the SEO and each individual page.

    Read the article

  • Google Analytics: How can I traffic and referrals from iPad applications?

    - by kayaker243
    In Google Analytics, there is extensive information on the mobile device, version and browser version. However, this doesn't seem to go beyond the mobile browser. I would like to determine which application is responsible for visits to my site. Specifically, I want to know how many visits are coming from zite. http://www.handsetdetection.com/properties/vendormodel/Apple/iPad/page:4 seems to indicate this information is probably available, where/does Google Analytics expose this?

    Read the article

  • What is an easy way to see how often recently added pages are viewed in google analytics?

    - by cboettig
    Google Analytics makes it very easy to see the number of views of the most-viewed pages, but I cannot figure out how to see the number of views a particular page has received, or the number of views of recently added pages (e.g. blog posts). Is it possible to sort the pageviews list by date the page was added? Can this be done without having to externally create a list of recent pages and use the analytics API?

    Read the article

  • Posterous instructions for adding Google Analytics do not cover the code snippet to be pasted in the <head>

    - by Kit
    The Posterous instructions for adding Google Analytics ends with pasting the Google Analytics Domain ID into the settings page. However, the instructions given by Google tell me to paste some JavaScript code into the <head>. How do I get around with this? Do I need to paste the JavaScript code? If I do need to paste the JavaScript code, can I just paste it into the custom HTML/CSS style specification of my Posterous Space?

    Read the article

  • Data Aggregation of CSV files java

    - by royB
    I have k csv files (5 csv files for example), each file has m fields which produce a key and n values. I need to produce a single csv file with aggregated data. I'm looking for the most efficient solution for this problem, speed mainly. I don't think by the way that we will have memory issues. Also I would like to know if hashing is really a good solution because we will have to use 64 bit hashing solution to reduce the chance for a collision to less than 1% (we are having around 30000000 rows per aggregation). For example file 1: f1,f2,f3,v1,v2,v3,v4 a1,b1,c1,50,60,70,80 a3,b2,c4,60,60,80,90 file 2: f1,f2,f3,v1,v2,v3,v4 a1,b1,c1,30,50,90,40 a3,b2,c4,30,70,50,90 result: f1,f2,f3,v1,v2,v3,v4 a1,b1,c1,80,110,160,120 a3,b2,c4,90,130,130,180 algorithm that we thought until now: hashing (using concurentHashTable) merge sorting the files DB: using mysql or hadoop or redis. The solution needs to be able to handle Huge amount of data (each file more than two million rows) a better example: file 1 country,city,peopleNum england,london,1000000 england,coventry,500000 file 2: country,city,peopleNum england,london,500000 england,coventry,500000 england,manchester,500000 merged file: country,city,peopleNum england,london,1500000 england,coventry,1000000 england,manchester,500000 The key is: country,city. This is just an example, my real key is of size 6 and the data columns are of size 8 - total of 14 columns. We would like that the solution will be the fastest in regard of data processing.

    Read the article

  • How to present a stable data model in a public API that allows internal data structures to be changed without breaking the public view of the data?

    - by Max Palmer
    I am in the process of developing an application that allows users to write C# scripts. These scripts allow users to call selected methods and to access and manipulate data in a document. This works well, however, in the development version, scripts access the document's (internal) data structures directly. This means that if we were to change the internal data model/structure, there is a good chance that someone's script will no longer compile. We obviously want to prevent this breaking change from happening, but still want to allow the user to write sensible C# code (whilst not restricting how we develop our internal data model as a result). We therefore need to decouple our scripting API and its data structures from our internal methods and data structures. We've a few ideas as to how we might allow the user to access a what is effectively a stable public version of the document's internal data*, but I wanted to throw the question out there to someone who might have some real experience of this problem. NB our internal document's data structure is quite complex and it could be quite difficult to wrap. We know we want to expose as little as possible in our public API, especially as once it's out there, it's out there for good. Can anyone help? How do scripting languages / APIs decouple their public API and data structures from their internal data structures? Is there no real alternative to having to write a complex interaction layer? If we need to do this, what's a good approach or pattern for wrapping complex data structures that include nested objects, including collections? I've looked at the API facade pattern, which looks like it's trying to address these kinds of issues, but are there alternatives? *One idea is to build a data facade that is kept stable across versions of our application. The facade exposes a set of facade data objects that are used in the script code. These maintain backwards compatibility and wrap access to our internal document's data model.

    Read the article

  • Why Cornell University Chose Oracle Data Masking

    - by Troy Kitch
    One of the eight Ivy League schools, Cornell University found itself in the unfortunate position of having to inform over 45,000 University community members that their personal information had been breached when a laptop was stolen. To ensure this wouldn’t happen again, Cornell took steps to ensure that data used for non-production purposes is de-identified with Oracle Data Masking. A recent podcast highlights why organizations like Cornell are choosing Oracle Data Masking to irreversibly de-identify production data for use in non-production environments. Organizations often copy production data, that contains sensitive information, into non-production environments so they can test applications and systems using “real world” information. Data in non-production has increasingly become a target of cyber criminals and can be lost or stolen due to weak security controls and unmonitored access. Similar to production environments, data breaches in non-production environments can cost millions of dollars to remediate and cause irreparable harm to reputation and brand. Cornell’s applications and databases help carry out the administrative and academic mission of the university. They are running Oracle PeopleSoft Campus Solutions that include highly sensitive faculty, student, alumni, and prospective student data. This data is supported and accessed by a diverse set of developers and functional staff distributed across the university. Several years ago, Cornell experienced a data breach when an employee’s laptop was stolen.  Centrally stored backup information indicated there was sensitive data on the laptop. With no way of knowing what the criminal intended, the university had to spend significant resources reviewing data, setting up service centers to handle constituent concerns, and provide free credit checks and identity theft protection services—all of which cost money and took time away from other projects. To avoid this issue in the future Cornell came up with several options; one of which was to sanitize the testing and training environments. “The project management team was brought in and they developed a project plan and implementation schedule; part of which was to evaluate competing products in the market-space and figure out which one would work best for us.  In the end we chose Oracle’s solution based on its architecture and its functionality.” – Tony Damiani, Database Administration and Business Intelligence, Cornell University The key goals of the project were to mask the elements that were identifiable as sensitive in a consistent and efficient manner, but still support all the previous activities in the non-production environments. Tony concludes,  “What we saw was a very minimal impact on performance. The masking process added an additional three hours to our refresh window, but it was well worth that time to secure the environment and remove the sensitive data. I think some other key points you can keep in mind here is that there was zero impact on the production environment. Oracle Data Masking works in non-production environments only. Additionally, the risk of exposure has been significantly reduced and the impact to business was minimal.” With Oracle Data Masking organizations like Cornell can: Make application data securely available in non-production environments Prevent application developers and testers from seeing production data Use an extensible template library and policies for data masking automation Gain the benefits of referential integrity so that applications continue to work Listen to the podcast to hear the complete interview.  Learn more about Oracle Data Masking by registering to watch this SANS Institute Webcast and view this short demo.

    Read the article

  • Removing Barriers to Create Effective Data Models

    After years of creating and maintaining data models, I have started to notice common barriers that decrease the accuracy and usefulness of models. In my opinion, the main causes of these barriers are the lack of knowledge and communication from within a company. The lack of knowledge in regards to data models or data modeling can take many forms. Company Culture Knowledge Whether documented or undocumented, existing business rules of a company can affect how data is modeled. For example, if a company only allows 1 assigned person per customer to be able to manipulate a customer’s record then then a data model that includes an associated table that joins customers and employee’s would be unneeded because that would allow for the possibility of multiple employees to handle a customer because of the potential for a many to many relationship between Customers and Employees. Technical Knowledge Depending on the data modeler’s proficiency in modeling data they can inadvertently cause issues and/or complications with a design without even noticing. It is important that companies share data modeling responsibilities so that the models are developed from multiple perspectives of a system, company and the original problem.  In addition, the tools that a company selects to create data models can also affect the accuracy of the model if designer are not familiar with the tools or the tools are too complex to use for the designer. Existing System Knowledge In order for a data modeler to model data for an existing system so that new changes can be applied to a system then they need to at least know the basic concepts of a system so that they can work within it. This will promote reusability of data and prevent the chance of duplicating data. Project Knowledge This should be pretty obvious, but it is very hard to create an accurate data model without knowing what data needs to be modeled. I have always found it strange that I have been asked to start modeling data prior to a client formalizing any requirements. Usually when this happens I have to make several iterations to a model, and the client still does not know exactly what they want.  In addition additional issues can arise when certain stakeholders of a project are not consulted prior to the design or after the project is over because it can cause miss understandings and confusion by the end user as well as possibly not solving the original problem for which a project is intended to solve. One common thread between each type of knowledge is that they can all be avoided through the use of good communication. For example, if a modeler is new to a company then they should ask older employees about any business specific rules that may be documented or undocumented that must be applied to projects in general. Furthermore, if a modeler is not really familiar with a specific data modeling software then they need to speak up and ask for help form other employees or their manager. This will not only help the modeler in the project, but also help them in future projects that they do for the company. Additionally, if a project is not clearly defined prior to a data modeler being assigned the modeling project then it is their responsibility to communicate with the other stakeholders to clarify any part of a project that is unclear so that the data model that is created is accurately aligned with a project.

    Read the article

  • Goal Tracking data seems to be inaccurate?

    - by Khuram Malik
    I setup some Goal Tracking about one week ago. I had multiple goals in one set. The goal itself was the "send" button being pressed on the callback form (i did that by pushing a pageview to Google Analytics everytime the send button is pressed) For each goal, i listed the first step as a required step. So for example, the ILR Page was step 1 and set as required and the goal was "/CallbackFormFilled" Looking at the stats a week later i'm getting some very inflated numbers especially when comparing them to my manually filled excel spreadsheet and i'm struggling to understand the cause of this behaviour. I'm unable to attach screenshots unfortunately since my StackExchange account for this site is brand new My own thoughts My own thoughts were that maybe its because i have setup multiple goals with the same end goal URL, but i thought that was a valid setup since i want to track multiple routes so to speak(?) I've disabled all other goals for now to confirm this, but im waiting for stats to come in as i write this. I also wonder if the contact form im using in Wordpress is causing a problem, but i've simply added one javascript line on the send button that pushes a pageview so not sure if that should cause an issue. Here is a link to setting up analytics on this contact form plugin in wordpress for reference: (see javascript action hook section) - http://ideasilo.wordpress.com/2009/05/31/contact-form-7-1-10/

    Read the article

  • OS Analytics - Deep Dive Into Your OS

    - by Eran_Steiner
    Enterprise Manager Ops Center provides a feature called "OS Analytics". This feature allows you to get a better understanding of how the Operating System is being utilized. You can research the historical usage as well as real time data. This post will show how you can benefit from OS Analytics and how it works behind the scenes. We will have a call to discuss this blog - please join us!Date: Thursday, November 1, 2012Time: 11:00 am, Eastern Daylight Time (New York, GMT-04:00)1. Go to https://oracleconferencing.webex.com/oracleconferencing/j.php?ED=209833067&UID=1512092402&PW=NY2JhMmFjMmFh&RT=MiMxMQ%3D%3D2. If requested, enter your name and email address.3. If a password is required, enter the meeting password: oracle1234. Click "Join". To join the teleconference:Call-in toll-free number:       1-866-682-4770  (US/Canada)      Other countries:                https://oracle.intercallonline.com/portlets/scheduling/viewNumbers/viewNumber.do?ownerNumber=5931260&audioType=RP&viewGa=true&ga=ONConference Code:       7629343#Security code:            7777# Here is quick summary of what you can do with OS Analytics in Ops Center: View historical charts and real time value of CPU, memory, network and disk utilization Find the top CPU and Memory processes in real time or at a certain historical day Determine proper monitoring thresholds based on historical data View Solaris services status details Drill down into a process details View the busiest zones if applicable Where to start To start with OS Analytics, choose the OS asset in the tree and click the Analytics tab. You can see the CPU utilization, Memory utilization and Network utilization, along with the current real time top 5 processes in each category (click the image to see a larger version):  In the above screen, you can click each of the top 5 processes to see a more detailed view of that process. Here is an example of one of the processes: One of the cool things is that you can see the process tree for this process along with some port binding and open file descriptors. On Solaris machines with zones, you get an extra level of tabs, allowing you to get more information on the different zones: This is a good way to see the busiest zones. For example, one zone may not take a lot of CPU but it can consume a lot of memory, or perhaps network bandwidth. To see the detailed Analytics for each of the zones, simply click each of the zones in the tree and go to its Analytics tab. Next, click the "Processes" tab to see real time information of all the processes on the machine: An interesting column is the "Target" column. If you configured Ops Center to work with Enterprise Manager Cloud Control, then the two products will talk to each other and Ops Center will display the correlated target from Cloud Control in this table. If you are only using Ops Center - this column will remain empty. Next, if you view a Solaris machine, you will have a "Services" tab: By default, all services will be displayed, but you can choose to display only certain states, for example, those in maintenance or the degraded ones. You can highlight a service and choose to view the details, where you can see the Dependencies, Dependents and also the location of the service log file (not shown in the picture as you need to scroll down to see the log file). The "Threshold" tab is particularly helpful - you can view historical trends of different monitored values and based on the graph - determine what the monitoring values should be: You can ask Ops Center to suggest monitoring levels based on the historical values or you can set your own. The different colors in the graph represent the current set levels: Red for critical, Yellow for warning and Blue for Information, allowing you to quickly see how they're positioned against real data. It's important to note that when looking at longer periods, Ops Center smooths out the data and uses averages. So when looking at values such as CPU Usage, try shorter time frames which are more detailed, such as one hour or one day. Applying new monitoring values When first applying new values to monitored attributes - a popup will come up asking if it's OK to get you out of the current Monitoring Policy. This is OK if you want to either have custom monitoring for a specific machine, or if you want to use this current machine as a "Gold image" and extract a Monitoring Policy from it. You can later apply the new Monitoring Policy to other machines and also set it as a default Monitoring Profile. Once you're done with applying the different monitoring values, you can review and change them in the "Monitoring" tab. You can also click the "Extract a Monitoring Policy" in the actions pane on the right to save all the new values to a new Monitoring Policy, which can then be found under "Plan Management" -> "Monitoring Policies". Visiting the past Under the "History" tab you can "go back in time". This is very helpful when you know that a machine was busy a few hours ago (perhaps in the middle of the night?), but you were not around to take a look at it in real time. Here's a view into yesterday's data on one of the machines: You can see an interesting CPU spike happening at around 3:30 am along with some memory use. In the bottom table you can see the top 5 CPU and Memory consumers at the requested time. Very quickly you can see that this spike is related to the Solaris 11 IPS repository synchronization process using the "pkgrecv" command. The "time machine" doesn't stop here - you can also view historical data to determine which of the zones was the busiest at a given time: Under the hood The data collected is stored on each of the agents under /var/opt/sun/xvm/analytics/historical/ An "os.zip" file exists for the main OS. Inside you will find many small text files, named after the Epoch time stamp in which they were taken If you have any zones, there will be a file called "guests.zip" containing the same small files for all the zones, as well as a folder with the name of the zone along with "os.zip" in it If this is the Enterprise Controller or the Proxy Controller, you will have folders called "proxy" and "sat" in which you will find the "os.zip" for that controller The actual script collecting the data can be viewed for debugging purposes as well: On Linux, the location is: /opt/sun/xvmoc/private/os_analytics/collect On Solaris, the location is /opt/SUNWxvmoc/private/os_analytics/collect If you would like to redirect all the standard error into a file for debugging, touch the following file and the output will go into it: # touch /tmp/.collect.stderr   The temporary data is collected under /var/opt/sun/xvm/analytics/.collectdb until it is zipped. If you would like to review the properties for the Analytics, you can view those per each agent in /opt/sun/n1gc/lib/XVM.properties. Find the section "Analytics configurable properties for OS and VSC" to view the Analytics specific values. I hope you find this helpful! Please post questions in the comments below. Eran Steiner

    Read the article

  • Big DataData Mining with Hive – What is Hive? – What is HiveQL (HQL)? – Day 15 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the operational database in Big Data Story. In this article we will understand what is Hive and HQL in Big Data Story. Yahoo started working on PIG (we will understand that in the next blog post) for their application deployment on Hadoop. The goal of Yahoo to manage their unstructured data. Similarly Facebook started deploying their warehouse solutions on Hadoop which has resulted in HIVE. The reason for going with HIVE is because the traditional warehousing solutions are getting very expensive. What is HIVE? Hive is a datawarehouseing infrastructure for Hadoop. The primary responsibility is to provide data summarization, query and analysis. It  supports analysis of large datasets stored in Hadoop’s HDFS as well as on the Amazon S3 filesystem. The best part of HIVE is that it supports SQL-Like access to structured data which is known as HiveQL (or HQL) as well as big data analysis with the help of MapReduce. Hive is not built to get a quick response to queries but it it is built for data mining applications. Data mining applications can take from several minutes to several hours to analysis the data and HIVE is primarily used there. HIVE Organization The data are organized in three different formats in HIVE. Tables: They are very similar to RDBMS tables and contains rows and tables. Hive is just layered over the Hadoop File System (HDFS), hence tables are directly mapped to directories of the filesystems. It also supports tables stored in other native file systems. Partitions: Hive tables can have more than one partition. They are mapped to subdirectories and file systems as well. Buckets: In Hive data may be divided into buckets. Buckets are stored as files in partition in the underlying file system. Hive also has metastore which stores all the metadata. It is a relational database containing various information related to Hive Schema (column types, owners, key-value data, statistics etc.). We can use MySQL database over here. What is HiveSQL (HQL)? Hive query language provides the basic SQL like operations. Here are few of the tasks which HQL can do easily. Create and manage tables and partitions Support various Relational, Arithmetic and Logical Operators Evaluate functions Download the contents of a table to a local directory or result of queries to HDFS directory Here is the example of the HQL Query: SELECT upper(name), salesprice FROM sales; SELECT category, count(1) FROM products GROUP BY category; When you look at the above query, you can see they are very similar to SQL like queries. Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Pig. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Big IP F5 Basics (show run/show conf/term len 0)

    - by PP
    I've tried to find the basics in a Big IP manual but it seems to me the device is marketed towards GUI users only. Meanwhile I want to write a few scripts to automate tasks on the load balancer. Namely: how do I turn off more - when I issue a command I want the output to stream out without waiting for me to press a key for the next page how do I show the running configuration (I think list all is the way to do it but cannot find it documented anywhere) Thanks!

    Read the article

  • Google Analytics se rapproche des Google Apps, des scripts génèrent automatiquement des rapports d'analyse d'audience

    Google Analytics se rapproche des Google Apps Des scripts génèrent automatiquement des rapports d'analyse d'audience avec les outils bureautiques en ligne Dans un billet du blog officiel de Google Analytics, Google vient d'annoncer la sortie d'une nouvelle fonctionnalité dans Google Analytics qui permettra d'intégrer plus facilement les données du service d'analyse d'audience Web dans un script Google Apps, une fonctionnalité qui d'après Google était très demandée par les utilisateurs. Les produits concernés sont Google Docs, Sites et Spreadsheets. On peut voir comme illustré ci-dessus, comment les données de Google Analytics ont été capturées par un script Google Apps et affichées dans ...

    Read the article

  • Behind the Code: The Analytics Mobile SDK

    Behind the Code: The Analytics Mobile SDK The new Google Analytics Mobile SDK empowers Android and iOS developers to effectively collect user engagement data from their applications to measure active user counts, user geography, new feature adoption and many other useful metrics. Join Analytics Developer Program Engineer Andrew Wales and Analytics Software Engineer Jim Cotugno for an unprecedented look behind the code at the goals, design, and architecture of the new SDK to learn more about what it takes to build world-class technology. From: GoogleDevelopers Views: 0 1 ratings Time: 30:00 More in Science & Technology

    Read the article

  • How often do you use data structures (ie Binary Trees, Linked Lists) in your jobs/side projects?

    - by Chris2021
    It seems to me that, for everyday use, more primitive data structures like arrays get the job done just as well as a binary tree would. My question is how common is to use these structures when writing code for projects at work or projects that you pursue in your free time? I understand the better insertion time/deletion time/sorting time for certain structures but would that really matter that much if you were working with a relatively small amount of data?

    Read the article

  • Why would you use data structures (ie Binary Trees, Linked Lists) in your jobs/side projects? [closed]

    - by Chris2021
    It seems to me that, for everyday use, more primitive data structures like arrays get the job done just as well as a binary tree would. My question is how common is to use these structures when writing code for projects at work or projects that you pursue in your free time? I understand the better insertion time/deletion time/sorting time for certain structures but would that really matter that much if you were working with a relatively small amount of data?

    Read the article

  • Analytics: Test events not showing up - how to troubleshoot?

    - by David Parks
    I've got 3 profiles: Master, Raw Data, and Test, on the Test profile I have no filters configured. I want to test using some events. I created a local HTML file as shown below to generate some test data that I could play with in Analytics. But the events never showed up in Analytics. I wonder what I might be doing wrong? Is the lack of a domain an issue maybe? <html><head></head><body>Login_popup_complete_Facebook <script type="text/javascript"> var _gaq = _gaq || []; _gaq.push(['_setAccount', 'UA-28554309-1']); _gaq.push(['_trackPageview']); _gaq.push(['_trackEvent', 'Login popup completed', 'Facebook']); (function() { var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); })(); </script> </body></html>

    Read the article

  • Google Analytics async=true seems wrong in the Google documentation?

    - by leeand00
    In the Google Analytics async example, they state that in order to include more than one tracker, you need to setup your pages for asyncrous tracking, and they do so using the following code: <script type="text/javascript"> _gaq.push( ['_setAccount', 'UA-XXXXX-1'], ['_trackPageview'], ['b._setAccount', 'UA-XXXXX-2'], ['b._trackPageview'] ); (function() { var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); })(); </script> The second tracker is not receiving any results. After looking at my tracking codes to ensure they are correct, I noticed that the ga.async = true statement is specified differently most of the time and is never set to a value of true, it's often set to async but never true. Could this be stopping my Analytics data from posting to the second tracker? or might it be something else? Also what calls should I look for in the Net tab in firebug to ensure that GA is being called when the page loads?

    Read the article

  • Is there an elegant way to track multiple domains under separate accounts with google analytics?

    - by J_M_J
    I have a situation where a content management system uses the same template for multiple websites with different domain names and I can't make a separate template for each. However, each website needs to be tracked with Google analytics. Would this be appropriate to track each domain like this by putting in some conditional code? And would this be robust enough not to break? Is there a more elegant way to do this? <script type="text/javascript"> var _gaq = _gaq || []; switch (location.hostname){ case 'www.aaa.com': _gaq.push(['_setAccount', 'UA-xxxxxxx-1']); break; case 'www.bbb.com': _gaq.push(['_setAccount', 'UA-xxxxxxx-2']); break; case 'www.ccc.com': _gaq.push(['_setAccount', 'UA-xxxxxxx-3']); break; } _gaq.push(['_trackPageview']); (function() { var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(ga); })(); </script> Just to be clear, each website is a separate domain name and must be tracked separately, NOT different domains with same pages on one analytics profile.

    Read the article

< Previous Page | 13 14 15 16 17 18 19 20 21 22 23 24  | Next Page >