Search Results

Search found 65999 results on 2640 pages for 'large data volumes'.

Page 25/2640 | < Previous Page | 21 22 23 24 25 26 27 28 29 30 31 32  | Next Page >

  • are there any useful datasets available on the web for data mining?

    - by niko
    Hi, Does anyone know any good resource where example (real) data can be downloaded for experimenting statistics and machine learning techniques such as decision trees etc? Currently I am studying machine learning techniques and it would be very helpful to have real data for evaluating the accuracy of various tools. If anyone knows any good resource (perhaps csv, xls files or any other format) I would be very thankful for a suggestion.

    Read the article

  • Big Data: Size isn’t everything

    - by Simon Elliston Ball
    Big Data has a big problem; it’s the word “Big”. These days, a quick Google search will uncover terabytes of negative opinion about the futility of relying on huge volumes of data to produce magical, meaningful insight. There are also many clichéd but correct assertions about the difficulties of correlation versus causation, in massive data sets. In reading some of these pieces, I begin to understand how climatologists must feel when people complain ironically about “global warming” during snowfall. Big Data has a name problem. There is a lot more to it than size. Shape, Speed, and…err…Veracity are also key elements (now I understand why Gartner and the gang went with V’s instead of S’s). The need to handle data of different shapes (Variety) is not new. Data developers have always had to mold strange-shaped data into our reporting systems, integrating with semi-structured sources, and even straying into full-text searching. However, what we lacked was an easy way to add semi-structured and unstructured data to our arsenal. New “Big Data” tools such as MongoDB, and other NoSQL (Not Only SQL) databases, or a graph database like Neo4J, fill this gap. Still, to many, they simply introduce noise to the clean signal that is their sensibly normalized data structures. What about speed (Velocity)? It’s not just high frequency trading that generates data faster than a single system can handle. Many other applications need to make trade-offs that traditional databases won’t, in order to cope with high data insert speeds, or to extract quickly the required information from data streams. Unfortunately, many people equate Big Data with the Hadoop platform, whose batch driven queries and job processing queues have little to do with “velocity”. StreamInsight, Esper and Tibco BusinessEvents are examples of Big Data tools designed to handle high-velocity data streams. Again, the name doesn’t do the discipline of Big Data any favors. Ultimately, though, does analyzing fast moving data produce insights as useful as the ones we get through a more considered approach, enabled by traditional BI? Finally, we have Veracity and Value. In many ways, these additions to the classic Volume, Velocity and Variety trio acknowledge the criticism that without high-quality data and genuinely valuable outputs then data, big or otherwise, is worthless. As a discipline, Big Data has recognized this, and data quality and cleaning tools are starting to appear to support it. Rather than simply decrying the irrelevance of Volume, we need as a profession to focus how to improve Veracity and Value. Perhaps we should just declare the ‘Big’ silent, embrace these new data tools and help develop better practices for their use, just as we did the good old RDBMS? What does Big Data mean to you? Which V gives your business the most pain, or the most value? Do you see these new tools as a useful addition to the BI toolbox, or are they just enabling a dangerous trend to find ghosts in the noise?

    Read the article

  • Know your Data Lineage

    - by Simon Elliston Ball
    An academic paper without the footnotes isn’t an academic paper. Journalists wouldn’t base a news article on facts that they can’t verify. So why would anyone publish reports without being able to say where the data has come from and be confident of its quality, in other words, without knowing its lineage. (sometimes referred to as ‘provenance’ or ‘pedigree’) The number and variety of data sources, both traditional and new, increases inexorably. Data comes clean or dirty, processed or raw, unimpeachable or entirely fabricated. On its journey to our report, from its source, the data can travel through a network of interconnected pipes, passing through numerous distinct systems, each managed by different people. At each point along the pipeline, it can be changed, filtered, aggregated and combined. When the data finally emerges, how can we be sure that it is right? How can we be certain that no part of the data collection was based on incorrect assumptions, that key data points haven’t been left out, or that the sources are good? Even when we’re using data science to give us an approximate or probable answer, we cannot have any confidence in the results without confidence in the data from which it came. You need to know what has been done to your data, where it came from, and who is responsible for each stage of the analysis. This information represents your data lineage; it is your stack-trace. If you’re an analyst, suspicious of a number, it tells you why the number is there and how it got there. If you’re a developer, working on a pipeline, it provides the context you need to track down the bug. If you’re a manager, or an auditor, it lets you know the right things are being done. Lineage tracking is part of good data governance. Most audit and lineage systems require you to buy into their whole structure. If you are using Hadoop for your data storage and processing, then tools like Falcon allow you to track lineage, as long as you are using Falcon to write and run the pipeline. It can mean learning a new way of running your jobs (or using some sort of proxy), and even a distinct way of writing your queries. Other Hadoop tools provide a lot of operational and audit information, spread throughout the many logs produced by Hive, Sqoop, MapReduce and all the various moving parts that make up the eco-system. To get a full picture of what’s going on in your Hadoop system you need to capture both Falcon lineage and the data-exhaust of other tools that Falcon can’t orchestrate. However, the problem is bigger even that that. Often, Hadoop is just one piece in a larger processing workflow. The next step of the challenge is how you bind together the lineage metadata describing what happened before and after Hadoop, where ‘after’ could be  a data analysis environment like R, an application, or even directly into an end-user tool such as Tableau or Excel. One possibility is to push as much as you can of your key analytics into Hadoop, but would you give up the power, and familiarity of your existing tools in return for a reliable way of tracking lineage? Lineage and auditing should work consistently, automatically and quietly, allowing users to access their data with any tool they require to use. The real solution, therefore, is to create a consistent method by which to bring lineage data from these data various disparate sources into the data analysis platform that you use, rather than being forced to use the tool that manages the pipeline for the lineage and a different tool for the data analysis. The key is to keep your logs, keep your audit data, from every source, bring them together and use the data analysis tools to trace the paths from raw data to the answer that data analysis provides.

    Read the article

  • NSURLConnection receives data even if no data was thrown back

    - by Anna Fortuna
    Let me explain my situation. Currently, I am experimenting long-polling using NSURLConnection. I found this and I decided to try it. What I do is send a request to the server with a timeout interval of 300 secs. (or 5 mins.) Here is a code snippet: NSURL *url = [NSURL URLWithString:urlString]; NSURLRequest *request = [NSURLRequest requestWithURL:url cachePolicy:NSURLCacheStorageAllowedInMemoryOnly timeoutInterval:300]; NSData *data = [NSURLConnection sendSynchronousRequest:request returningResponse:&resp error:&err]; Now I want to test if the connection will "hold" the request if no data was thrown back from the server, so what I did was this: if (data != nil) [self performSelectorOnMainThread:@selector(dataReceived:) withObject:data waitUntilDone:YES]; And the function dataReceived: looks like this: - (void)dataReceived:(NSData *)data { NSLog(@"DATA RECEIVED!"); NSString *string = [NSString stringWithUTF8String:[data bytes]]; NSLog(@"THE DATA: %@", string); } Server-side, I created a function that will return a data once it fits the arguments and returns none if nothing fits. Here is a snippet of the PHP function: function retrieveMessages($vardata) { if (!empty($vardata)) { $result = check_data($vardata) //check_data is the function which returns 1 if $vardata //fits the arguments, and 0 if it fails to fit if ($result == 1) { $jsonArray = array('Data' => $vardata); echo json_encode($jsonArray); } } } As you can see, the function will only return data if the $result is equal to 1. However, even if the function returns nothing, NSURLConnection will still perform the function dataReceived: meaning the NSURLConnection still receives data, albeit an empty one. So can anyone help me here? How will I perform long-polling using NSURLConnection? Basically, I want to maintain the connection as long as no data is returned. So how will I do it? NOTE: I am new to PHP, so if my code is wrong, please point it out so I can correct it.

    Read the article

  • How to maintain an ordered table with Core Data (or SQL) with insertions/deletions?

    - by Jean-Denis Muys
    This question is in the context of Core Data, but if I am not mistaken, it applies equally well to a more general SQL case. I want to maintain an ordered table using Core Data, with the possibility for the user to: reorder rows insert new lines anywhere delete any existing line What's the best data model to do that? I can see two ways: 1) Model it as an array: I add an int position property to my entity 2) Model it as a linked list: I add two one-to-one relations, next and previous from my entity to itself 1) makes it easy to sort, but painful to insert or delete as you then have to update the position of all objects that come after 2) makes it easy to insert or delete, but very difficult to sort. In fact, I don't think I know how to express a Sort Descriptor (SQL ORDER BY clause) for that case. Now I can imagine a variation on 1): 3) add an int ordering property to the entity, but instead of having it count one-by-one, have it count 100 by 100 (for example). Then inserting is as simple as finding any number between the ordering of the previous and next existing objects. The expensive renumbering only has to occur when the 100 holes have been filled. Making that property a float rather than an int makes it even better: it's almost always possible to find a new float midway between two floats. Am I on the right track with solution 3), or is there something smarter?

    Read the article

  • Compact data structure for storing a large set of integral values

    - by Odrade
    I'm working on an application that needs to pass around large sets of Int32 values. The sets are expected to contain ~1,000,000-50,000,000 items, where each item is a database key in the range 0-50,000,000. I expect distribution of ids in any given set to be effectively random over this range. The operations I need on the set are dirt simple: Add a new value Iterate over all of the values. There is a serious concern about the memory usage of these sets, so I'm looking for a data structure that can store the ids more efficiently than a simple List<int>or HashSet<int>. I've looked at BitArray, but that can be wasteful depending on how sparse the ids are. I've also considered a bitwise trie, but I'm unsure how to calculate the space efficiency of that solution for the expected data. A Bloom Filter would be great, if only I could tolerate the false negatives. I would appreciate any suggestions of data structures suitable for this purpose. I'm interested in both out-of-the-box and custom solutions. EDIT: To answer your questions: No, the items don't need to be sorted By "pass around" I mean both pass between methods and serialize and send over the wire. I clearly should have mentioned this. There could be a decent number of these sets in memory at once (~100).

    Read the article

  • How can I scrape specific data from a website

    - by Stoney
    I'm trying to scrape data from a website for research. The urls are nicely organized in an example.com/x format, with x as an ascending number and all of the pages are structured in the same way. I just need to grab certain headings and a few numbers which are always in the same locations. I'll then need to get this data into structured form for analysis in Excel. I have used wget before to download pages, but I can't figure out how to grab specific lines of text. Excel has a feature to grab data from the web (Data-From Web) but from what I can see it only allows me to download tables. Unfortunately, the data I need is not in tables.

    Read the article

  • How should I architect my Model and Data Access layer objects in my website?

    - by Robin Winslow
    I've been tasked with designing Data layer for a website at work, and I am very interested in architecture of code for the best flexibility, maintainability and readability. I am generally acutely aware of the value in completely separating out my actual Models from the Data Access layer, so that the Models are completely naive when it comes to Data Access. And in this case it's particularly useful to do this as the Models may be built from the Database or may be built from a Soap web service. So it seems to me to make sense to have Factories in my data access layer which create Model objects. So here's what I have so far (in my made-up pseudocode): class DataAccess.ProductsFromXml extends DataAccess.ProductFactory {} class DataAccess.ProductsFromDatabase extends DataAccess.ProductFactory {} These then get used in the controller in a fashion similar to the following: var xmlProductCreator = DataAccess.ProductsFromXml(xmlDataProvider); var databaseProductCreator = DataAccess.ProductsFromXml(xmlDataProvider); // Returns array of Product model objects var XmlProducts = databaseProductCreator.Products(); // Returns array of Product model objects var DbProducts = xmlProductCreator.Products(); So my question is, is this a good structure for my Data Access layer? Is it a good idea to use a Factory for building my Model objects from the data? Do you think I've misunderstood something? And are there any general patterns I should read up on for how to write my data access objects to create my Model objects?

    Read the article

  • Simple Backup Strategy for Amazon EC2 instances / volumes?

    - by minerj
    You have entered Introductory Backups for Amazon EC2 EBS-backed Windows Images 010... I have been browsing my brains out to find a simple backup strategy for our single windows 2008 server running SharePoint Services. This is an EBS-backed image of one server with one data volume. I don’t need anything exotic. I only need a “daily” backup (losing a day’s worth of data is not catastrophic). We have created and saved an EBS backed AMI image (Windows 2008) we are comfortable using. We started off making backups by simply creating a new EBS AMI image. This is really simple, but the running server is put offline during the first 10 – 15 minutes of creating the image – not ideal. The standard way of creating backups would seem to be creating snapshots of volumes attached to a running instance. Again it’s pretty simple and the server remains usable during the snapshot generation. The apparent Catch-22 is that you can’t simply launch a new instance directly from a snapshot. I know how to bundle a running instance to S3 storage and then register the AMI from the S3 bucket. This does allow me to capture a backup of a running instance and, if the running instance is lost, register the AMI from the S3 bucket and launch the new AMI to recover the instance, but this seems really convoluted and it seems ridiculous to have to juggle back and forth between the AWS Console and the S3 Organizer plug-in for Firefox to get this accomplished. (Please don't mention the command line approach, this is an 010 level course). From playing around with EBS-backed images, the following approach appears to work for me (all done within the AWS Console): 1.For your backups, simply snapshot the system volume (/dev/sda1) as needed. 2.If you lose your running instance, do the following: a.Create a new volume from your last snapshot backup b.Launch another instance of your starting AMI (must be EBS-backed) c.Stop this instance. d.Detach the existing system volume from the new stopped instance and discard. e.Attach the newly created volume as system volume (/dev/sda1) to the stopped instance. f.Re-start the new instance. I have tested this out a couple of times and it seems to work for me. Question: Is there anything wrong with this approach?

    Read the article

  • If OOP makes problems with large projects, what doesn't?

    - by osca
    I learned Python OOP at school. My (good in theory, bad in practice) informatics told us about how good OOP was for any purpose; Even/Especially for large projects. Now I don't have any experience with teamwork in software development (what a pity, I'd like to program in a team) and I don't know anything about scaling and large projects either. Since some time I'm reading more and more about that object-oriented programming has (many) disadvantages when it comes to really big and important projects/systems. I got a bit confused by that as I always thought that OOP helped you keep large amounts of code clean and structured. Now why should OOP be problematic in large projects? If it is, what would be better? Functional, Declarative/Imperative?

    Read the article

  • Calculating percentiles in Excel with "buckets" data instead of the data list itself

    - by G B
    I have a bunch of data in Excel that I need to get certain percentile information from. The problem is that instead of having the data set made up of each value, I instead have info on the number of or "bucket" data. For example, imagine that my actual data set looks like this: 1,1,2,2,2,2,3,3,4,4,4 The data set that I have is this: Value No. of occurrences 1 2 2 4 3 2 4 3 Is there an easy way for me to calculate percentile information (as well as the median) without having to explode the summary data out to full data set? (Once I did that, I know that I could just use the Percentile(A1:A5, p) function) This is important because my data set is very large. If I exploded the data out, I would have hundreds of thousands of rows and I would have to do it for a couple of hundred data sets. Help!

    Read the article

  • Using pow() for large number

    - by g4ur4v
    I am trying to solve a problem, a part of which requires me to calculate (2^n)%1000000007 , where n<=10^9. But my following code gives me output "0" even for input like n=99. Is there anyway other than having a loop which multilplies the output by 2 every time and finding the modulo every time (this is not I am looking for as this will be very slow for large numbers). #include<stdio.h> #include<math.h> #include<iostream> using namespace std; int main() { unsigned long long gaps,total; while(1) { cin>>gaps; total=(unsigned long long)powf(2,gaps)%1000000007; cout<<total<<endl; } }

    Read the article

  • Is it possible to store only a checksum of a large file in git?

    - by Andrew Grimm
    I'm a bioinformatician currently extracting normal-sized sequences from genomic files. Some genomic files are large enough that I don't want to put them into the main git repository, whereas I'm putting the extracted sequences into git. Is it possible to tell git "Here's a large file - don't store the whole file, just take its checksum, and let me know if that file is missing or modified." If that's not possible, I guess I'll have to either git-ignore the large files, or, as suggested in this question, store them in a submodule.

    Read the article

  • Rails - Displaying Large Set of Data in a Table / Start new column after X rows

    - by ChrisWesAllen
    Hi, I trying to display a large set of checkboxes in my rails app and didnt knwo the syntax for displaying like 15 rows then after starting a new column. I have a model with about 120 entries. Currently, I have it being displayed in the view as.... <% for interest in Interest.find(:all) %> <%= check_box_tag Blah Blah Blah %> <%= interest.name %> <% end %> How can I make it so it makes a table and after every 15 or so rows make a new column???

    Read the article

  • How do I setup a WCF Data Service with an ADO.NET Entity Entity Model in another assembly?

    - by lsb
    Hi! I have an ASP.NET 4.0 website that has an Entity Data Model hooked up to WCF Data Service. When the Service and Model are in the same assembly everything works. Unfortunately, when I move the Model to another "shared" assembly (and change the namespace) the service compiles but throws a 500 error when launched in a browser. The reason I want to have the Model in a common assembly (lets call it RiaTest.Shared) is that I want share common validation code between the client and service (by checking "Reuse types in referenced assemblies" in the Advanced tab of the Add Service Reference dialog). Anyway, I've spent a couple of hours on this to no avail so any help in the regard would be appreciated...

    Read the article

  • Open Data, Government and Transparency

    - by Tori Wieldt
    A new track at TDC (The Developer's Conference in Sao Paulo, Brazil) is titled Open Data. It deals with open data, government and transparency. Saturday will be a "transparency hacker day" where developers are invited to create applications using open data from the Brazilian government.  Alexandre Gomes, co-lead of the track, says "I want to inspire developers to become "Civic hackers:" developers who create apps to make society better." It is a chance for developers to do well and do good. There are many opportunities for developers, including monitoring government expenditures and getting citizens involved via social networks. The open data movement is growing worldwide. One initiative, the Open Government Partnership, is working to make government data easier to find and access. Making this data easily available means that with the right applications, it will be easier for people to make decisions and suggestions about government policies based on detailed information. Last April, the Open Government Partnership held its annual meeting in Brasilia, the capitol of Brazil. It was a great success showcasing the innovative work being done in open data by governments, civil societies and individuals around the world. For example, Bulgaria now publishes daily data on budget spending for all public institutions. Alexandre Gomes Explains Open Data At TDC, the Open Data track will include a presentation of examples of successful open data projects, an introduction to the semantic web, how to handle big data sets, techniques of data visualization, and how to design APIs.The other track lead is Christian Moryah Miranda, a systems analyst for the Brazilian Government's Ministry of Planning. "The Brazilian government wholeheartedly supports this effort. In order to make our data available to the public, it forces us to be more consistent with our data across ministries, and that's a good step forward for us," he said. He explained the government knows they cannot achieve everything they would like without help from the public. "It is not the government versus the people, rather citizens are partners with the government, and together we can achieve great things!" Miranda exclaimed. Saturday at TDC will be a "transparency hacker day" where developers will be invited to create applications using open data from the Brazilian government. Attendees are invited to pitch their ideas, work in small groups, and present their project at the end of the conference. "For example," Gomes said, "the Brazilian government just released the salaries of all government employees and I can't wait to see what developers can do with that." Resources Open Government Partnership  U.S. Government Open Data ProjectBrazilian Government Open Data ProjectU.K. Government Open Data Project 2012 International Open Government Data Conference 

    Read the article

  • Master Data Management and Cloud Computing

    - by david.butler(at)oracle.com
    Cloud Computing is all the rage these days. There are many reasons why this is so. But like its predecessor, Service Oriented Architecture, it can fall on hard times if the underlying data is left unmanaged. Master Data Management is the perfect Cloud companion. It can materially increase the chances for successful Cloud initiatives. In this blog, I'll review the nature of the Cloud and show how MDM fits in.   Here's the National Institute of Standards and Technology Cloud definition: •          Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction.   Cloud architectures have three main layers: applications or Software as a Service (SaaS), Platforms as a Service (PaaS), and Infrastructure as a Service (IaaS). SaaS generally refers to applications that are delivered to end-users over the Internet. Oracle CRM On Demand is an example of a SaaS application. Today there are hundreds of SaaS providers covering a wide variety of applications including Salesforce.com, Workday, and Netsuite. Oracle MDM applications are located in this layer of Oracle's On Demand enterprise Cloud platform. We call it Master Data as a Service (MDaaS). PaaS generally refers to an application deployment platform delivered as a service. They are often built on a grid computing architecture and include database and middleware. Oracle Fusion Middleware is in this category and includes the SOA and Data Integration products used to connect SaaS applications including MDM. Finally, IaaS generally refers to computing hardware (servers, storage and network) delivered as a service.  This typically includes the associated software as well: operating systems, virtualization, clustering, etc.    Cloud Computing benefits are compelling for a large number of organizations. These include significant cost savings, increased flexibility, and fast deployments. Cost advantages include paying for just what you use. This is especially critical for organizations with variable or seasonal usage. Companies don't have to invest to support peak computing periods. Costs are also more predictable and controllable. Increased agility includes access to the latest technology and experts without making significant up front investments.   While Cloud Computing is certainly very alluring with a clear value proposition, it is not without its challenges. An IDC survey of 244 IT executives/CIOs and their line-of-business (LOB) colleagues identified a number of issues:   Security - 74% identified security as an issue involving data privacy and resource access control. Integration - 61% found that it is hard to integrate Cloud Apps with in-house applications. Operational Costs - 50% are worried that On Demand will actually cost more given the impact of poor data quality on the rest of the enterprise. Compliance - 49% felt that compliance with required regulatory, legal and general industry requirements (such as PCI, HIPAA and Sarbanes-Oxley) would be a major issue. When control is lost, the ability of a provider to directly manage how and where data is deployed, used and destroyed is negatively impacted.  There are others, but I singled out these four top issues because Master Data Management, properly incorporated into a Cloud Computing infrastructure, can significantly ameliorate all of these problems. Cloud Computing can literally rain raw data across the enterprise.   According to fellow blogger, Mike Ferguson, "the fracturing of data caused by the adoption of cloud computing raises the importance of MDM in keeping disparate data synchronized."   David Linthicum, CTO Blue Mountain Labs blogs that "the lack of MDM will become more of an issue as cloud computing rises. We're moving from complex federated on-premise systems, to complex federated on-premise and cloud-delivered systems."    Left unmanaged, non-standard, inconsistent, ungoverned data with questionable quality can pollute analytical systems, increase operational costs, and reduce the ROI in Cloud and On-Premise applications. As cloud computing becomes more relevant, and more data, applications, services, and processes are moved out to cloud computing platforms, the need for MDM becomes ever more important. Oracle's MDM suite is designed to deal with all four of the above Cloud issues listed in the IDC survey.   Security - MDM manages all master data attribute privacy and resource access control issues. Integration - MDM pre-integrates Cloud Apps with each other and with On Premise applications at the data level. Operational Costs - MDM significantly reduces operational costs by increasing data quality, thereby improving enterprise business processes efficiency. Compliance - MDM, with its built in Data Governance capabilities, insures that the data is governed according to organizational standards. This facilitates rapid and accurate reporting for compliance purposes. Oracle MDM creates governed high quality master data. A unified cleansed and standardized data view is produced. The Oracle Customer Hub creates a single view of the customer. The Oracle Product Hub creates high quality product data designed to support all go-to-market processes. Oracle Supplier Hub dramatically reduces the chances of 'supplier exceptions'. Oracle Site Hub masters locations. And Oracle Hyperion Data Relationship Management masters financial reference data and manages enterprise hierarchies across operational areas from ERP to EPM and CRM to SCM. Oracle Fusion Middleware connects Cloud and On Premise applications to MDM Hubs and brings high quality master data to your enterprise business processes.   An independent analyst once said "Poor data quality is like dirt on the windshield. You may be able to drive for a long time with slowly degrading vision, but at some point, you either have to stop and clear the windshield or risk everything."  Cloud Computing has the potential to significantly degrade data quality across the enterprise over time. Deploying a Master Data Management solution prior to or in conjunction with a move to the Cloud can insure that the data flowing into the enterprise from the Cloud is clean and governed. This will in turn insure that expected returns on the investment in Cloud Computing will be realized.       Oracle MDM has proven its metal in this area and has the customers to back that up. In fact, I will be hosting a webcast on Tuesday, April 10th at 10 am PT with one of our top Cloud customers, the Church Pension Group. They have moved all mainline applications to a hosted model and use Oracle MDM to insure the master data is managed and cleansed before it is propagated to other cloud and internal systems. I invite you join Martin Hossfeld, VP, IT Operations, and Danette Patterson, Enterprise Data Manager as they review business drivers for MDM and hosted applications, how they did it, the benefits achieved, and lessons learned. You can register for this free webcast here.  Hope to see you there.

    Read the article

  • Volume group disappeared, LVs still available

    - by Ben
    I've run into an issue with my KVM host which runs VMs on a LVM volume. As of last night the logical volumes are no longer seen as such (I can't create snapshots of them even though I have been for months now). Running any scans all result in nothing being found: [root@apollo ~]# pvscan No matching physical volumes found [root@apollo ~]# vgscan Reading all physical volumes. This may take a while... No volume groups found root@apollo ~]# lvscan No volume groups found If I try restoring the VG conf backup from /etc/lvm/backups/vg0 I get the following error: [root@apollo ~]# vgcfgrestore -f /etc/lvm/backup/vg0 vg0 Couldn't find device with uuid 20zG25-H8MU-UQPf-u0hD-NftW-ngsC-mG63dt. Cannot restore Volume Group vg0 with 1 PVs marked as missing. Restore failed. /etc/lvm/backups/vg0 has the following for the physical volume: physical_volumes { pv0 { id = "20zG25-H8MU-UQPf-u0hD-NftW-ngsC-mG63dt" device = "/dev/sda5" # Hint only status = ["ALLOCATABLE"] flags = [] dev_size = 4292870143 # 1.99902 Terabytes pe_start = 384 pe_count = 524031 # 1.99902 Terabytes } } fdisk -l /dev/sda shows the following: [root@apollo ~]# fdisk -l /dev/sda Disk /dev/sda: 6000.1 GB, 6000069312512 bytes 64 heads, 32 sectors/track, 5722112 cylinders Units = cylinders of 2048 * 512 = 1048576 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x000188b7 Device Boot Start End Blocks Id System /dev/sda1 2 32768 33553408 82 Linux swap / Solaris /dev/sda2 32769 33280 524288 83 Linux /dev/sda3 33281 1081856 1073741824 83 Linux /dev/sda4 1081857 3177984 2146435072 85 Linux extended /dev/sda5 1081857 3177984 2146435071+ 8e Linux LVM The server is running a 4 disk HW RAID10 which seems perfectly healthy according to megacli and smartd. The only odd message in /var/log/messages is the following which shows up every couple of hours: Jun 10 09:41:57 apollo udevd[527]: failed to create queue file: No space left on device Output of df -h [root@apollo ~]# df -h Filesystem Size Used Avail Use% Mounted on /dev/sda3 1016G 119G 847G 13% / /dev/sda2 508M 67M 416M 14% /boot Does anyone have any ideas what to do next? The VMs are all running fine at the moment apart from not being able to snapshot them. Updated with extra info It's not a lack of inodes: [root@apollo ~]# df -i Filesystem Inodes IUsed IFree IUse% Mounted on /dev/sda3 67108864 48066 67060798 1% / /dev/sda2 32768 47 32721 1% /boot pvs, vgs & lvs either output nothing or "No volume groups found".

    Read the article

  • Can anyone explain to me what problem Core Data solves?

    - by Curtis Sumpter
    Core Data seems to add a needless layer of complexity. If you want to save data created natively by the user in an app why not just use an object and then write the data all to SQLite or back to a server using a RESTful script if necessary. Android doesn't have Core Data (though if it has something similar I haven't seen it.). What the heck is the point of buggy CD except useless needless overhead for people who can't write SQL or CGI scripts?

    Read the article

  • Difference between "Data Binding'","Data Hiding","Data Wraping" and "Encapsulation"?

    - by krishna Chandra
    I have been studying the conpects of Object oriented programming. Still I am not able to distinguish between the following concepts of object oriented programming.. a) Data Binding b) Data Hiding c) Data Wrapping d) encapsulation e) Data Abstraction I have gone through a lot of books ,and I also search the difference in google. but still I am not able to make the difference between these? Could anyone please help me ?

    Read the article

  • Recover harddrive data

    - by gameshints
    I have a dell laptop that recently "died" (It would get the blue screen of death upon starting) and the hard drive would make a weird cyclic clicking noises. I wanted to see if I could use some tools on my linux machine to recover the data, so I plugged it into there. If I run "fdisk" I get: Disk /dev/sdb: 20.0 GB, 20003880960 bytes 64 heads, 32 sectors/track, 19077 cylinders Units = cylinders of 2048 * 512 = 1048576 bytes Disk identifier: 0x64651a0a Disk /dev/sdb doesn't contain a valid partition table Fine, the partition table is messed up. However if I run "testdisk" in attempt to fix the table, it freezes at this point, making the same cyclical clicking noises: Disk /dev/sdb - 20 GB / 18 GiB - CHS 19078 64 32 Analyse cylinder 158/19077: 00% I don't really care about the hard drive working again, and just the data, so I ran "gpart" to figure out where the partitions used to be. I got this: dev(/dev/sdb) mss(512) chs(19077/64/32)(LBA) #s(39069696) size(19077mb) * Warning: strange partition table magic 0x2A55. Primary partition(1) type: 222(0xDE)(UNKNOWN) size: 15mb #s(31429) s(63-31491) chs: (0/1/1)-(3/126/63)d (0/1/32)-(15/24/4)r hex: 00 01 01 00 DE 7E 3F 03 3F 00 00 00 C5 7A 00 00 Primary partition(2) type: 007(0x07)(OS/2 HPFS, NTFS, QNX or Advanced UNIX) (BOOT) size: 19021mb #s(38956987) s(31492-38988478) chs: (4/0/1)-(895/126/63)d (15/24/5)-(19037/21/31)r hex: 80 00 01 04 07 7E FF 7F 04 7B 00 00 BB 6F 52 02 So I tried to mount just to the old NTFS partition, but got an error: sudo mount -o loop,ro,offset=16123904 -t ntfs /dev/sdb /mnt/usb NTFS signature is missing. Ugh. Okay. But then I tried to get a raw data dump by running dd if=/dev/sdb of=/home/erik/brokenhd skip=31492 count=38956987 But the file got up to 59885568 bytes, and made the same cyclical clicking noises. Obviously there is a bad sector, but I don't know what to do about it! The data is still there... if I view that 57MB file in textpad... I can see raw data from files. How can I get my data back? Thanks for any suggestions, Solution: I was able to recover about 90% of my data: Froze harddrive in freezer Used Ddrescue to make a copy of the drive Since Ddrescue wasn't able to get enough of my drive to use testdisk to recover my partitions/file system, I ended up using photorec to recover most of my files

    Read the article

  • How to process large files in NetLogo? [closed]

    - by user65597
    I am running into problems in NetLogo with large *.csv / *.txt files. The documents can consist of about 1 million data sets and I need to read them (to eventually create a diagram based on the data). With the most straightforward source code, my program needs about 2 minutes to process these files. How should I approach reading such large data files faster in NetLogo? Is NetLogo even suitable for such tasks (as it seems to be designed more for teaching and learning)?

    Read the article

< Previous Page | 21 22 23 24 25 26 27 28 29 30 31 32  | Next Page >