large data volumes - Page 19

When does 'optimizing code' == 'structuring data'?

- by NewAlexandria

A recent article by ycombinator lists a comment with principles of a great programmer. #7. Good programmer: I optimize code. Better programmer: I structure data. Best programmer: What's the difference? Acknowledging subjective and contentious concepts - does anyone have a position on what this means? I do, but I'd like to edit this question later with my thoughts so-as not to predispose the answers.

Read the article

Data Education: Great Classes Coming to a City Near You

- by Adam Machanic

In case you haven't noticed, Data Education (the training company I started a couple of years ago) has expanded beyond the US northeast; we're currently offering courses with top trainers in both St. Louis and Chicago , as well as the Boston area. The courses are starting to fill up fast—not surprising when you consider we’re talking about experienced instructors like Kalen Delaney , Rob Farley , and Allan Hirt —but we have still have some room. We’re very excited about bringing the highest quality...(read more)

Read the article

Single Key Multiple Values Data Structure for one to many mapping

- by nijhawan.saurabh

Dictionaries are good, they are great to store Key / Value pairs but what if you want to store multiple values for a single key? Dictionaries would not allow duplicate keys. I came across a nice way to represent such a Data Structure using one of the Extension Method (ToLookup) present in System.Linq Namespace which converts an IEnumerable<T> to an ILookup<TKey, TElement>. Now, there are two parameters this method expects (The other overload expects 3 parameters): IEnumerable<TSource> - This list would contain the actual data. Func<TSource, TKey> keySelector - The Delegate which which computes the keys The method returns the following: ILookup<TKey, TElement> This DS would store Keys and multiple values along those keys. Let's see a small example: 12 using System; 13 using System.Collections.Generic; 14 using System.Linq; 15 16 /// <summary> 17 /// </summary> 18 internal class Program 19 { 20 #region Methods 21 22 /// <summary> 23 /// </summary> 24 /// <param name="args"> 25 /// The args. 26 /// </param> 27 private static void Main(string[] args) 28 { 29 // Create an array of strings. 30 var list = new List<string> { "IceCream1", "Chocolate Moose", "IceCream2" }; 31 32 // Generate a lookup Data Structure 33 ILookup<int, string> lookupDs = list.ToLookup(item => item.Length); 34 35 // Enumerate groupings. 36 foreach (var group in lookupDs) 37 { 38 foreach (string element in group) 39 { 40 Console.WriteLine(element); 41 } 42 } 43 } Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;}

Read the article

Extracting the Layout of all the Data Forms from the Relational Database

- by RahulS

Today I came across a question from one of our clients that: "what members are used on each data form WITHOUT having to go through the report generated out of our Planning app". We worked with client on this and reached to a simple query. All the form related information is stored in the following tables: HSP_FORM HSP_FORMOBJ_DEF HSP_FORMOBJ_DEF_MBR HSP_FORM_ATTRIBUTES HSP_FORM_CALCS HSP_FORM_DV_CONDITION HSP_FORM_DV_PM_RULE HSP_FORM_DV_RULE HSP_FORM_DV_USER_IN_PM_RULE HSP_FORM_LAYOUT HSP_FORM_MENUS HSP_FORM_VARIABLES If we want to retrieve just the members included, we can concentrate on: HSP_OBJECT to get the Object_ID for form, Object_Type is 7 for forms. (Ex: Select * from HSP_OBJECT where OBJECT_TYPE = 7) HSP_FORMOBJ_DEF Find the OBJDEF_ID for a particular form HSP_FORMOBJ_DEF_MBR Use the above OBJDEF_ID to find the members: Here the Mbr_ID is the Id of the member and Query_Type is the Function like Idesc, Level0 etc and Sequce is you sequence, And the final table we can use is HSP_FORM_LAYOUT: Layout_Type: 0->Pov 1-> Page, 2->Row, 3->Col, DIM_ID is the dimension ID and Ordinal is position. Here is the Query: SELECT HSP_OBJECT.OBJECT_NAME AS 'Form', HSP_OBJECT_2.OBJECT_NAME AS 'Dimension', HSP_OBJECT_1.OBJECT_NAME AS 'Member', HSP_FORMOBJ_DEF_MBR.QUERY_TYPE FROM <DatabaseName>.dbo.HSP_FORM_LAYOUT HSP_FORM_LAYOUT, <DatabaseName>.dbo.HSP_FORMOBJ_DEF HSP_FORMOBJ_DEF, <DatabaseName>.dbo.HSP_FORMOBJ_DEF_MBR HSP_FORMOBJ_DEF_MBR, <DatabaseName>.dbo.HSP_MEMBER HSP_MEMBER, <DatabaseName>.dbo.HSP_OBJECT HSP_OBJECT, <DatabaseName>.dbo.HSP_OBJECT HSP_OBJECT_1, <DatabaseName>.dbo.HSP_OBJECT HSP_OBJECT_2 WHERE HSP_OBJECT.OBJECT_ID = HSP_FORMOBJ_DEF.FORM_ID AND HSP_FORMOBJ_DEF_MBR.OBJDEF_ID = HSP_FORMOBJ_DEF.OBJDEF_ID AND HSP_MEMBER.MEMBER_ID = HSP_FORMOBJ_DEF_MBR.MBR_ID AND HSP_OBJECT_1.OBJECT_ID = HSP_MEMBER.MEMBER_ID AND HSP_OBJECT_2.OBJECT_ID = HSP_MEMBER.DIM_ID AND HSP_FORM_LAYOUT.DIM_ID = HSP_MEMBER.DIM_ID AND HSP_FORM_LAYOUT.FORM_ID = HSP_FORMOBJ_DEF.FORM_ID AND ((HSP_OBJECT.OBJECT_TYPE=7)) ORDER BY HSP_OBJECT.OBJECT_NAME Concentrate on Test1 data form and Actual Layout of it as follows: Corresponding Query_type for few of the functions: 9 for Idesc, 3 for Ancestors, -9 for ILvl0Des, 8 for Desc, 4 for IAncestors Its just a basic idea you can do lot on the basis of this. Cheers..!!! Rahul S. http://www.facebook.com/pages/HyperionPlanning/117320818374228

Read the article

Sorting data in the SSIS Pipeline (Video)

In this post I want to show a couple of ways to order the data that comes into the pipeline. a number of people have asked me about this primarily because there are a number of ways to do it but also because some components in the pipeline take sorted inputs. One of the methods I show is visually easy to understand and the other is less visual but potentially more performant.

Read the article

Application Crash cleared the content of the Folder

- by Ameya

Recently while working on the LinuxDC++ over the network the application crashed while downloading files. Now my Downloads folder which had at least 60-80GB of data is completely cleaned but the system is not reporting the available the correct free space. Is there way to restore the contents of the folder only as the solution available are for the whole partition. I just want to recover the contents from one folder.

Read the article

What's beyond c,c++ and data structure?

- by sagacious

I have learnt c and c++ programming languages.i have learnt data structure too. Now i'm confused what to do next?my aim is to be a good programmer. i want to go deeper into the field of programming and making the practical applications of what i have learnt. So,the question takes the form-what to do next?Or is there any site where i can see advantage of every language with it's features? sorry,if there's any language error and thanks in advance.

Read the article

Read large file into sqlite table in objective-C on iPhone

- by James Testa

I have a 2 MB file, not too large, that I'd like to put into an sqlite database so that I can search it. There are about 30K entries that are in CSV format, with six fields per line. My understanding is that sqlite on the iPhone can handle a database of this size. I have taken a few approaches but they have all been slow 30 s. I've tried: 1) Using C code to read the file and parse the fields into arrays. 2) Using the following Objective-C code to parse the file and put it into directly into the sqlite database: NSString *file_text = [NSString stringWithContentsOfFile: filePath usedEncoding: NULL error: NULL]; NSArray *lineArray = [file_text componentsSeparatedByString:@"\n"]; for(int k = 0; k < [lineArray count]; k++){ NSArray *parts = [[lineArray objectAtIndex:k] componentsSeparatedByString: @","]; NSString *field0 = [parts objectAtIndex:0]; NSString *field2 = [parts objectAtIndex:2]; NSString *field3 = [parts objectAtIndex:3]; NSString *loadSQLi = [[NSString alloc] initWithFormat: @"INSERT INTO TABLE (TABLE, FIELD0, FIELD2, FIELD3) VALUES ('%@', '%@', '%@');",field0, field2, field3]; if (sqlite3_exec (db_table, [loadSQLi UTF8String], NULL, NULL, &errorMsg) != SQLITE_OK) { sqlite3_close(db_table); NSAssert1(0, @"Error loading table: %s", errorMsg); } Am I missing something? Does anyone know of a fast way to get the file into a database? Or is it possible to translate the file into a sqlite format that can be read directly into sqlite? Or should I turn the file into a plist and load it into a Dictionary? Unfortunately I need to search on two of the fields, and I think a Dictionary can only have one key? Jim

Read the article

Managing large binary files with git

- by pi

Hi there. I am looking for opinions of how to handle large binary files on which my source code (web application) is dependent. We are currently discussing several alternatives: Copy the binary files by hand. Pro: Not sure. Contra: I am strongly against this, as it increases the likelihood of errors when setting up a new site/migrating the old one. Builds up another hurdle to take. Manage them all with git. Pro: Removes the possibility to 'forget' to copy a important file Contra: Bloats the repository and decreases flexibility to manage the code-base and checkouts/clones/etc will take quite a while. Separate repositories. Pro: Checking out/cloning the source code is fast as ever, and the images are properly archived in their own repository. Contra: Removes the simpleness of having the one and only git repository on the project. Surely introduces some other things I haven't thought about. What are your experiences/thoughts regarding this? Also: Does anybody have experience with multiple git repositories and managing them in one project? Update: The files are images for a program which generates PDFs with those files in it. The files will not change very often(as in years) but are very relevant to a program. The program will not work without the files. Update2: I found a really nice screencast on using git-submodule at GitCasts.

Read the article

How can I quickly parse large (>10GB) files?

- by Andrew

Hi - I have to process text files 10-20GB in size of the format: field1 field2 field3 field4 field5 I would like to parse the data from each line of field2 into one of several files; the file this gets pushed into is determined line-by-line by the value in field4. There are 25 different possible values in field2 and hence 25 different files the data can get parsed into. I have tried using Perl (slow) and awk (faster but still slow) - does anyone have any suggestions or pointers toward alternative approaches? FYI here is the awk code I was trying to use; note I had to revert to going through the large file 25 times because I wasn't able to keep 25 files open at once in awk: chromosomes=(1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25) for chr in ${chromosomes[@]} do awk < my_in_file_here -v pat="$chr" '{if ($4 == pat) for (i = $2; i <= $2+52; i++) print i}' >> my_out_file_"$chr".query done

Read the article

Five stars of open data - example and review

- by Joe

(there may be a more suited SE site for this question so feel free to shift) I have some data I'd like to make open to the public - It's synatesis of some related data retrived from freedom of infomation requests over the last year. The data itself is at http://www.cs.rhul.ac.uk/home/joseph/domesday/Domesday-Scotland.csv or for fans of Excel, at http://www.cs.rhul.ac.uk/home/joseph/domesday/Domesday-Scotland.xlsx . It's no more than a table with about five columns. I'd like to make this properly open data, so I was looking at the 5 star deployment scheme for Open Data. Much of which is fine but I'm confused towards the end and I could do with an explenation from people who know the answers. So to get achieve the star levels I need: "make your stuff available on the Web (whatever format) under an open license" trival - all I have to do is put the notes up on the page that will give the provance of the data. "make it available as structured data (e.g., Excel instead of image scan of a table)"… done… "use non-proprietary formats (e.g., CSV instead of Excel)" - done… "use URIs to identify things, so that people can point at your stuff" - this is where I start to get a bit hazy - does this mean there should be an URI for every line in the table? "link your data to other data to provide context" - this isn't massively clear to me - does this mean to give the provence of the data? One column of the data I've put out is a link to where the data came from - is that the sort of thing we're looking at? Any and all information and answers welcome… EDIT - or if anyone wants to recommend a place SE or other place to ask the question - that would be cool...

Read the article

Using a "white list" for extracting terms for Text Mining, Part 2

- by [email protected]

In my last post, we set the groundwork for extracting specific tokens from a white list using a CTXRULE index. In this post, we will populate a table with the extracted tokens and produce a case table suitable for clustering with Oracle Data Mining. Our corpus of documents will be stored in a database table that is defined as create table documents(id NUMBER, text VARCHAR2(4000)); However, any suitable Oracle Text-accepted data type can be used for the text. We then create a table to contain the extracted tokens. The id column contains the unique identifier (or case id) of the document. The token column contains the extracted token. Note that a given document many have many tokens, so there will be one row per token for a given document. create table extracted_tokens (id NUMBER, token VARCHAR2(4000)); The next step is to iterate over the documents and extract the matching tokens using the index and insert them into our token table. We use the MATCHES function for matching the query_string from my_thesaurus_rules with the text. DECLARE cursor c2 is select id, text from documents; BEGIN for r_c2 in c2 loop insert into extracted_tokens select r_c2.id id, main_term token from my_thesaurus_rules where matches(query_string, r_c2.text)>0; end loop; END; Now that we have the tokens, we can compute the term frequency - inverse document frequency (TF-IDF) for each token of each document. create table extracted_tokens_tfidf as with num_docs as (select count(distinct id) doc_cnt from extracted_tokens), tf as (select a.id, a.token, a.token_cnt/b.num_tokens token_freq from (select id, token, count(*) token_cnt from extracted_tokens group by id, token) a, (select id, count(*) num_tokens from extracted_tokens group by id) b where a.id=b.id), doc_freq as (select token, count(*) overall_token_cnt from extracted_tokens group by token) select tf.id, tf.token, token_freq * ln(doc_cnt/df.overall_token_cnt) tf_idf from num_docs, tf, doc_freq df where df.token=tf.token; From the WITH clause, the num_docs query simply counts the number of documents in the corpus. The tf query computes the term (token) frequency by computing the number of times each token appears in a document and divides that by the number of tokens found in the document. The doc_req query counts the number of times the token appears overall in the corpus. In the SELECT clause, we compute the tf_idf. Next, we create the nested table required to produce one record per case, where a case corresponds to an individual document. Here, we COLLECT all the tokens for a given document into the nested column extracted_tokens_tfidf_1. CREATE TABLE extracted_tokens_tfidf_nt NESTED TABLE extracted_tokens_tfidf_1 STORE AS extracted_tokens_tfidf_tab AS select id, cast(collect(DM_NESTED_NUMERICAL(token,tf_idf)) as DM_NESTED_NUMERICALS) extracted_tokens_tfidf_1 from extracted_tokens_tfidf group by id; To build the clustering model, we create a settings table and then insert the various settings. Most notable are the number of clusters (20), using cosine distance which is better for text, turning off auto data preparation since the values are ready for mining, the number of iterations (20) to get a better model, and the split criterion of size for clusters that are roughly balanced in number of cases assigned. CREATE TABLE km_settings (setting_name VARCHAR2(30), setting_value VARCHAR2(30)); BEGIN INSERT INTO km_settings (setting_name, setting_value) VALUES VALUES (dbms_data_mining.clus_num_clusters, 20); INSERT INTO km_settings (setting_name, setting_value) VALUES (dbms_data_mining.kmns_distance, dbms_data_mining.kmns_cosine); INSERT INTO km_settings (setting_name, setting_value) VALUES VALUES (dbms_data_mining.prep_auto,dbms_data_mining.prep_auto_off); INSERT INTO km_settings (setting_name, setting_value) VALUES VALUES (dbms_data_mining.kmns_iterations,20); INSERT INTO km_settings (setting_name, setting_value) VALUES VALUES (dbms_data_mining.kmns_split_criterion,dbms_data_mining.kmns_size); COMMIT; END; With this in place, we can now build the clustering model. BEGIN DBMS_DATA_MINING.CREATE_MODEL( model_name => 'TEXT_CLUSTERING_MODEL', mining_function => dbms_data_mining.clustering, data_table_name => 'extracted_tokens_tfidf_nt', case_id_column_name => 'id', settings_table_name => 'km_settings'); END;To generate cluster names from this model, check out my earlier post on that topic.

Read the article

gcc/g++: error when compiling large file

- by Alexander

Hi, I have a auto-generated C++ source file, around 40 MB in size. It largely consists of push_back commands for some vectors and string constants that shall be pushed. When I try to compile this file, g++ exits and says that it couldn't reserve enough virtual memory (around 3 GB). Googling this problem, I found that using the command line switches --param ggc-min-expand=0 --param ggc-min-heapsize=4096 may solve the problem. They, however, only seem to work when optimization is turned on. 1) Is this really the solution that I am looking for? 2) Or is there a faster, better (compiling takes ages with these options acitvated) way to do this? Best wishes, Alexander Update: Thanks for all the good ideas. I tried most of them. Using an array instead of several push_back() operations reduced memory usage, but as the file that I was trying to compile was so big, it still crashed, only later. In a way, this behaviour is really interesting, as there is not much to optimize in such a setting -- what does the GCC do behind the scenes that costs so much memory? (I compiled with deactivating all optimizations as well and got the same results) The solution that I switched to now is reading in the original data from a binary object file that I created from the original file using objcopy. This is what I originally did not want to do, because creating the data structures in a higher-level language (in this case Perl) was more convenient than having to do this in C++. However, getting this running under Win32 was more complicated than expected. objcopy seems to generate files in the ELF format, and it seems that some of the problems I had disappeared when I manually set the output format to pe-i386. The symbols in the object file are by standard named after the file name, e.g. converting the file inbuilt_training_data.bin would result in these two symbols: binary_inbuilt_training_data_bin_start and binary_inbuilt_training_data_bin_end. I found some tutorials on the web which claim that these symbols should be declared as extern char _binary_inbuilt_training_data_bin_start;, but this does not seem to be right -- only extern char binary_inbuilt_training_data_bin_start; worked for me.

Read the article

Building vs. Buying a Master Data Management Solution

- by david.butler(at)oracle.com

Many organizations prefer to build their own MDM solutions. The argument is that they know their data quality issues and their data better than anyone. Plus a focused solution will cost less in the long run then a vendor supplied general purpose product. This is not unreasonable if you think of MDM as a point solution for a particular data quality problem. But this approach carries significant risk. We now know that organizations achieve significant competitive advantages when they deploy MDM as a strategic enterprise wide solution: with the most common best practice being to deploy a tactical MDM solution and grow it into a full information architecture. A build your own approach most certainly will not scale to a larger architecture unless it is done correctly with the larger solution in mind. It is possible to build a home grown point MDM solution in such a way that it will dovetail into broader MDM architectures. A very good place to start is to use the same basic technologies that Oracle uses to build its own MDM solutions. Start with the Oracle 11g database to create a flexible, extensible and open data model to hold the master data and all needed attributes. The Oracle database is the most flexible, highly available and scalable database system on the market. With its Real Application Clusters (RAC) it can even support the mixed OLTP and BI workloads that represent typical MDM data access profiles. Use Oracle Data Integration (ODI) for batch data movement between applications, MDM data stores, and the BI layer. Use Oracle Golden Gate for more real-time data movement. Use Oracle's SOA Suite for application integration with its: BPEL Process Manager to orchestrate MDM connections to business processes; Identity Management for managing users; WS Manager for managing web services; Business Intelligence Enterprise Edition for analytics; and JDeveloper for creating or extending the MDM management application. Oracle utilizes these technologies to build its MDM Hubs. Customers who build their own MDM solution using these components will easily migrate to Oracle provided MDM solutions when the home grown solution runs out of gas. But, even with a full stack of open flexible MDM technologies, creating a robust MDM application can be a daunting task. For example, a basic MDM solution will need: a set of data access methods that support master data as a service as well as direct real time access as well as batch loads and extracts; a data migration service for initial loads and periodic updates; a metadata management capability for items such as business entity matrixed relationships and hierarchies; a source system management capability to fully cross-reference business objects and to satisfy seemingly conflicting data ownership requirements; a data quality function that can find and eliminate duplicate data while insuring correct data attribute survivorship; a set of data quality functions that can manage structured and unstructured data; a data quality interface to assist with preventing new errors from entering the system even when data entry is outside the MDM application itself; a continuing data cleansing function to keep the data up to date; an internal triggering mechanism to create and deploy change information to all connected systems; a comprehensive role based data security system to control and monitor data access, update rights, and maintain change history; a flexible business rules engine for managing master data processes such as privacy and data movement; a user interface to support casual users and data stewards; a business intelligence structure to support profiling, compliance, and business performance indicators; and an analytical foundation for directly analyzing master data. Oracle's pre-built MDM Hub solutions are full-featured 3-tier Internet applications designed to participate in the full Oracle technology stack or to run independently in other open IT SOA environments. Building MDM solutions from scratch can take years. Oracle's pre-built MDM solutions can bring quality data to the enterprise in a matter of months. But if you must build, at lease build with the world's best technology stack in a way that simplifies the eventual upgrade to Oracle MDM and to the full enterprise wide information architecture that it enables.

Read the article

Big Data – ClustrixDB – Extreme Scale SQL Database with Real-time Analytics, Releases Software Download – NewSQL

- by Pinal Dave

There are so many things to learn and there is so little time we all have. As we have little time we need to be selective to learn whatever we learn. I believe I know quite a lot of things in SQL but I still do not know what is around SQL. I have started to learn about NewSQL recently. If you wonder what is NewSQL I encourage all of you to read my blog post about NewSQL over here Big Data – Buzz Words: What is NewSQL – Day 10 of 21. NewSQL databases are quickly becoming popular – providing the scale of NoSQL with the SQL features and transactions. As a part of learning NewSQL database, I have recently started to learn about ClustrixDB. ClustrixDB has been the most mature NewSQL database used by some of the largest internet sites in the world for over 3 years, with extensive SQL support. In addition to scale, it provides fast real-time analytics by bringing massively parallel processing (MPP), available only in warehousing databases, to the transactional database. The reason I am more intrigued about learning ClustrixDB is their recent announcement on Oct 31. ClustrixDB was only available as an appliance, but now with their software release on Oct 31, everyone can use it. It is now available as forever free for up to 12 cores with community support, and there is a 45 day trial for unlimited cluster sizes. With the forever free world, I am indeed interested in ClustrixDB now. I know that few of the leading eCommerce sites in the world uses them for their transactional database. Here are few of the details I have quickly noted for ClustrixDB. ClustrixDB allows user to: Scale by simply adding nodes to the cluster with a single command Run billions of transactions a day Run fast real-time analytics Achieve high-availability with recovery from node failure Manages itself Easily migrate from MySQL as it is nearly plug-and-play compatible, use MySQL drivers, tools and replication. While I was going through the documentation I realized that ClustrixDB also has extensive support for SQL features including complex queries involving joins on a dozen or more tables, aggregates, sorts, sub-queries. It also supports stored procedures, triggers, foreign keys, partitioned and temporary tables, and fully online schema changes. It is indeed a very matured product and SQL solution. Indeed Clusterix sound very promising solution, I decided to dig a bit deeper to understand who are current customers of the Clustrix as they exist in the industry for quite a few years. Their client list is indeed very interesting and here is my quick research about them. Twoo.com – Europe’s largest social discovery (dating) site runs 4.4 Billion Transactions a day with table sizes over a Terabyte, on a 168 core cluster. EngageBDR – Top 3 in the online advertising category uses ClustrixDB to serve 6.9 billion ads a day through real-time bidding platform. Their reports went from 4 hours to 15 seconds. NoMoreRack – Top 2 fastest growing e-commerce company in US used ClustrixDB for high availability and fast growth through Amazon cloud. MakeMyTrip – India’s leading travel site runs on ClustrixDB with two clusters running as multi-master in Chennai and Bangalore. Many enterprises such as AOL, CSC, Rakuten, Symantec use ClustrixDB when their applications need scale. I must accept that I am impressed with the information I have learned so far and now is the time to do some hand’s on experience with their product. I want to learn this technology so in future when it is about NewSQL, I know what I am talking about. Read more why Clustrix explains why you ClustrixDB might be the right database for you. Download ClustrixDB with me today and install it on your machine so in future when we discuss the technical aspects of it, we all are on the same page. The software can be downloaded here. Reference : Pinal Dave (http://blog.SQLAuthority.com)Filed under: Big Data, MySQL, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: Clustrix

Read the article

All Xen domU LVM volumes corrupt after reboot

- by zcs

I'm running a Debian Squeeze dom0, and after rebooting it all 7 of my domUs have data corruption. Each is setup as ext3 partition directly on a separate lvm2 volume. None of the lvm volumes will mount; all have bad superblocks. I've tried e2fsck with each superblock to no avail. What else can I try? Each domU has two LVM volumes connected to it, one for the disk and one for swap. The disk is mounted at root, formatted as a normal ext3 partition as a xen-blk device. The volumes are never mounted outside of the guest OS. I'm running Ubuntu 11.04 using the instructions here. I'm not sure that they didn't shutdown properly, all I know is they were corrupt after I issues a clean 'reboot' on the dom0. Here's a sample Xen config file; the rest are the same except for name, vcpus, memory, vif and disk. name = 'load1' vcpus = 2 memory = 512 vif = ['bridge=prbr0', 'bridge=eth0'] disk = ['phy:/dev/VolGroup00/load1-disk,xvda,w','phy:/dev/VolGroup00/load1-swap,xvdb,w'] #============================================================================ # Debian Installer specific variables def check_bool(name, value): value = str(value).lower() if value in ('t', 'tr', 'tru', 'true'): return True return False global var_check_with_default def var_check_with_default(default, var, val): if val: return val return default xm_vars.var('install', use='Install Debian, default: false', check=check_bool) xm_vars.var("install-method", use='Installation method to use "cdrom" or "network" (default: network)', check=lambda var, val: var_check_with_default('network', var, val)) # install-method == "network" xm_vars.var("install-mirror", use='Debian mirror to install from (default: http://archive.ubuntu.com/ubuntu)', check=lambda var, val: var_check_with_default('http://archive.ubuntu.com/ubuntu', var, val)) xm_vars.var("install-suite", use='Debian suite to install (default: natty)', check=lambda var, val: var_check_with_default('natty', var, val)) # install-method == "cdrom" xm_vars.var("install-media", use='Installation media to use (default: None)', check=lambda var, val: var_check_with_default(None, var, val)) xm_vars.var("install-cdrom-device", use='Installation media to use (default: xvdd)', check=lambda var, val: var_check_with_default('xvdd', var, val)) # Common options xm_vars.var("install-arch", use='Debian mirror to install from (default: amd64)', check=lambda var, val: var_check_with_default('amd64', var, val)) xm_vars.var("install-extra", use='Extra command line options (default: None)', check=lambda var, val: var_check_with_default(None, var, val)) xm_vars.var("install-installer", use='Debian installer to use (default: network uses install-mirror; cdrom uses /install.ARCH)', check=lambda var, val: var_check_with_default(None, var, val)) xm_vars.var("install-kernel", use='Debian installer kernel to use (default: uses install-installer)', check=lambda var, val: var_check_with_default(None, var, val)) xm_vars.var("install-ramdisk", use='Debian installer ramdisk to use (default: uses install-installer)', check=lambda var, val: var_check_with_default(None, var, val)) xm_vars.check() if not xm_vars.env.get('install'): bootloader="/usr/sbin/pygrub" elif xm_vars.env['install-method'] == "network": import os.path print "Install Mirror: %s" % xm_vars.env['install-mirror'] print "Install Suite: %s" % xm_vars.env['install-suite'] if xm_vars.env['install-installer']: installer = xm_vars.env['install-installer'] else: installer = xm_vars.env['install-mirror']+"/dists/"+xm_vars.env['install-suite'] + \ "/main/installer-"+xm_vars.env['install-arch']+"/current/images" print "Installer: %s" % installer print print "WARNING: Installer kernel and ramdisk are not authenticated." print if xm_vars.env.get('install-kernel'): kernelurl = xm_vars.env['install-kernel'] else: kernelurl = installer + "/netboot/xen/vmlinuz" if xm_vars.env.get('install-ramdisk'): ramdiskurl = xm_vars.env['install-ramdisk'] else: ramdiskurl = installer + "/netboot/xen/initrd.gz" import urllib class MyUrlOpener(urllib.FancyURLopener): def http_error_default(self, req, fp, code, msg, hdrs): raise IOError("%s %s" % (code, msg)) urlopener = MyUrlOpener() try: print "Fetching %s" % kernelurl kernel, _ = urlopener.retrieve(kernelurl) print "Fetching %s" % ramdiskurl ramdisk, _ = urlopener.retrieve(ramdiskurl) except IOError, _: raise elif xm_vars.env['install-method'] == "cdrom": arch_path = { 'i386': "/install.386", 'amd64': "/install.amd" } if xm_vars.env['install-media']: print "Install Media: %s" % xm_vars.env['install-media'] else: raise OptionError("No installation media given.") if xm_vars.env['install-installer']: installer = xm_vars.env['install-installer'] else: installer = arch_path[xm_vars.env['install-arch']] print "Installer: %s" % installer if xm_vars.env.get('install-kernel'): kernelpath = xm_vars.env['install-kernel'] else: kernelpath = installer + "/xen/vmlinuz" if xm_vars.env.get('install-ramdisk'): ramdiskpath = xm_vars.env['install-ramdisk'] else: ramdiskpath = installer + "/xen/initrd.gz" disk.insert(0, 'file:%s,%s:cdrom,r' % (xm_vars.env['install-media'], xm_vars.env['install-cdrom-device'])) bootloader="/usr/sbin/pygrub" bootargs="--kernel=%s --ramdisk=%s" % (kernelpath, ramdiskpath) print "From CD" else: print "WARNING: Unknown install-method: %s." % xm_vars.env['install-method'] if xm_vars.env.get('install'): # Figure out command line if xm_vars.env['install-extra']: extras=[xm_vars.env['install-extra']] else: extras=[] # Reboot will just restart the installer since this file is not # reparsed, so halt and restart that way. extras.append("debian-installer/exit/always_halt=true") extras.append("--") extras.append("quiet") console="hvc0" try: if len(vfb) >= 1: console="tty0" except NameError, e: pass extras.append("console="+ console) extra = str.join(" ", extras) print "command line is \"%s\"" % extra root There are two LVM logical volumes connected to each VM. Here's the fdisk -l output for the disk volume: Disk /dev/VolGroup00/VMNAME-disk: 8589 MB, 8589934592 bytes 255 heads, 63 sectors/track, 1044 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00029c01 Device Boot Start End Blocks Id System /dev/VolGroup00/VMNAME-disk1 1 1045 8386560 83 Linux And the swap volume: Disk /dev/VolGroup00/VMNAME-swap: 536 MB, 536870912 bytes 37 heads, 35 sectors/track, 809 cylinders Units = cylinders of 1295 * 512 = 663040 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x0004faae Device Boot Start End Blocks Id System /dev/VolGroup00/VMNAME-swap1 2 809 522240 82 Linux swap / Solaris Partition 1 has different physical/logical beginnings (non-Linux?): phys=(0, 32, 33) logical=(1, 21, 19) Partition 1 has different physical/logical endings: phys=(65, 36, 35) logical=(808, 4, 28)

Read the article

Database modularity with EBS volumes

- by Eclyps19

I would like to add modularity to my websites on EC2 instances by encapsulating the site files and the mysql files in their own EBS volumes. The end result that I'm going for is the ability to quickly mount a volume or two to different servers running the same AMI (for testing/development/emergency maintenance, etc), as well as maintain separate snapshots of each. I'm able to do this fairly easily with a single database by symlinking my mounted database EBS to the appropriate places (/var/lib/mysql, /etc/my.cnf, /var/log/mysqld.log), but I'm not sure if it would even be possible be possible to have multiple databases on different EBS volumes running concurrently. Example: /website1/www.website.com /database1/ /website2/www.otherwebsite.com /database2/ Could anybody shed some light on this for me? Is it possible? Is it a bad idea? Thanks.

Read the article

Detaching EBS Volumes (in LVM) take a lot of time

- by Cheezo

I have an EC2 Instance(EBS Backed-root partition) with EBS volumes configured via LVM. I have formatted it as ext4 and can mount it to store data etc. Now i want take a snapshot of the root partition, hence in that case i go and detach the other non-root EBS volumes (configured in LVM). Here a regular detach does not work, and i have "force" detach almost always. Although, i another similar setup with RAID instead of LVM and there after stopping RAID, i can easily detach. The whole setup is running Ubuntu Maverick 10.10 Please assist me in the same.

Read the article

Oracle Communications Data Model

- by jean-pierre.dijcks

I've mentioned OCDM in previous posts but found the following (see end of the post) podcast on the topic and figured it is worthwhile to spread the news some more. ORetailDM and OCommunicationsDM are the two data models currently available from Oracle. Both are intended to capture: Business best practices and industry knowledge Pre-built advanced analytics intended to predict future events before they happen (like the Churn model shown below) Oracle technology best practices to ensure optimal performance of the model All of this typically comes with a reduced time to implementation, or as the marketing slogan goes, reduced time to value. Here are the links: Podcast on OCDM OTN pages for OCDM and ORDM

Read the article

Importing data from text file to specific columns using BULK INSERT

- by Dinesh Asanka

Bulk insert is much faster than using other techniques such as SSIS. However, when you are using bulk insert you can’t insert to specific columns. If, for example, there are five columns in a table you should have five values for each record in the text file you are importing from. This is an issue when you are expecting default values to be inserted into tables. Let us say you have table as below: In this table, you are expecting ID, Status and CreatedDate to be updated automatically, so your text file may only have FirstName LastName values as below: Dinesh,Asanka Saman,Liyanage Ruwan,Silva Susantha,Bathige Jude,Peires Sanjeewa,Jayawickrama If you use bulk insert to this table like follows, You will be returned an error: Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 1 (ID). To avoid this you will need to create a view with the columns you are expecting to fill and use bulk insert against it. If you check the table now, you will see table with values in the text file and the default values.

Read the article

Javascript: Safely upload a client data file

- by Jeffrey Sweeney

I'm (still) working on a template-based XML editing program. It's a GUI-based XML editor that only allows users to add certain tags and attributes based off the requirements. You can see the current version here for an idea. Now, I'd like to allow users to upload their own data templates, but I'm concerned about potential XSS hacks. Currently, the template file is in Javascript object literal notation, which unsurprisingly is a security nightmare if the user can upload their own. I was thinking of using XML instead, but is there an even better alternative?

Read the article

Reuse the data CRUD methods in data access layer, but they are updated too quickly

- by ValidfroM

I agree that we should put CRUD methods in a data access layer, However, in my current project I have some issues. It is a legacy system, and there are quite a lot CRUD methods in some concrete manager classes. People including me seem to just add new methods to it, rather than reuse the existing methods. Because We don't know whether the existing method is what we need Even if we have source code, do we really need read other's code then make decision? It is updated too quickly. Do not have time get familiar with the DAO API. Back to the question, how do you solve that in your project? If we say "reuse", it really needs to be reusable rather than just an excuse.

Read the article

Get aggregated view of data for entire website with Google Analytics

- by crmpicco

I have a website (www.ayrshireminis.com), which has three main sections under different directories, these are: /forum /galleries /contact I would like to have an aggregated view of the data for the whole website, but also for each section. What is the recommended approach for doing this? I believe I can create a web property that includes a profile for the entire website and duplicated filtered profiles, each section having an include filter. This is my gut instinct, but i'd like to know if there is another (better) way to do it? Maybe by having one account that includes a profile for the whole site and another profile with an include filter for the individual sections?

Read the article

Still no detected structured data in Google Webmaster Tools [on hold]

- by user6211

Can you give me some suggestions what's wrong with my structured data? Google still cannot read it. It looks like this: <div class="identity"> <div itemscope itemtype="http://schema.org/LocalBusiness"> <a itemprop="url" href="http://MYDOMAIN.co.uk/"><div itemprop="name"><strong>MY_COMPANY</strong></div></a> <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress"> <span itemprop="streetAddress">MY_ADDRESS</span>, <span itemprop="addressLocality">London</span>, <span itemprop="postalCode">SE5 MY_XYZ</span>, <span itemprop="addressCountry">UK</span> </div> </div> </div>

Read the article

Design: How to model / where to store relational data between classes

- by Walker

I'm trying to figure out the best design here, and I can see multiple approaches, but none that seems "right." There are three relevant classes here: Base, TradingPost, and Resource. Each Base has a TradingPost which can offer various Resources depending on the Base's tech level. Where is the right place to store the minimum tech level a base must possess to offer any given resource? A database seems like overkill. Putting it in each subclass of Resource seems wrong--that's not an intrinsic property of the Resource. Do I have a mediating class, and if so, how does it work? It's important that I not be duplicating code; that I have one place where I set the required tech level for a given item. Essentially, where does this data belong? P.S. Feel free to change the title; I struggled to come up with one that fits.

Search Results

Search found 65999 results on 2640 pages for 'large data volumes'.

Page 19/2640 | < Previous Page | 15 16 17 18 19 20 21 22 23 24 25 26 | Next Page >

- by NewAlexandria

- by Adam Machanic

- by nijhawan.saurabh

- by RahulS

- by Ameya

- by sagacious

- by James Testa

- by pi

- by Andrew

- by Joe

- by [email protected]

- by Alexander

- by david.butler(at)oracle.com

- by Pinal Dave

- by zcs

- by Eclyps19

- by Cheezo

- by jean-pierre.dijcks

- by Dinesh Asanka

- by Jeffrey Sweeney

- by ValidfroM

- by crmpicco

- by user6211

- by Walker

< Previous Page | 15 16 17 18 19 20 21 22 23 24 25 26 | Next Page >