Search Results

Search found 58823 results on 2353 pages for 'data profiling'.


Adding line with text between pattern and next occurrence of the same pattern in bash

- by kasper

I am writing a bash script that modifies a file that looks like this:

    --- usr1 ---
    data data data data
    data data data data
    data data data data
    --- usr2 ---
    data data data data
    data data data data
    --- usr3 ---
    data data data data
    --- endline ---

First question: how do I add the next user line, --- usrn ---, after the last user's data lines? Second question: how do I delete a specific user's lines (the data lines and the --- userx --- header)? For example, I would like to delete usr2 with all of its data. It must work on bash 2.05 :) and I think it will use awk or sed, but I'm not sure.

Access cost of dynamically created objects with dynamically allocated members

- by user343547

I'm building an application which will have dynamically allocated objects of type A, each with a dynamically allocated member (v), similar to the class below:

    class A {
        int a;
        int b;
        int* v;
    };

where:

- The memory for v will be allocated in the constructor.
- v will be allocated once when an object of type A is created and will never need to be resized.
- The size of v will vary across all instances of A.

The application will potentially have a huge number of such objects and will mostly need to stream a large number of them through the CPU, but only needs to perform very simple computations on the member variables.

- Could having v dynamically allocated mean that an instance of A and its member v are not located together in memory?
- What tools and techniques can be used to test whether this fragmentation is a performance bottleneck?
- If such fragmentation is a performance issue, are there any techniques that could allow A and v to be allocated in a contiguous region of memory?
- Or are there any techniques to aid memory access, such as a pre-fetching scheme? For example, get an object of type A and operate on the other member variables whilst pre-fetching v.
- If the size of v, or an acceptable maximum size, could be known at compile time, would replacing v with a fixed-size array like int v[max_length] lead to better performance?

The target platforms are standard desktop machines with x86/AMD64 processors, Windows or Linux OSes, compiled using either GCC or MSVC.

Analysing a JVisualVM profile - finding the source of large numbers of primitive types?

- by MalcomTucker

I am trying to reduce the memory footprint of my application. JVisualVM heap dumps report that the objects taking up the most space are:

- char[]
- byte[]
- int[]

which isn't particularly helpful. How can I track these objects back to the parent classes that are holding them? Thanks

Resources to learn about engineering aspects of data analytics (OLAP, warehousing, ETL, etc.)

- by JT

I'm a math/stats guy, interested in learning more about the engineering aspects of "data analytics" (this may be an overly broad term; this is a case of "I don't know what I don't know", so I'm not sure how to be more specific). I'm fine with manipulating and analyzing the data once it's already stored somewhere and I can access it, and I'm fine with writing scripts and SQL queries (and have a general knowledge of things like normalization). What I don't know is the whole engineering process of capturing and storing the data. For example, terms I've heard thrown about that I only vaguely understand the meaning of include:

- OLAP, OLTP
- Data warehousing
- ETL
- ???

What's a good book (or any other resource) to learn about these kinds of things? What are things I should know about database design (normalization seems kinda "obvious" to me, something I would have done even before I knew the term -- is there anything else?)? In other words, for jobs falling under the umbrella term of "analytics engineer", what kinds of things should I know?

Designing DAL in .NET to be "data-source independent" and not just "database independent" ?

- by Munish Goyal

How do I design such a flexible DAL (specifically in .NET)? What interfaces does .NET provide, and what should I build on my own? It's a greenfield project starting with SQL Server as the data source, but in future parts of it will move to different NoSQL-type datastores. We may also need to experiment with a lot of different datastores (some data may have to go to Cassandra, some to an RDBMS, some to another DHT, etc.), so an easily switchable access layer will be needed. All I know right now is the 'data' and the 'operations needed on that data'.

add properties to users google app engine

- by juanefren

What is the best way to save a user profile with Google App Engine (Python)? What I did to solve this problem was create another Model with a UserProperty, but to get the profile for a user I have to do something like this:

    if user:
        profile = Profile.all().filter('user =', user).fetch(1)
        if profile:
            property = s.get().property

Any ideas?

Is there a lightweight multipart/form-data parser in C or C++?

- by Hongli

I'm looking at integrating multipart form-data parsing in a web server module so that I can relieve backend web applications (often written in dynamic languages) from parsing the multipart data themselves. The multipart grammar (RFC 2046) looks non-trivial, and if I implement it by hand a lot of things can go wrong. Is there already a good, lightweight multipart/form-data parser written in C or C++? I'm looking for one with no external dependencies other than the C or C++ standard library. I don't need email attachment handling or buffered I/O classes or a portability runtime or whatever, just multipart/form-data parsing.

Things that I've considered:

- GMime - depends on glib, so no go.
- libapreq - too large, depends on APR, badly documented, no unit tests.

I've also looked at writing a parser with Ragel, but I can't figure out how to do it because the grammar is not static: the boundary can change arbitrarily.

Generate and merge data with python multiprocessing

- by Bobby

I have a list of starting data. I want to apply a function to the starting data that creates a few pieces of new data for each element in the starting data. Some pieces of the new data are the same and I want to remove them. The sequential version is essentially:

    def create_new_data_for(datum):
        """make a list of new data from some old datum"""
        return [datum.modified_copy(k) for k in datum.k_list]

    data = [...]  # placeholder: some list of data to start with

    # generate a list of new data from the old data, we'll reduce it next
    newdata = []
    for d in data:
        newdata.extend(create_new_data_for(d))

    # now reduce the data under ".matches(other)"
    reduced = []
    for d in newdata:
        for seen in reduced:
            if d.matches(seen):
                break
        else:
            # so we haven't seen anything like d yet
            reduced.append(d)

    # now reduced is finished and is what we want!

I want to speed this up with multiprocessing. I was thinking that I could use a multiprocessing.Queue for the generation. Each process would just put the stuff it creates on the queue, and when the processes are reducing the data, they can just get the data from the Queue. But I'm not sure how to have the different processes loop over reduced and modify it without any race conditions or other issues. What is the best way to do this safely? Or is there a better way to accomplish this goal?
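One possible arrangement, sketched below under stated assumptions, is to parallelise only the generation step and keep the reduction in the parent process, so that no two processes ever modify the reduced list. The Datum class here is a hypothetical stand-in for the question's data type (its modified_copy and matches rules are invented for illustration), and the mapper parameter is an assumption added so that the same code runs with the built-in map or with a pool's map.

```python
import multiprocessing


class Datum:
    """Hypothetical stand-in for the question's data type."""

    def __init__(self, value, k_list=()):
        self.value = value
        self.k_list = tuple(k_list)

    def modified_copy(self, k):
        # Illustrative rule only: the copy's value is the old value plus k.
        return Datum(self.value + k)

    def matches(self, other):
        # Illustrative rule only: two items match when their values are equal.
        return self.value == other.value


def create_new_data_for(datum):
    """Make a list of new data from some old datum."""
    return [datum.modified_copy(k) for k in datum.k_list]


def reduce_matches(items):
    """Keep the first item of every group that agrees under .matches()."""
    reduced = []
    for d in items:
        if not any(d.matches(seen) for seen in reduced):
            reduced.append(d)
    return reduced


def generate_and_reduce(data, mapper=map):
    """Generate new data with `mapper`, then reduce in the calling process."""
    newdata = []
    for chunk in mapper(create_new_data_for, data):
        newdata.extend(chunk)
    return reduce_matches(newdata)


def parallel_generate_and_reduce(data, processes=None):
    """Same result, with the generation step spread over worker processes."""
    with multiprocessing.Pool(processes) as pool:
        return generate_and_reduce(data, mapper=pool.map)
```

Because the workers only return lists and never touch shared state, there is nothing to lock. Whether this helps depends on how expensive create_new_data_for is compared with the pairwise matches() reduction, which still runs sequentially in this sketch. On platforms that spawn worker processes, the call to parallel_generate_and_reduce should sit under an `if __name__ == "__main__":` guard.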

How to read a file with variable multi-row data in Python

- by dr.bunsen

I have a file that is about 100 MB that looks like this:

    #meta data 1
    skadjflaskdjfasljdfalskdjfl
    sdkfjhasdlkgjhsdlkjghlaskdj
    asdhfk
    #meta data 2
    jflaksdjflaksjdflkjasdlfjas
    ldaksjflkdsajlkdfj
    #meta data 3
    alsdkjflasdjkfglalaskdjf

Each row of meta data corresponds to several variable-length data rows containing only alphanumeric characters. What is the best way to read this data into a simple list like this:

    data = [[#meta data 1, skadjflaskdjfasljdfalskdjflsdkfjhasdlkgjhsdlkjghlaskdjasdhfk],
            [#meta data 2, jflaksdjflaksjdflkjasdlfjasldaksjflkdsajlkdfj],
            [#meta data 3, alsdkjflasdjkfglalaskdjf]]

My initial idea was to use the read() method to read the whole file into memory and then use regular expressions to parse the data into the desired format. Is there a better, more pythonic way? All metadata lines start with an octothorpe and all data lines are alphanumeric. Thanks!
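A line-by-line pass is one way to avoid loading the whole file and using regular expressions. The sketch below assumes the layout described in the question (metadata lines start with '#', every other non-empty line is data belonging to the most recent metadata line); the function name read_records is made up for illustration.

```python
def read_records(lines):
    """Group each '#' metadata line with the data lines that follow it.

    `lines` can be any iterable of strings, including an open file object,
    so the input is consumed one line at a time.
    """
    records = []
    parts = None  # data chunks for the record currently being built
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            parts = []
            records.append([line, parts])
        elif parts is not None:
            parts.append(line)
        # Data lines that appear before the first metadata line are ignored.
    # Join the chunks once per record instead of concatenating in the loop.
    return [[meta, "".join(chunks)] for meta, chunks in records]
```

With a real file this could be called as `with open(path) as f: data = read_records(f)`, where path is whatever file name applies.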

How to monitor MySQL query errors, timeouts and logon attempts?

- by Abel

While setting up a third-party closed-source CMS (Sitefinity), the setup doesn't create all the tables and procedures necessary to run it. The software itself lacks a logging system, and it made me wonder: could I trace and monitor failing SQL statements from MySQL? This serves more than just solving my issue with Sitefinity. I often wonder what's sent to the MySQL server without wanting to dive into the software product or set up a debugging environment. I tried JetProfiler (performance only) and looked through a few others, but although they monitor a lot, they don't monitor query failures, timeouts or logon attempts. Does anyone know a profiler, tracer or monitoring tool, commercial or free, that can show me this information?

Using XCode and instruments to improve iPhone app performance

- by MrDatabase

I've been experimenting with Instruments off and on for a while and I still can't do the following (with any sensible results): determine or estimate the average runtime of a function that's called many times. For example, if I'm driving my gameLoop at 60 Hz with a CADisplayLink, I'd like to see how long the loop takes to run on average: 10 ms? 30 ms? I've come close with the "CPU activity" instrument, but the results are inconsistent or don't make sense. The time profiler seems promising, but all I can get is "% of runtime", and I'd like an actual runtime.

Refactoring - Speed increase

- by Michael G

How can I make this function more efficient? It currently takes 6 to 45 seconds to run. I've run the dotTrace profiler on this specific method, and its total time is anywhere between 6,000 ms and 45,000 ms. The majority of the time is spent in the "MoveNext" and "GetEnumerator" calls. An example of the timings:

    71.55% CreateTableFromReportDataColumns - 18,533* ms - 190 calls
    -- 55.71% MoveNext - 14,422 ms - 10,775 calls

What can I do to speed this method up? It gets called a lot, and the seconds add up:

    private static DataTable CreateTableFromReportDataColumns(Report report)
    {
        DataTable table = new DataTable();
        HashSet<String> colsToAdd = new HashSet<String> { "DataStream" };
        foreach (ReportData reportData in report.ReportDatas)
        {
            IEnumerable<string> cols = reportData.ReportDataColumns.Where(c => !String.IsNullOrEmpty(c.Name)).Select(x => x.Name).Distinct();
            foreach (var s in cols)
            {
                if (!String.IsNullOrEmpty(s))
                    colsToAdd.Add(s);
            }
        }
        foreach (string col in colsToAdd)
        {
            table.Columns.Add(col);
        }
        return table;
    }

How to process large block data visualization with Flex?

- by hydra1983

I know that's a big topic. However, it's better to know some general ideas for handling such problems. I have an application which requires Flex to render statistics calculated instantly on the client side from a downloaded data set. The problems are:

- The data set is large and needs more than 10 seconds to download.
- There are some filters to control the statistics calculation algorithms. If the user changes the filters, it takes a long time to recalculate the result and the UI freezes.

How Can I Log and Find the Most Expensive Queries?

- by Pure.Krome

Hi folks. The activity monitor in SQL Server 2008 allows us to see the most expensive queries. OK, that's kewl, but is there a way I can log this info or get it via Query Analyzer? I don't really want to keep the SQL Management console open with me watching the activity monitor dashboard. I want to figure out which queries are poorly written, where the schema is poorly designed, etc. Thanks heaps for any help!

How can I see symbols of (C and C++) binary on linux?

- by Andrei

Which tools do you guys use? How do you demangle C++ symbols so that you can pass them to profiler tools, such as opannotate? Thanks

oprofile unable to produce call graph

- by aaa

Hello. I am trying to use oprofile to generate a call graph. The compiler is g++, the platform is Linux x86-64, and the linker is gfortran. The C++ code is compiled with -fno-omit-frame-pointer. oprofile is started with --callgraph=25, and I run the report with --callgraph. The call graph is produced, but it only includes self time, which is not much use. What am I missing?

html5 uploader + jquery drag & drop: how to store file data with FormData?

- by lauthiamkok

I am making an HTML5 drag-and-drop uploader with jQuery. Below is my code so far. The problem is that I get an empty array without any data. Is this line incorrect for storing the file data: fd.append('file', $thisfile);?

    $('#div').on(
        'dragover',
        function(e) {
            e.preventDefault();
            e.stopPropagation();
        }
    );
    $('#div').on(
        'dragenter',
        function(e) {
            e.preventDefault();
            e.stopPropagation();
        }
    );
    $('#div').on(
        'drop',
        function(e){
            if(e.originalEvent.dataTransfer){
                if(e.originalEvent.dataTransfer.files.length) {
                    e.preventDefault();
                    e.stopPropagation();

                    // The file list.
                    var fileList = e.originalEvent.dataTransfer.files;
                    //console.log(fileList);

                    // Loop the ajax post.
                    for (var i = 0; i < fileList.length; i++) {
                        var $thisfile = fileList[i];
                        console.log($thisfile);

                        // HTML5 form data object.
                        var fd = new FormData();
                        //console.log(fd);

                        fd.append('file', $thisfile);

                        /*
                        var file = {name: fileList[i].name, type: fileList[i].type, size:fileList[i].size};
                        $.each(file, function(key, value) {
                            fd.append('file['+key+']', value);
                        })
                        */

                        $.ajax({
                            url: "upload.php",
                            type: "POST",
                            data: fd,
                            processData: false,
                            contentType: false,
                            success: function(response) {
                                // .. do something
                            },
                            error: function(jqXHR, textStatus, errorMessage) {
                                console.log(errorMessage); // Optional
                            }
                        });
                    }

                    /*UPLOAD FILES HERE*/
                    upload(e.originalEvent.dataTransfer.files);
                }
            }
        }
    );

    function upload(files){
        console.log('Upload '+files.length+' File(s).');
    };

Then, if I use another method, which is to turn the file data into an array inside the jQuery code:

    var file = {name: fileList[i].name, type: fileList[i].type, size:fileList[i].size};
    $.each(file, function(key, value) {
        fd.append('file['+key+']', value);
    });

where is the tmp_name data inside e.originalEvent.dataTransfer.files[i]?

PHP:

    print_r($_POST);
    $uploaddir = './uploads/';
    $file = $uploaddir . basename($_POST['file']['name']);
    if (move_uploaded_file($_POST['file']['tmp_name'], $file)) {
        echo "success";
    } else {
        echo "error";
    }

As you can see, tmp_name is needed to upload the file via PHP.

HTML:

    <div id="div">Drop here</div>

How do I get a peak memory usage snapshot from JVisualVM?

- by MalcomTucker

I need a memory snapshot at the peak of my application's memory usage - is there an easy way to achieve this? thanks

Compute jvm heap size to host web application

- by Enrique

Hello, I want to host a web application on a private JVM. The host offers 32, 64, 128 and 256 MB plans. My web application uses Spring, and I store some objects for every logged-in user session. My questions are: how can I profile my web app to see how much heap it needs so I can choose a plan? And how can I simulate hundreds of users logged in at the same time? I'm developing the application using NetBeans 6.7, Java 1.6 and Tomcat 6.0.18. Thank you.

Decent profiler for Windows?

- by olliej

Does Windows have any decent sampling (e.g. non-instrumenting) profilers available? Preferably something akin to Shark on MacOS, although I am willing to accept that I am going to have to pay for such a profiler on Windows. I've tried the profiler in VS Team Suite and was not overly impressed, and was wondering if there are any other good ones. [Edit: Erk, I forgot to say this is for C/C++, rather than .NET -- sorry for any confusion]

MySQL Cluster data nodes - slow SELECTs

- by Boyan Georgiev

Hi to all. First off, I'm new to MySQL Cluster. This is my pain: I've managed to set up a MySQL Cluster with two data nodes, two SQL nodes and one management server. Everything works pretty well, except the following: my data nodes are spread across an intranet link, which adds latency to communication between the data nodes. Apparently, due to MySQL Cluster's internal partitioning schemes, when my PHP application pulls data from the cluster via SELECT queries, parts of the data are pulled from both data nodes. This makes the page appear onscreen REALLY slowly. If I bring one data node offline, the data can only be pulled from the single remaining data node, and thus the final result (HTML output) appears on the screen in a very timely fashion. So, my question is this: can the data nodes/cluster be told to pull data from partitions stored only on a particular data node?

Is there any reason why someone would want to create a Core Data model programmatically?

- by mystify

I wonder in which cases it would be good to build an NSManagedObjectModel completely programmatically, with NSEntityDescription instances and all this stuff. I'm the kind of person who prefers to do things in code, rejecting Interface Builder. But when it comes to Core Data, I have a hard time figuring out why I should kill my time NOT using the nice Xcode Data Modeler tool. And since data models are stuck in a given state (except when you want to do some ugly migration operations where things probably go wrong and users get mad, really mad), I see no big sense in a data model that's made programmatically for the purpose of changing it all the time. Did I miss something?

How to structure a Visual Studio project for the data access layer

- by Akk

I currently have a project that uses various DB access technologies, mainly for showcasing or for demos. Currently we have:

    Namespace App.Data (App.Data.dll)
        Folder NHibernate
        Folder EntityFramework
        Folder LinqToSql

The above structure is OK as we only use SQL Server as the DB. But going forward we will be including Oracle, MySQL, etc. So what would be a better structure with this in mind? I thought about:

    Namespace App.Data.SqlServer (App.Data.SqlServer.dll)
        Folder NHibernate
        Folder EntityFramework
        Folder LinqToSql

Or would it just be better to have separate assemblies for each database and access technology?

    Namespace App.Data.SqlServer.NHibernate (App.Data.SqlServer.NHibernate.dll)
    Namespace App.Data.SqlServer.EntityFramework (App.Data.SqlServer.EntityFramework.dll)
    Namespace App.Data.Oracle.NHibernate (App.Data.Oracle.NHibernate.dll)
    Namespace App.Data.MySql.NHibernate (App.Data.MySql.NHibernate.dll)

Optimization of Function with Dictionary and Zip()

- by eWizardII

Hello, I have the following function:

    import json
    import os
    import re

    def filetxt():
        word_freq = {}
        lvl1 = []
        lvl2 = []
        total_t = 0
        users = 0
        text = []

        for l in range(0,500):
            # Open File
            if os.path.exists("C:/Twitter/json/user_" + str(l) + ".json") == True:
                with open("C:/Twitter/json/user_" + str(l) + ".json", "r") as f:
                    text_f = json.load(f)
                    users = users + 1
                    for i in range(len(text_f)):
                        text.append(text_f[str(i)]['text'])
                        total_t = total_t + 1
            else:
                pass

        # Filter
        occ = 0
        import string
        for i in range(len(text)):
            s = text[i] # Sample string
            a = re.findall(r'(RT)',s)
            b = re.findall(r'(@)',s)
            occ = len(a) + len(b) + occ
            s = s.encode('utf-8')
            out = s.translate(string.maketrans("",""), string.punctuation)

            # Create Wordlist/Dictionary
            word_list = text[i].lower().split(None)

            for word in word_list:
                word_freq[word] = word_freq.get(word, 0) + 1

            keys = word_freq.keys()

            numbo = range(1,len(keys)+1)
            WList = ', '.join(keys)
            NList = str(numbo).strip('[]')
            WList = WList.split(", ")
            NList = NList.split(", ")
            W2N = dict(zip(WList, NList))

            for k in range (0,len(word_list)):
                word_list[k] = W2N[word_list[k]]
            for i in range (0,len(word_list)-1):
                lvl1.append(word_list[i])
                lvl2.append(word_list[i+1])

I have used the profiler and found that the greatest CPU time seems to be spent in the zip() function and in the join and split parts of the code. I'm looking to see if there is any way I have overlooked to clean up the code and make it more optimized, since the greatest lag seems to be in how I am working with the dictionaries and the zip() function. Any help would be appreciated, thanks!
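If the word-to-number table is being rebuilt with join, split and zip for every tweet, one alternative is to grow a single dictionary incrementally and give each word a number the first time it is seen. The sketch below illustrates that idea only; it is not a drop-in replacement. The function name index_words is made up, the ids are integers rather than the strings the original produces, and numbering follows first appearance rather than the order of word_freq.keys().

```python
def index_words(texts):
    """Count words, number them on first sight, and collect adjacent id pairs."""
    word_freq = {}   # word -> number of occurrences
    word_ids = {}    # word -> integer id, assigned on first appearance
    lvl1, lvl2 = [], []
    for text in texts:
        words = text.lower().split()
        for word in words:
            word_freq[word] = word_freq.get(word, 0) + 1
            # setdefault only stores the new id when the word is unseen.
            word_ids.setdefault(word, len(word_ids) + 1)
        ids = [word_ids[w] for w in words]
        # Pair every word with its successor within the same text.
        lvl1.extend(ids[:-1])
        lvl2.extend(ids[1:])
    return word_freq, word_ids, lvl1, lvl2
```

Each word then costs a couple of dictionary operations, instead of a rebuild of the whole mapping once per text.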

mysql query timer for .net

- by acidzombie24

Is there something I can use to track how long my MySQL queries take? Perhaps log them if they take more than a certain amount of time, or track all queries but only keep the longest query time? I'm using this from C# with ASP.NET. I'd like to use it to occasionally check whether my queries are getting slow.
