Search Results

Search found 1226 results on 50 pages for 'improvement'.

Page 39/50 | < Previous Page | 35 36 37 38 39 40 41 42 43 44 45 46 | Next Page >

Looking for an appropriate design pattern

- by user1066015

I have a game that tracks user stats after every match, such as how far they travelled, how many times they attacked, how far they fell, etc, and my current implementations looks somewhat as follows (simplified version): Class Player{ int id; public Player(){ int id = Math.random()*100000; PlayerData.players.put(id,new PlayerData()); } public void jump(){ //Logic to make the user jump //... //call the playerManager PlayerManager.jump(this); } public void attack(Player target){ //logic to attack the player //... //call the player manager PlayerManager.attack(this,target); } } Class PlayerData{ public static HashMap<int, PlayerData> players = new HashMap<int,PlayerData>(); int id; int timesJumped; int timesAttacked; } public void incrementJumped(){ timesJumped++; } public void incrementAttacked(){ timesAttacked++; } } Class PlayerManager{ public static void jump(Player player){ players.get(player.getId()).incrementJumped(); } public void incrementAttacked(Player player, Player target){ players.get(player.getId()).incrementAttacked(); } } So I have a PlayerData class which holds all of the statistics, and brings it out of the player class because it isn't part of the player logic. Then I have PlayerManager, which would be on the server, and that controls the interactions between players (a lot of the logic that does that is excluded so I could keep this simple). I put the calls to the PlayerData class in the Manager class because sometimes you have to do certain checks between players, for instance if the attack actually hits, then you increment "attackHits". The main problem (in my opinion, correct me if I'm wrong) is that this is not very extensible. I will have to touch the PlayerData class if I want to keep track of a new stat, by adding methods and fields, and then I have to potentially add more methods to my PlayerManager, so it isn't very modulized. If there is an improvement to this that you would recommend, I would be very appreciative. Thanks.

Read the article
Thread-Safe lazy instantiating using MEF

- by Xaqron

// Member Variable private static readonly object _syncLock = new object(); // Now inside a static method foreach (var lazyObject in plugins) { if ((string)lazyObject.Metadata["key"] = "something") { lock (_syncLock) { // It seems the `IsValueCreated` is not up-to-date if (!lazyObject.IsValueCreated) lazyObject.value.DoSomething(); } return lazyObject.value; } } Here I need synchronized access per loop. There are many threads iterating this loop and based on the key they are looking for, a lazy instance is created and returned. lazyObject should not be created more that one time. Although Lazy class is for doing so and despite of the used lock, under high threading I have more than one instance created (I track this with a Interlocked.Increment on a volatile static int and log it somewhere). The problem is I don't have access to definition of Lazy and MEF defines how the Lazy class create objects. I should notice the CompositionContainer has a thread-safe option in constructor which is already used. My questions: 1) Why the lock doesn't work ? 2) Should I use an array of locks instead of one lock for performance improvement ?

Read the article
Oracle Query Optimization: Why is My Second Query Faster?

- by Patrick Cuff

I was having some performance issues with an Oracle query, so I downloaded a trial of the Quest SQL Optimizer for Oracle, which made some changes that dramatically improved the query's performance. I'm not exactly sure why the recommended query had such an improvement; can anyone provide an explanation? Before: SELECT t1.version_id, t1.id, t2.field1, t3.person_id, t2.id FROM table1 t1, table2 t2, table3 t3 WHERE t1.id = t2.id AND t1.version_id = t2.version_id AND t2.id = 123 AND t1.version_id = t3.version_id AND t1.VERSION_NAME <> 'AA' order by t1.id Plan Cost: 831 Elapsed Time: 00:00:21.40 Number of Records: 40,717 After: SELECT /*+ USE_NL_WITH_INDEX(t1) */ t1.version_id, t1.id, t2.field1, t3.person_id, t2.id FROM table2 t2, table3 t3, table1 t1 WHERE t1.id = t2.id + 0 AND t1.version_id = t2.version_id + 0 AND t2.id = 123 AND t1.version_id = t3.version_id + 0 AND t1.VERSION_NAME || '' <> 'AA' AND t3.version_id = t2.version_id + 0 order by t1.id Plan Cost: 686 Elapsed Time: 00:00:00.95 Number of Records: 40,717 Questions: Why does re-arranging the order of the tables in the FROM clause help? Why does adding + 0 to the WHERE clause comparisons help? Why does || '' <> 'AA' in the WHERE clause VERSION_NAME comparison help? Is this a more efficient way of handling possible nulls on this column?

Read the article
Skip subdirectory in python import

- by jstaab

Ok, so I'm trying to change this: app/ - lib.py - models.py - blah.py Into this: app/ - __init__.py - lib.py - models/ - __init__.py - user.py - account.py - banana.py - blah.py And still be able to import my models using from app.models import User rather than having to change it to from app.models.user import User all over the place. Basically, I want everything to treat the package as a single module, but be able to navigate the code in separate files for development ease. The reason I can't do something like add for file in __all__: from file import * into init.py is I have circular references between the model files. A fix I don't want is to import those models from within the functions that use them. But that's super ugly. Let me give you an example: user.py ... from app.models import Banana ... banana.py ... from app.models import User ... I wrote a quick pre-processing script that grabs all the files, re-writes them to put imports at the top, and puts it into models.py, but that's hardly an improvement, since now my stack traces don't show the line number I actually need to change. Any ideas? I always though init was probably magical but now that I dig into it, I can't find anything that lets me provide myself this really simple convenience.

Read the article
(php) regexto remove comments but ignore occurances within strings

- by David

Hi there, I am writing a comment-stripper and trying to accommodate for all needs here. I have the below stack of code which removes pretty much all comments, but it actually goes too far. A lot of time was spent trying and testing and researching the regex patterns to match, but I don't claim that they are the best at each. My problem is that I also have situation where I have 'PHP comments' (that aren't really comments' in standard code, or even in PHP strings, that I don't actually want to have removed. Example: <?php $Var = "Blah blah //this must not comment"; // this must comment. ?> What ends up happening is that it strips out religiously, which is fine, but it leaves certain problems: <?php $Var = "Blah blah ?> Also: will also cause problems, as the comment removes the rest of the line, including the ending ? See the problem? So this is what I need... Comment characters within '' or "" need to be ignored PHP Comments on the same line, that use double-slashes, should remove perhaps only the comment itself, or should remove the entire php codeblock. Here's the patterns I use at the moment, feel free to tell me if there's improvement I can make in my existing patterns? :) $CompressedData = $OriginalData; $CompressedData = preg_replace('!/\*.*?\*/!s', '', $CompressedData); // removes /* comments */ $CompressedData = preg_replace('!//.*?\n!', '', $CompressedData); // removes //comments $CompressedData = preg_replace('!#.*?\n!', '', $CompressedData); // removes # comments $CompressedData = preg_replace('//', '', $CompressedData); // removes HTML comments Any help that you can give me would be greatly appreciated! :)

Read the article
Find existence of number in a sorted list in constant time? (Interview question)

- by Rich

I'm studying for upcoming interviews and have encountered this question several times (written verbatim) Find or determine non existence of a number in a sorted list of N numbers where the numbers range over M, M N and N large enough to span multiple disks. Algorithm to beat O(log n); bonus points for constant time algorithm. First of all, I'm not sure if this is a question with a real solution. My colleagues and I have mused over this problem for weeks and it seems ill formed (of course, just because we can't think of a solution doesn't mean there isn't one). A few questions I would have asked the interviewer are: Are there repeats in the sorted list? What's the relationship to the number of disks and N? One approach I considered was to binary search the min/max of each disk to determine the disk that should hold that number, if it exists, then binary search on the disk itself. Of course this is only an order of magnitude speedup if the number of disks is large and you also have a sorted list of disks. I think this would yield some sort of O(log log n) time. As for the M N hint, perhaps if you know how many numbers are on a disk and what the range is, you could use the pigeonhole principle to rule out some cases some of the time, but I can't figure out an order of magnitude improvement. Also, "bonus points for constant time algorithm" makes me a bit suspicious. Any thoughts, solutions, or relevant history of this problem?

Read the article
Best available technology for layered disk cache in linux

- by SpliFF

I've just bought a 6-core Phenom with 16G of RAM. I use it primarily for compiling and video encoding (and occassional web/db). I'm finding all activities get disk-bound and I just can't keep all 6 cores fed. I'm buying an SSD raid to sit between the HDD and tmpfs. I want to setup a "layered" filesystem where reads are cached on tmpfs but writes safely go through to the SSD. I want files (or blocks) that haven't been read lately on the SSD to then be written back to a HDD using a compressed FS or block layer. So basically reads: - Check tmpfs - Check SSD - Check HD And writes: - Straight to SSD (for safety), then tmpfs (for speed) And periodically, or when space gets low: - Move least frequently accessed files down one layer. I've seen a few projects of interest. CacheFS, cachefsd, bcache seem pretty close but I'm having trouble determining which are practical. bcache seems a little risky (early adoption), cachefs seems tied to specific network filesystems. There are "union" projects unionfs and aufs that let you mount filesystems over each other (USB device over a DVD usually) but both are distributed as a patch and I get the impression this sort of "transparent" mounting was going to become a kernel feature rather than a FS. I know the kernel has a built-in disk cache but it doesn't seem to work well with compiling. I see a 20x speed improvement when I move my source files to tmpfs. I think it's because the standard buffers are dedicated to a specific process and compiling creates and destroys thousands of processes during a build (just guessing there). It looks like I really want those files precached. I've read tmpfs can use virtual memory. In that case is it practical to create a giant tmpfs with swap on the SSD? I don't need to boot off the resulting layered filesystem. I can load grub, kernel and initrd from elsewhere if needed. So that's the background. The question has several components I guess: Recommended FS and/or block layer for the SSD and compressed HDD. Recommended mkfs parameters (block size, options etc...) Recommended cache/mount technology to bind the layers transparently Required mount parameters Required kernel options / patches, etc..

Read the article
New Computer Build Questions

- by MJ

I'm in the process of gathering parts and specs for a new machine. I wear many hats, so the machine needs to do a lot. I need at least 2 monitor support, if not three. I also play many online MMOs (wow, aion, war hammer, etc), along with some freelance programming projects. I already have a case which is very large, so it will fit anything. I have 2 other SATA HDs. They are more for storage and basic programs. I feel that the best improvement could be done with a solid state HD, true or not? I'm more of a software/programming guy, so ANY input at all on improving this system build would be appreciated. I have a few questions with this list. AMD or Intel? I don't know enough about either to choose what would best fit me. Thanks! **EDIT: Thanks for the input everyone! Here are some answers: I do a lot of programming and gaming, so I do need things for both. The newer video card covers the gaming aspect, as well as allowing me to have many monitors. (hopefully upgrade to dual 30' or more) I don't need any additional HDs at this time. I have a SATA 160g and 120g from my previous computer, and a NAS system with over 2TB of storage on the homenetwork. I just wanted a fast HD for OS/programs/games. With the memory. I have used G.SKILL before in 2 system builds. It's done excellent for me in them. Very stable. **EDIT2: Made some additional changes. Lowered the power supply down to 750, which saves me more $$. Also changed the SSD to 2 WD 650G HDs. Thinking of doing a CPU upgrade to the 3.4GHZ AMD Phenom II X4 965 Black Edition Deneb 3.4GHz System Specs - Budget:$1500 CPU: AMD Phenom II X4 955 Black Edition Deneb 3.2GHz MB: GIGABYTE GA-MA790GPT-UD3H AM3 AMD 790GX HDMI ATX Memory: G.SKILL 4GB (2 x 2GB) 240-Pin DDR3 SDRAM DDR3 1333 (PC3 1066 Video: DIAMOND 5870PE51G Radeon HD 5870 (Cypress XT) 1GB 256-bit GD Power Supply: XCLIO GREATPOWER 1000W ATX12V SLI Ready CrossFire Ready HD:Intel X25-M Mainstream SSDSA2MH080G2C1 2.5" 80GB SATA II MLC Changes: Power Supply: CORSAIR CMPSU-750TX 750W ATX12V / EPS12V HD: 2x Western Digital Caviar Blue WD6400AAKS 640GB CPU: AMD Phenom II X4 965 Black Edition Deneb 3.4GHz

Read the article
OpenVPN multiple servers on the same subnet, high availability

- by andre

Hey everyone. Let me start by saying that my Linux experience isn't super awesome but I can usually find my way around things easily. Over at work we have an OpenVPN setup that's been due for some improvement for a while now. The main server (tap mode) runs in our office, behind a rather slow DSL connection. The main problem is that, since I'm usually out of the office, every time I want to access something on the virtual network I have to go through that server to get anywhere else. We have two servers up on 100 Mbit connections that we use for development and production purposes, about 3 more servers in the office (one of them behind a different T1 line for VOIP) and about two dozen clients who use the network on a daily basis from various locations. We've had situations where network routing (outside of our control) would not allow people to reach our main OpenVPN server whilst the other locations were connectable. Also any time someone outside the office wants to fetch something from any of the servers (say, a 500 MB code repository), a whopping 20 KB/s download speed is just unacceptable these days (did I mention slow DSL? ok). We had to implement traffic shaping on this server since maxing out this connection was fairly trivial. I had the thought of running two (or more) OpenVPN servers in the network. These would have to have the same subnet though, as our application relies on virtual network's IP addresses for some of its core functionality. The clients would also preferably retain the same IP addresses but that's not vital. For simplicity, lets call the current server office and the second server I'm setting up, cloud. Call the server on the T1 phone. This proved to be rather complex because as soon as I connect to cloud, I cannot see office. Any routes to a server that would go through office also do not work while I'm connected to cloud (no ping, nothing) and vice-versa. There's no rules for iptables that would be blocking the traffic either. Recently I came across this article on linuxjournal but the solution they provide seems to only cover the use of two servers and somewhat outdated (can't even find much documentation, their wiki is offline). They also state that adding more servers would be a complex task. Ideally I would like to keep the existing server office running the virtual network and also run the OpenVPN daemon on the cloud and phone servers (100 Mbit and very reliable connection, respectively) so that we're on safe ground in case of a hardware failure, DSL failure, etc. So, in essence, I'm looking for a highly available OpenVPN solution (fix, patch, hack, tweak, whatever you want to call it) that will accept connections on multiple hosts (2 or more) whilst keeping the same IP address subnet regardless of the server to which you connect to. Thanks for reading and sorry for the long post, I hope it gets the point across :P

Read the article
How to set up a file server in a restricted corporate environment

- by Emilio M Bumachar

I work in a big corporation, and the disk space my team gets in the corporate file server is so low, I am considering turning my work PC into a file server. I ask this community for links to tutorials, software suggestions, and advice in general about how to set it up. My machine is an Intel Core2Duo E7500 @ 3GHz, 3 GB of RAM, Running Windows XP Service Pack 3. Upgrading, formatting or installing another OS is out of the question. But I do have Administrator priviledges on the PC, and I can install programs (at least for now). A lot of security software I don't even know about is and must remain installed. But I only need communication whithin the corporate network, which is not restricted. People have usernames (logins) on the corporate network, and I need to use them to restrict access. Simply put, I have a list of logins of team members, and only people in the list should access the files. I have about 150 GB of free disk space. I'm thinking of allocating 100 GB to the team's shared files. I plan monthly backups on machines of co-workers, same configuration. But automation of backups is a nice, unnecessary feature: it's totally acceptable for me to manually copy the contents to a different machine once a month. Uptime is important, as everyone would use these files in their daily work. I have experience as a python and C programmer, but no experience whatsoever as a sysadmin, and almost nothing of my programming experience is network programming. I'm a complete beginner in this. Thanks in advance for any help. EDIT I honestly appreciate all the warnings, I really do, but what I plan to make available is mostly stuff that now is solely on DVDs just for space reasons. It's 'daily work' to read them, but 'daily work write' files will remain on the corporate server. As for the importance of uptime, I think I overstated it: a few outages are OK, it's already an improvement over getting the DVDs. As for policy, my manager is kind of on my side, I will confirm that before making my move. As for getting more space through the proper channels, well, that was Plan A, and it's still on the table... But I don't have much hope. I'm not as "core businees" as I'd like.

Read the article
What router hardware or software should be used when multiple public IPs are routed into the same LAN?

- by lcbrevard

I am looking for recommendations to replace a set of consumer grade (Linksys, Netgear, Belkin) routers with something that can handle more traffic while routing more than one static public IP into the same LAN address space. We have a block of static public IPs, 5 usable, with Comcast Business. Currently four of them are in use for: General office access Web server Mail and DNS servers Download and backup web server for separate business All systems (a mixture of physical and virtual) are in the same LAN address space (10.x.y.0/24) to enable easy access between them inside the office. There are 30 or more systems in use depending on which virtual machines are currently active. We have a mixture of Windows, Linux, FreeBSD, and Solaris. Currently a separate consumer grade router is used for each of the four static addresses, with its WAN address set to the specific static address and a different gateway address for each: uses 10.x.y.1 - various ports are forwarded to various LAN IPs on systems with gateway 10.x.y.1 uses 10.x.y.254 - port 80 is forwarded to a server with gateway 10.x.y.254 uses 10.x.y.253 - ports for mail and dns are forwarded to a server with gateway 10.x.y.253 uses 10.x.y.252 - ports as needed are forwarded to server with gateway 10.x.y.252 Only router 1. is allowed to serve DHCP and address reservation based on the MAC is used for most of the internal "server" IP addresses so they are at fixed values. [Some are set static due to limitations in the address reservation capabilities of router 1.] And, yes, this really does work! But... I am looking for: better DHCP with more capable address reservation higher capacity so I don't have to periodically power cycle the routers One obvious improvement would be to have a real DHCP server and not use a consumer grade router for that purpose. I am torn between buying a "professional" router such as Cisco or Juniper or Sonic Wall verus learning to configure some spare hardware to perform this function. The price goes up extremely rapidly with capabilities for commercial routers! Worse, some routers require licensing based on the number of clients - a disaster in our environment with so many virtual machines. Sorry for such a long posting but I am getting tired of having to power cycle routers and deal with shifting IP addresses afterwards!

Read the article
Low FPS in some games, but hardware not fully used

- by Mario De Schaepmeester

I just did a little funny experiment in the game/sim "Train Simulator 2013". I normally have good FPS in it (around 30) at full settings. What I did was make a really, really long train so that the calculations the sim needed to make were enormous (the sim is quite realistic, it takes all things into account like speed/acceleration, G-forces, comfort levels, possible wheel slip and many more, and most of those things on each carriage seperately). This resulted in only 14FPS as reported by the game, but it felt more like 8FPS or so. I have a Logitech G15 keyboard which has an LCD, and it allows me to monitor CPU/RAM and video card load on it. The strange thing is, all CPU cores were busy, but the total load was only about 60% maximum at all times. The video card was only on 30% load (possibly an important note, the memory was full, which is however not unusual for the game in question). The RAM had plenty of room and there weren't many operations as it didn't grow or shrink much. I just have the feeling that the game would run smoother if it used more of my hardware power. Why is it not doing so? I had the same in another game, The Elder Scrolls: Morrowind when using more than 100 mods (that all use scripting) and a few high res texture mods, + a full-on graphics improvement program. The engine is very old (2003), and so I thought this might be the cause (not being optimised for multithreading). I had thought of possible causes, like: The operating system doesn't let the games use all the resources. It doesn't make use of multi-threading appropriately. To eliminate the former, I tried a CPU stress tool and that got 100% CPU juice as I let it run, so the OS is not the problem. I gave its thread the "higher" priority though. My actual question In both games, I did things the engine was not really built to do or support. Can those games' framerate be limited cause of their own engine not being able to cope? What is the real reason and more importantly, can I help it? And in any case, could something actually be wrong with my hardware? It's all reasonably new, a couple of months, and I (almost) never experience any other trouble. Modern and much more demanding games work absolutely fine. Specs CPU: AMD Phenom II 965 X4 @ 3.4gHz RAM: 8GB of DDR3 RAM Video: MSI GTX560 (nVidia chip) with 1GB of GDDR5 memory OS: Windows 7 Ultimate 64 bit Nothing overclocked.

Read the article
SQL Express 2008 R2 on Amazon EC2 instance: tons of free memory, poor performance

- by gravyface

The old SQL Express 2005 was running on a low-end single Xeon CPU Dell server, RAID 5 7200 disks, 2 GB RAM (SBS 2003). I have not done any baseline measurements on the old physical server, but the Web app is used by half a dozen people (maybe 2 concurrently), so I figured "how bad can an Amazon EC2 instance be?". It's pretty horrible: a difference of 8 seconds of load time on one screen. First of all, I'm not a SQL guru, but here's what I've tried: Had a Small Instance, now running a c1.medium (High Cpu Medium) Windows 2008 32-bit R2 EBS-backed instance running IIS 7.5 and SQL Express 2008 R2. No noticeable improvement. Changed Page File from fixed 256 to Automatic. Setup a Striped Mirror from within Disk Management with two attached 1 GB EBS volumes. Moved database and transaction log, left everything else on the boot EBS volume. No noticeable change. Looked at memory, ~1000 MB of physical memory free (1.7 GB total). Changed SQL instance to use a minimum of 1024 RAM; restarted server, no change in memory usage. SQL still only using ~28MB of RAM(!). So I'm thinking: this database is tiny (28MB), why isn't the whole thing cached in RAM? Surely that would speed up performance. The transaction log is 241 MB. Seems kind of large in comparison -- has this not been committed? Is it a cause of performance degradation? I recall something about Recovery Models and log sizes somewhere in my travels, but not positive. Another thing: the old server was running SQL Express 2005. Not sure if that has any impact, but I tried changing the compatibility level from SQL 2000 to 2008, but that had no effect. Anyways, what else can I try here? Seems ridiculous to throw more virtual hardware at this thing. I know I/O is going to be rough on EBS volumes, but surely others are successfully running small .NET/SQL apps on reasonably priced instances?

Read the article
Nginx common configuration that I might have missed

- by ApPeL

I recently moved from Apache Mod_wsgi to Nginx, and I have seen a major improvement on speed a lowering on memory usage and I am generally very happy with the it. I am not a server expert, so please be gentle. I am wondering if there are any small configuration that I might have missed, that will cause me some issues in the long run... Please see my nginx.conf file user nginx nginx; worker_processes 4; error_log /var/log/nginx/error_log info; events { worker_connections 1024; use epoll; } http { include /etc/nginx/mime.types; default_type application/octet-stream; log_format main '$remote_addr - $remote_user [$time_local] ' '"$request" $status $bytes_sent ' '"$http_referer" "$http_user_agent" ' '"$gzip_ratio"'; client_header_timeout 10m; client_body_timeout 10m; send_timeout 10m; connection_pool_size 256; client_header_buffer_size 1k; large_client_header_buffers 4 2k; request_pool_size 4k; gzip on; gzip_min_length 1100; gzip_buffers 4 8k; gzip_types text/plain; output_buffers 1 32k; postpone_output 1460; sendfile on; tcp_nopush on; tcp_nodelay on; keepalive_timeout 75 20; ignore_invalid_headers on; index index.html; server { listen 80; server_name localhost; location /media/ { root /www/django_test1/myapp; # Notice this is the /media folder that we create above } location /mediaadmin/ { alias /opt/python2.6/lib/python2.6/site-packages/django/contrib/admin/media/; # Notice this is the /media folder that we create above } location / { # host and port to fastcgi server fastcgi_pass 127.0.0.1:8080; fastcgi_param SERVER_ADDR $server_addr; fastcgi_param SERVER_PORT $server_port; fastcgi_param SERVER_NAME $server_name; fastcgi_param SERVER_PROTOCOL $server_protocol; fastcgi_param PATH_INFO $fastcgi_script_name; fastcgi_param REQUEST_METHOD $request_method; fastcgi_param QUERY_STRING $query_string; fastcgi_param CONTENT_TYPE $content_type; fastcgi_param CONTENT_LENGTH $content_length; fastcgi_pass_header Authorization; fastcgi_intercept_errors off; client_max_body_size 100M; } access_log /var/log/nginx/localhost.access_log main; error_log /var/log/nginx/localhost.error_log; } }

Read the article
Database model for keeping track of likes/shares/comments on blog posts over time

- by gage

My goal is to keep track of the popular posts on different blog sites based on social network activity at any given time. The goal is not to simply get the most popular now, but instead find posts that are popular compared to other posts on the same blog. For example, I follow a tech blog, a sports blog, and a gossip blog. The tech blog gets waaay more readership than the other two blogs, so in raw numbers every post on the tech blog will always out number views on the other two. So lets say the average tech blog post gets 500 facebook likes and the other two get an average of 50 likes per post. Then when there is a sports blog post that has 200 fb likes and a gossip blog post with 300 while the tech blog posts today have 500 likes I want to highlight the sports and gossip blog posts (more likes than average vs tech blog with more # of likes but just average for the blog) The approach I am thinking of taking is to make an entry in a database for each blog post. Every x minutes (say every 15 minutes) I will check how many likes/shares/comments an entry has received on all the social networks (facebook, twitter, google+, linkeIn). So over time there will be a history of likes for each blog post, i.e post 1234 after 15 min: 10 fb likes, 4 tweets, 6 g+ after 30 min: 15 fb likes, 15 tweets, 10 g+ ... ... after 48 hours: 200 fb likes, 25 tweets, 15 g+ By keeping a history like this for each blog post I can know the average number of likes/shares/tweets at any give time interval. So for example the average number of fb likes for all blog posts 48hrs after posting is 50, and a particular post has 200 I can mark that as a popular post and feature/highlight it. A consideration in the design is to be able to easily query the values (likes/shares) for a specific time-frame, i.e. fb likes after 30min or tweets after 24 hrs in-order to compute averages with which to compare against (or should averages be stored in it's own table?) If this approach is flawed or could use improvement please let me know, but it is not my main question. My main question is what should a database scheme for storing this info look like? Assuming that the above approach is taken I am trying to figure out what a database schema for storing the likes over time would look like. I am brand new to databases, in doing some basic reading I see that it is advisable to make a 3NF database. I have come up with the following possible schema. Schema 1 DB Popular Posts Table: Post post_id ( primary key(pk) ) url title Table: Social Activity activity_id (pk) url (fk) type (i.e. facebook,twitter,g+) value timestamp This was my initial instinct (base on my very limited db knowledge). As far as I under stand this schema would be 3NF? I searched for designs of similar database model, and found this question on stackoverflow, http://stackoverflow.com/questions/11216080/data-structure-for-storing-height-and-weight-etc-over-time-for-multiple-users . The scenario in that question is similar (recording weight/height of users overtime). Taking the accepted answer for that question and applying it to my model results in something like: Schema 2 (same as above, but break down the social activity into 2 tables) DB Popular Posts Table: Post post_id (pk) url title Table: Social Measurement measurement_id (pk) post_id (fk) timestamp Table: Social stat stat_id (pk) measurement_id (fk) type (i.e. facebook,twitter,g+) value The advantage I see in schema 2 is that I will likely want to access all the values for a given time, i.e. when making a measurement at 30min after a post is published I will simultaneous check number of fb likes, fb shares, fb comments, tweets, g+, linkedIn. So with this schema it may be easier get get all stats for a measurement_id corresponding to a certain time, i.e. all social stats for post 1234 at time x. Another thought I had is since it doesn't make sense to compare number of fb likes with number of tweets or g+ shares, maybe it makes sense to separate each social measurement into it's own table? Schema 3 DB Popular Posts Table: Post post_id (pk) url title Table: fb_likes fb_like_id (pk) post_id (fk) timestamp value Table: fb_shares fb_shares_id (pk) post_id (fk) timestamp value Table: tweets tweets__id (pk) post_id (fk) timestamp value Table: google_plus google_plus_id (pk) post_id (fk) timestamp value As you can see I am generally lost/unsure of what approach to take. I'm sure this typical type of database problem (storing measurements overtime, i.e temperature statistic) that must have a common solution. Is there a design pattern/model for this, does it have a name? I tried searching for "database periodic data collection" or "database measurements over time" but didn't find anything specific. What would be an appropriate model to solve the needs of this problem?

Read the article
ODI SDK: Retrieving Information From the Logs

- by Christophe Dupupet

It is fairly common to want to retrieve data from the ODI logs: statistics, execution status, even the generated code can be retrieved from the logs. The ODI SDK provides a robust set of APIs to parse the repository and retreve such information. To locate the information you are looking for, you have to keep in mind the structure of the logs: sessions contain steps; steps containt tasks. The session is the execution unit: basically, each time you execute something (interface, package, procedure, scenario) you create a new session. The steps are the individual entries found in a session: these will be the icons in your package for instance. Or if you are running an interface, you will have one single step: the interface itself. The tasks will represent the more atomic elements of the steps: the individual DDL, DML, scripts and so forth that are generated by ODI, along with all the detailed statistics for that task. All these details can be retrieved with the SDK. Because I had a question recently on the API ODIStepReport, I focus explicitly in this code on Scenario logs, but a lot more can be done with these APIs. Here is the code sample (you can just cut and paste that code in your ODI 11.1.1.6 Groovy console). Just save, adapt the code to your environment (in particular to connect to your repository) and hit "run" //Created by ODI Studioimport oracle.odi.core.OdiInstanceimport oracle.odi.core.config.OdiInstanceConfigimport oracle.odi.core.config.MasterRepositoryDbInfo import oracle.odi.core.config.WorkRepositoryDbInfo import oracle.odi.core.security.Authentication import oracle.odi.core.config.PoolingAttributes import oracle.odi.domain.runtime.scenario.finder.IOdiScenarioFinder import oracle.odi.domain.runtime.scenario.OdiScenario import java.util.Collection import java.io.* /* ----------------------------------------------------------------------------------------- Simple sample code to list all executions of the last version of a scenario,along with detailed steps information----------------------------------------------------------------------------------------- */ /* update the following parameters to match your environment => */def url = "jdbc:oracle:thin:@myserver:1521:orcl"def driver = "oracle.jdbc.OracleDriver"def schema = "ODIM1116"def schemapwd = "ODIM1116PWD"def workrep = "WORKREP1116"def odiuser= "SUPERVISOR"def odiuserpwd = "SUNOPSIS" // Rather than hardcoding the project code and folder name, // a great improvement here would be to parse the entire repository def scenario_name = "LOAD_DWH" /*Scenario Name*/ /* <=End of the update section */ //--------------------------------------//Connection to the repository// Note for ODI 11.1.1.6: you could use predefined odiInstance variable if you are // running the script from a Studio that is already connected to the repository def masterInfo = new MasterRepositoryDbInfo(url, driver, schema, schemapwd.toCharArray(), new PoolingAttributes())def workInfo = new WorkRepositoryDbInfo(workrep, new PoolingAttributes())def odiInstance = OdiInstance.createInstance(new OdiInstanceConfig(masterInfo, workInfo)) //--------------------------------------// In all cases, we need to make sure we have authorized access to the repositorydef auth = odiInstance.getSecurityManager().createAuthentication(odiuser, odiuserpwd.toCharArray())odiInstance.getSecurityManager().setCurrentThreadAuthentication(auth) //--------------------------------------// Retrieve the scenario we are looking fordef odiScenario = ((IOdiScenarioFinder)odiInstance.getTransactionalEntityManager().getFinder(OdiScenario.class)).findLatestByName(scenario_name) if (odiScenario == null){ println("Error: cannot find scenario "+scenario_name); return} //--------------------------------------// Retrieve all reports for the scenario def OdiScenarioReportsList = odiScenario.getScenarioReports() println("*** Listing all reports for Scenario \""+scenario_name+"\" ") //--------------------------------------// For each report, print the folowing:// - start time// - duration// - status// - step reports: selection of details for (s in OdiScenarioReportsList){ println("\tStart time: " + s.getSessionStartTime()) println("\tDuration: " + s.getSessionDuration()) println("\tStatus: " + s.getSessionStatus()) def OdiScenarioStepReportsList = s.getStepReports() for (st in OdiScenarioStepReportsList){ println("\t\tStep Name: " + st.getStepName()) println("\t\tStep Resource Name: " + st.getStepResourceName()) println("\t\tStep Start time: " + st.getStepStartTime()) println("\t\tStep Duration: " + st.getStepDuration()) println("\t\tStep Status: " + st.getStepStatus()) println("\t\tStep # of inserts: " + st.getStepInsertCount()) println("\t\tStep # of updates: " + st.getStepUpdateCount()+'\n') } println("\t")}

Read the article
Your thoughts on Best Practices for Scientific Computing?

- by John Smith

A recent paper by Wilson et al (2014) pointed out 24 Best Practices for scientific programming. It's worth to have a look. I would like to hear opinions about these points from experienced programmers in scientific data analysis. Do you think these advices are helpful and practical? Or are they good only in an ideal world? Wilson G, Aruliah DA, Brown CT, Chue Hong NP, Davis M, Guy RT, Haddock SHD, Huff KD, Mitchell IM, Plumbley MD, Waugh B, White EP, Wilson P (2014) Best Practices for Scientific Computing. PLoS Biol 12:e1001745. http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001745 Box 1. Summary of Best Practices Write programs for people, not computers. (a) A program should not require its readers to hold more than a handful of facts in memory at once. (b) Make names consistent, distinctive, and meaningful. (c) Make code style and formatting consistent. Let the computer do the work. (a) Make the computer repeat tasks. (b) Save recent commands in a file for re-use. (c) Use a build tool to automate workflows. Make incremental changes. (a) Work in small steps with frequent feedback and course correction. (b) Use a version control system. (c) Put everything that has been created manually in version control. Don’t repeat yourself (or others). (a) Every piece of data must have a single authoritative representation in the system. (b) Modularize code rather than copying and pasting. (c) Re-use code instead of rewriting it. Plan for mistakes. (a) Add assertions to programs to check their operation. (b) Use an off-the-shelf unit testing library. (c) Turn bugs into test cases. (d) Use a symbolic debugger. Optimize software only after it works correctly. (a) Use a profiler to identify bottlenecks. (b) Write code in the highest-level language possible. Document design and purpose, not mechanics. (a) Document interfaces and reasons, not implementations. (b) Refactor code in preference to explaining how it works. (c) Embed the documentation for a piece of software in that software. Collaborate. (a) Use pre-merge code reviews. (b) Use pair programming when bringing someone new up to speed and when tackling particularly tricky problems. (c) Use an issue tracking tool. I'm relatively new to serious programming for scientific data analysis. When I tried to write code for pilot analyses of some of my data last year, I encountered tremendous amount of bugs both in my code and data. Bugs and errors had been around me all the time, but this time it was somewhat overwhelming. I managed to crunch the numbers at last, but I thought I couldn't put up with this mess any longer. Some actions must be taken. Without a sophisticated guide like the article above, I started to adopt "defensive style" of programming since then. A book titled "The Art of Readable Code" helped me a lot. I deployed meticulous input validations or assertions for every function, renamed a lot of variables and functions for better readability, and extracted many subroutines as reusable functions. Recently, I introduced Git and SourceTree for version control. At the moment, because my co-workers are much more reluctant about these issues, the collaboration practices (8a,b,c) have not been introduced. Actually, as the authors admitted, because all of these practices take some amount of time and effort to introduce, it may be generally hard to persuade your reluctant collaborators to comply them. I think I'm asking your opinions because I still suffer from many bugs despite all my effort on many of these practices. Bug fix may be, or should be, faster than before, but I couldn't really measure the improvement. Moreover, much of my time has been invested on defence, meaning that I haven't actually done much data analysis (offence) these days. Where is the point I should stop at in terms of productivity? I've already deployed: 1a,b,c, 2a, 3a,b,c, 4b,c, 5a,d, 6a,b, 7a,7b I'm about to have a go at: 5b,c Not yet: 2b,c, 4a, 7c, 8a,b,c (I could not really see the advantage of using GNU make (2c) for my purpose. Could anyone tell me how it helps my work with MATLAB?)

Read the article
Change Comes from Within

- by John K. Hines

I am in the midst of witnessing a variety of teams moving away from Scrum. Some of them are doing things like replacing Scrum terms with more commonly understood terminology. Mainly they have gone back to using industry standard terms and more traditional processes like the RAPID decision making process. For example: Scrum Master becomes Project Lead. Scrum Team becomes Project Team. Product Owner becomes Stakeholders. I'm actually quite sad to see this happening, but I understand that Scrum is a radical change for most organizations. Teams are slowly but surely moving away from Scrum to a process that non-software engineers can understand and follow. Some could never secure the education or personnel (like a Product Owner) to get the whole team engaged. And many people with decision-making authority do not see the value in Scrum besides task planning and tracking. You see, Scrum cannot be mandated. No one can force a team to be Agile, collaborate, continuously improve, and self-reflect. Agile adoptions must start from a position of mutual trust and willingness to change. And most software teams aren't like that. Here is my personal epiphany from over a year of attempting to promote Agile on a small development team: The desire to embrace Agile methodologies must come from each and every member of the team. If this desire does not exist - if the team is satisfied with its current process, if the team is not motivated to improve, or if the team is afraid of change - the actual demonstration of all the benefits prescribed by Agile and Scrum will take years. I've read some blog posts lately that criticise Scrum for demanding "Big Change Up Front." One's opinion of software methodologies boils down to one's perspective. If you see modern software development as successful, you will advocate for small, incremental changes to how it is done. If you see it as broken, you'll be much more motivated to take risks and try something different. So my question to you is this - is modern software development healthy or in need of dramatic improvement? I can tell you from personal experience that any project that requires exploration, planning, development, stabilisation, and deployment is hard. Trying to make that process better with only a slightly modified approach is a mistake. You will become completely dependent upon the skillset of your team (the only variable you can change). But the difficulty of planned work isn't one of skill. It isn't until you solve the fundamental challenges of communication, collaboration, quality, and efficiency that skill even comes into play. So I advocate for Big Change Up Front. And I advocate for it to happen often until those involved can say, from experience, that it is no longer needed. I hope every engineer has the opportunity to see the benefits of Agile and Scrum on a highly functional team. I'll close with more key learnings that can help with a Scrum adoption: Your leaders must understand Scrum. They must understand software development, its inherent difficulties, and how Scrum helps. If you attempt to adopt Scrum before the understanding is there, your leaders will apply traditional solutions to your problems - often creating more problems. Success should be measured by quality, not revenue. Namely, the value of software to an organization is the revenue it generates minus ongoing support costs. You should identify quality-based metrics that show the effect Agile techniques have on your software. Motivation is everything. I finally understand why so many Agile advocates say you that if you are not on a team using Agile, you should leave and find one. Scrum and especially Agile encompass many elegant solutions to a wide variety of problems. If you are working on a team that has not encountered these problems the the team may never see the value in the solutions. Having said all that, I'm not giving up on Agile or Scrum. I am convinced it is a better approach for software development. But reality is saying that its adoption is not straightforward and highly subject to disruption. Unless, that is, everyone really, really wants it.

Read the article
Parallelism in .NET – Part 8, PLINQ’s ForAll Method

- by Reed

Parallel LINQ extends LINQ to Objects, and is typically very similar. However, as I previously discussed, there are some differences. Although the standard way to handle simple Data Parellelism is via Parallel.ForEach, it’s possible to do the same thing via PLINQ. PLINQ adds a new method unavailable in standard LINQ which provides new functionality… LINQ is designed to provide a much simpler way of handling querying, including filtering, ordering, grouping, and many other benefits. Reading the description in LINQ to Objects on MSDN, it becomes clear that the thinking behind LINQ deals with retrieval of data. LINQ works by adding a functional programming style on top of .NET, allowing us to express filters in terms of predicate functions, for example. PLINQ is, generally, very similar. Typically, when using PLINQ, we write declarative statements to filter a dataset or perform an aggregation. However, PLINQ adds one new method, which provides a very different purpose: ForAll. The ForAll method is defined on ParallelEnumerable, and will work upon any ParallelQuery<T>. Unlike the sequence operators in LINQ and PLINQ, ForAll is intended to cause side effects. It does not filter a collection, but rather invokes an action on each element of the collection. At first glance, this seems like a bad idea. For example, Eric Lippert clearly explained two philosophical objections to providing an IEnumerable<T>.ForEach extension method, one of which still applies when parallelized. The sole purpose of this method is to cause side effects, and as such, I agree that the ForAll method “violates the functional programming principles that all the other sequence operators are based upon”, in exactly the same manner an IEnumerable<T>.ForEach extension method would violate these principles. Eric Lippert’s second reason for disliking a ForEach extension method does not necessarily apply to ForAll – replacing ForAll with a call to Parallel.ForEach has the same closure semantics, so there is no loss there. Although ForAll may have philosophical issues, there is a pragmatic reason to include this method. Without ForAll, we would take a fairly serious performance hit in many situations. Often, we need to perform some filtering or grouping, then perform an action using the results of our filter. Using a standard foreach statement to perform our action would avoid this philosophical issue: // Filter our collection var filteredItems = collection.AsParallel().Where( i => i.SomePredicate() ); // Now perform an action foreach (var item in filteredItems) { // These will now run serially item.DoSomething(); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This would cause a loss in performance, since we lose any parallelism in place, and cause all of our actions to be run serially. We could easily use a Parallel.ForEach instead, which adds parallelism to the actions: // Filter our collection var filteredItems = collection.AsParallel().Where( i => i.SomePredicate() ); // Now perform an action once the filter completes Parallel.ForEach(filteredItems, item => { // These will now run in parallel item.DoSomething(); }); This is a noticeable improvement, since both our filtering and our actions run parallelized. However, there is still a large bottleneck in place here. The problem lies with my comment “perform an action once the filter completes”. Here, we’re parallelizing the filter, then collecting all of the results, blocking until the filter completes. Once the filtering of every element is completed, we then repartition the results of the filter, reschedule into multiple threads, and perform the action on each element. By moving this into two separate statements, we potentially double our parallelization overhead, since we’re forcing the work to be partitioned and scheduled twice as many times. This is where the pragmatism comes into play. By violating our functional principles, we gain the ability to avoid the overhead and cost of rescheduling the work: // Perform an action on the results of our filter collection .AsParallel() .Where( i => i.SomePredicate() ) .ForAll( i => i.DoSomething() ); The ability to avoid the scheduling overhead is a compelling reason to use ForAll. This really goes back to one of the key points I discussed in data parallelism: Partition your problem in a way to place the most work possible into each task. Here, this means leaving the statement attached to the expression, even though it causes side effects and is not standard usage for LINQ. This leads to my one guideline for using ForAll: The ForAll extension method should only be used to process the results of a parallel query, as returned by a PLINQ expression. Any other usage scenario should use Parallel.ForEach, instead.

Read the article
SQL SERVER – Solution – Puzzle – Statistics are not Updated but are Created Once

- by pinaldave

Earlier I asked puzzle why statistics are not updated. Read the complete details over here: Statistics are not Updated but are Created Once In the question I have demonstrated even though statistics should have been updated after lots of insert in the table are not updated.(Read the details SQL SERVER – When are Statistics Updated – What triggers Statistics to Update) In this example I have created following situation: Create Table Insert 1000 Records Check the Statistics Now insert 10 times more 10,000 indexes Check the Statistics – it will be NOT updated Auto Update Statistics and Auto Create Statistics for database is TRUE Now I have requested two things in the example 1) Why this is happening? 2) How to fix this issue? I have many answers – here is the how I fixed it which has resolved the issue for me. NOTE: There are multiple answers to this problem and I will do my best to list all. Solution: Create nonclustered Index on column City Here is the working example for the same. Let us understand this script and there is added explanation at the end. -- Execution Plans Difference -- Estimated Execution Plan Vs Actual Execution Plan -- Create Sample Database CREATE DATABASE SampleDB GO USE SampleDB GO -- Create Table CREATE TABLE ExecTable (ID INT, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO CREATE NONCLUSTERED INDEX IX_ExecTable1 ON ExecTable (City); GO -- Insert One Thousand Records -- INSERT 1 INSERT INTO ExecTable (ID,FirstName,LastName,City) SELECT TOP 1000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 3 THEN 'Los Angeles' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 7 THEN 'La Cinega' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 13 THEN 'San Diego' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 17 THEN 'Las Vegas' ELSE 'Houston' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Display statistics of the table sp_helpstats N'ExecTable', 'ALL' GO -- Select Statement SELECT FirstName, LastName, City FROM ExecTable WHERE City = 'New York' GO -- Display statistics of the table sp_helpstats N'ExecTable', 'ALL' GO -- Replace your Statistics over here DBCC SHOW_STATISTICS('ExecTable', IX_ExecTable1); GO -------------------------------------------------------------- -- Round 2 -- Insert One Thousand Records -- INSERT 2 INSERT INTO ExecTable (ID,FirstName,LastName,City) SELECT TOP 1000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 3 THEN 'Los Angeles' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 7 THEN 'La Cinega' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 13 THEN 'San Diego' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 17 THEN 'Las Vegas' ELSE 'Houston' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Select Statement SELECT FirstName, LastName, City FROM ExecTable WHERE City = 'New York' GO -- Display statistics of the table sp_helpstats N'ExecTable', 'ALL' GO -- Replace your Statistics over here DBCC SHOW_STATISTICS('ExecTable', IX_ExecTable1); GO -- Clean up Database DROP TABLE ExecTable GO When I created non clustered index on the column city, it also created statistics on the same column with same name as index. When we populate the data in the column the index is update – resulting execution plan to be invalided – this leads to the statistics to be updated in next execution of SELECT. This behavior does not happen on Heap or column where index is auto created. If you explicitly update the index, often you can see the statistics are updated as well. You can see this is for sure happening if you follow the tell of John Sansom. John Sansom‘s suggestion: That was fun! Although the column statistics are invalidated by the time the second select statement is executed, the query is not compiled/recompiled but instead the existing query plan is reused. It is the “next” compiled query against the column statistics that will see that they are out of date and will then in turn instantiate the action of updating statistics. You can see this in action by forcing the second statement to recompile. SELECT FirstName, LastName, City FROM ExecTable WHERE City = ‘New York’ option(RECOMPILE) GO Kevin Cross also have another suggestion: I agree with John. It is reusing the Execution Plan. Aside from OPTION(RECOMPILE), clearing the Execution Plan Cache before the subsequent tests will also work. i.e., run this before round 2: ————————————————————– – Clear execution plan cache before next test DBCC FREEPROCCACHE WITH NO_INFOMSGS; ————————————————————– Nice puzzle! Kevin As this was puzzle John and Kevin both got the correct answer, there was no condition for answer to be part of best practices. I know John and he is finest DBA around – his tremendous knowledge has always impressed me. John and Kevin both will agree that clearing cache either using DBCC FREEPROCCACHE and recompiling each query every time is for sure not good advice on production server. It is correct answer but not best practice. By the way, if you have better solution or have better suggestion please advise. I am open to change my answer and publish further improvement to this solution. On very separate note, I like to have clustered index on my Primary Key, which I have not mentioned here as it is out of the scope of this puzzle. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, Readers Contribution, Readers Question, SQL, SQL Authority, SQL Index, SQL Puzzle, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Statistics

Read the article
My History with Agile

- by Robert May

I’m going to write my history with Agile here. That way, in future posts, I can refer back to it, instead of typing it out in the post that contains information you may actually want to read. Note that I’m actually a pretty senior developer, and do lots of technical interviews. I’m an Agile fan because of the difference it makes in peoples lives and the improvement in quality it brings, and I’ll sacrifice my technological advance to help teams. Management History I started management pretty early in my career, starting with the first job that I ever had. I actually do NOT have a CS or similar degree. I have a Bachelor’s of Business Administration with an emphasis in Computer Information Systems. My first management gigs were around call center work and were very schedule oriented. I didn’t understand the true value of teams, and I’m ashamed to admit, I actually installed a fingerprint scanner as a time clock in this job. I shudder to think of the impact that I had on the team spirit. I didn’t even trust them enough to fill out their time cards correctly. How sad. I was managing nearly 100 people in this position, with the help of a great set of subordinates. I did try to come up with reward programs for the team, but again, didn’t understand the concept of team, so instead of letting the team determine how the rewards should work, I mandated from on high, which isn’t a good thing. I was told that I wasn’t the type that would be a good manager by people whom I respected a lot. They said it because I was a computer geek, since they don’t understand good management either, but in retrospect, they were right about me then. I was too green. After my first job, I went on to other jobs and with the exception of one job, I’ve managed people at them all. The rest of the management story is important for understanding agile, so I’ll save it for my next post. Technical History I’ve been in software development for many, many years. I technically started programming on a commodore 64 in basic. I didn’t know that I was programming, but I was sure having fun. That was followed by batch files, Gorilla hacking (I always had to win), WordPerfect Macro programming and other things that taught me the basics. My first “real” job was with a telephone company, and that’s where I made my first database application in DataEase, wrote my first VBA app and started using real programming tools, like turbo pascal, vb3-vb5, and semi-real tools like RPG and VisualRPG. I wrote my first web page in 1994, and built my first data driven web page in 1995 using perlDB. You really can do anything with Perl. At this time, I also started a Linux based internet service provider that is still in operation today. One of the people I worked with is now a Microsoft employee building and designing frameworks you probably know well. Smart guy. I also built my first ASP applications connecting to Sql Server 6.5, setup Exchange 5.5 for the company, and many other system administration stuff. I’m a programmer by choice, mostly because I don’t really like PC support. From there, I went on to a large state agency. I got to see and maintain true waterfall projects. 5 years of maintaining the 200 VB COM+ (MTS, actually) dlls that were used to calculate a single number is a long time. That was all Microsoft DNS technologies. SQL Server and VB6 were the tools of choice, although .net started to be a factor near the end of employment. I did some heavy XML work at this job and even wrote an XSD parser and validator in VB6 that was a shim until MSXML 3.0 came out. Prior to 3.0, XSD’s weren’t supported, and I didn’t want to write DTDs. Ironically, jobs after this were more generic. I pretty much settled in on the .net framework and revisions of it. Lots of WPF, some silverlight, lots of ASP.NET, some SQL Azure, lots of SQL Server, some Oracle, but I don’t think that I was as passionate about development and technologies. I was more into the management of development. I like people. Technorati Tags: Agile,history

Read the article
World Record Oracle Business Intelligence Benchmark on SPARC T4-4

- by Brian

Oracle's SPARC T4-4 server configured with four SPARC T4 3.0 GHz processors delivered the first and best performance of 25,000 concurrent users on Oracle Business Intelligence Enterprise Edition (BI EE) 11g benchmark using Oracle Database 11g Release 2 running on Oracle Solaris 10. A SPARC T4-4 server running Oracle Business Intelligence Enterprise Edition 11g achieved 25,000 concurrent users with an average response time of 0.36 seconds with Oracle BI server cache set to ON. The benchmark data clearly shows that the underlying hardware, SPARC T4 server, and the Oracle BI EE 11g (11.1.1.6.0 64-bit) platform scales within a single system supporting 25,000 concurrent users while executing 415 transactions/sec. The benchmark demonstrated the scalability of Oracle Business Intelligence Enterprise Edition 11g 11.1.1.6.0, which was deployed in a vertical scale-out fashion on a single SPARC T4-4 server. Oracle Internet Directory configured on SPARC T4 server provided authentication for the 25,000 Oracle BI EE users with sub-second response time. A SPARC T4-4 with internal Solid State Drive (SSD) using the ZFS file system showed significant I/O performance improvement over traditional disk for the Web Catalog activity. In addition, ZFS helped get past the UFS limitation of 32767 sub-directories in a Web Catalog directory. The multi-threaded 64-bit Oracle Business Intelligence Enterprise Edition 11g and SPARC T4-4 server proved to be a successful combination by providing sub-second response times for the end user transactions, consuming only half of the available CPU resources at 25,000 concurrent users, leaving plenty of head room for increased load. The Oracle Business Intelligence on SPARC T4-4 server benchmark results demonstrate that comprehensive BI functionality built on a unified infrastructure with a unified business model yields best-in-class scalability, reliability and performance. Oracle BI EE 11g is a newer version of Business Intelligence Suite with richer and superior functionality. Results produced with Oracle BI EE 11g benchmark are not comparable to results with Oracle BI EE 10g benchmark. Oracle BI EE 11g is a more difficult benchmark to run, exercising more features of Oracle BI. Performance Landscape Results for the Oracle BI EE 11g version of the benchmark. Results are not comparable to the Oracle BI EE 10g version of the benchmark. Oracle BI EE 11g Benchmark System Number of Users Response Time (sec) 1 x SPARC T4-4 (4 x SPARC T4 3.0 GHz) 25,000 0.36 Results for the Oracle BI EE 10g version of the benchmark. Results are not comparable to the Oracle BI EE 11g version of the benchmark. Oracle BI EE 10g Benchmark System Number of Users 2 x SPARC T5440 (4 x SPARC T2+ 1.6 GHz) 50,000 1 x SPARC T5440 (4 x SPARC T2+ 1.6 GHz) 28,000 Configuration Summary Hardware Configuration: SPARC T4-4 server 4 x SPARC T4-4 processors, 3.0 GHz 128 GB memory 4 x 300 GB internal SSD Storage Configuration: "> Sun ZFS Storage 7120 16 x 146 GB disks Software Configuration: Oracle Solaris 10 8/11 Oracle Solaris Studio 12.1 Oracle Business Intelligence Enterprise Edition 11g (11.1.1.6.0) Oracle WebLogic Server 10.3.5 Oracle Internet Directory 11.1.1.6.0 Oracle Database 11g Release 2 Benchmark Description Oracle Business Intelligence Enterprise Edition (Oracle BI EE) delivers a robust set of reporting, ad-hoc query and analysis, OLAP, dashboard, and scorecard functionality with a rich end-user experience that includes visualization, collaboration, and more. The Oracle BI EE benchmark test used five different business user roles - Marketing Executive, Sales Representative, Sales Manager, Sales Vice-President, and Service Manager. These roles included a maximum of 5 different pre-built dashboards. Each dashboard page had an average of 5 reports in the form of a mix of charts, tables and pivot tables, returning anywhere from 50 rows to approximately 500 rows of aggregated data. The test scenario also included drill-down into multiple levels from a table or chart within a dashboard. The benchmark test scenario uses a typical business user sequence of dashboard navigation, report viewing, and drill down. For example, a Service Manager logs into the system and navigates to his own set of dashboards using Service Manager. The BI user selects the Service Effectiveness dashboard, which shows him four distinct reports, Service Request Trend, First Time Fix Rate, Activity Problem Areas, and Cost Per Completed Service Call spanning 2002 to 2005. The user then proceeds to view the Customer Satisfaction dashboard, which also contains a set of 4 related reports, drills down on some of the reports to see the detail data. The BI user continues to view more dashboards – Customer Satisfaction and Service Request Overview, for example. After navigating through those dashboards, the user logs out of the application. The benchmark test is executed against a full production version of the Oracle Business Intelligence 11g Applications with a fully populated underlying database schema. The business processes in the test scenario closely represent a real world customer scenario. See Also SPARC T4-4 Server oracle.com OTN Oracle Business Intelligence oracle.com OTN Oracle Database 11g Release 2 Enterprise Edition oracle.com OTN WebLogic Suite oracle.com OTN Oracle Solaris oracle.com OTN Disclosure Statement Copyright 2012, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 30 September 2012.

Read the article
Independence Day for Software Components – Loosening Coupling by Reducing Connascence

- by Brian Schroer

Today is Independence Day in the USA, which got me thinking about loosely-coupled “independent” software components. I was reminded of a video I bookmarked quite a while ago of Jim Weirich’s “Grand Unified Theory of Software Design” talk at MountainWest RubyConf 2009. I finally watched that video this morning. I highly recommend it. In the video, Jim talks about software connascence. The dictionary definition of connascence (con-NAY-sense) is: 1. The common birth of two or more at the same time 2. That which is born or produced with another. 3. The act of growing together. The brief Wikipedia page about Connascent Software Components says that: Two software components are connascent if a change in one would require the other to be modified in order to maintain the overall correctness of the system. Connascence is a way to characterize and reason about certain types of complexity in software systems. The term was introduced to the software world in Meilir Page-Jones’ 1996 book “What Every Programmer Should Know About Object-Oriented Design”. The middle third of that book is the author’s proposed graphical notation for describing OO designs. UML became the standard about a year later, so a revised version of the book was published in 1999 as “Fundamentals of Object-Oriented Design in UML”. Weirich says that the third part of the book, in which Page-Jones introduces the concept of connascence “is worth the price of the entire book”. (The price of the entire book, by the way, is not much – I just bought a used copy on Amazon for $1.36, so that was a pretty low-risk investment. I’m looking forward to getting the book and learning about connascence from the original source.) Meanwhile, here’s my summary of Weirich’s summary of Page-Jones writings about connascence: The stronger the form of connascence, the more difficult and costly it is to change the elements in the relationship. Some of the connascence types, ordered from weak to strong are: Connascence of Name Connascence of name is when multiple components must agree on the name of an entity. If you change the name of a method or property, then you need to change all references to that method or property. Duh. Connascence of name is unavoidable, assuming your objects are actually used. My main takeaway about connascence of name is that it emphasizes the importance of giving things good names so you don’t need to go changing them later. Connascence of Type Connascence of type is when multiple components must agree on the type of an entity. I assume this is more of a problem for languages without compilers (especially when used in apps without tests). I know it’s an issue with evil JavaScript type coercion. Connascence of Meaning Connascence of meaning is when multiple components must agree on the meaning of particular values, e.g that “1” means normal customer and “2” means preferred customer. The solution to this is to use constants or enums instead of “magic” strings or numbers, which reduces the coupling by changing the connascence form from “meaning” to “name”. Connascence of Position Connascence of positions is when multiple components must agree on the order of values. This refers to methods with multiple parameters, e.g.: eMailer.Send("[email protected]", "[email protected]", "Your order is complete", "Order completion notification"); The more parameters there are, the stronger the connascence of position is between the component and its callers. In the example above, it’s not immediately clear when reading the code which email addresses are sender and receiver, and which of the final two strings are subject vs. body. Connascence of position could be improved to connascence of type by replacing the parameter list with a struct or class. This “introduce parameter object” refactoring might be overkill for a method with 2 parameters, but would definitely be an improvement for a method with 10 parameters. This points out two “rules” of connascence: The Rule of Degree: The acceptability of connascence is related to the degree of its occurrence. The Rule of Locality: Stronger forms of connascence are more acceptable if the elements involved are closely related. For example, positional arguments in private methods are less problematic than in public methods. Connascence of Algorithm Connascence of algorithm is when multiple components must agree on a particular algorithm. Be DRY – Don’t Repeat Yourself. If you have “cloned” code in multiple locations, refactor it into a common function. Those are the “static” forms of connascence. There are also “dynamic” forms, including… Connascence of Execution Connascence of execution is when the order of execution of multiple components is important. Consumers of your class shouldn’t have to know that they have to call an .Initialize method before it’s safe to call a .DoSomething method. Connascence of Timing Connascence of timing is when the timing of the execution of multiple components is important. I’ll have to read up on this one when I get the book, but assume it’s largely about threading. Connascence of Identity Connascence of identity is when multiple components must reference the entity. The example Weirich gives is when you have two instances of the “Bob” Employee class and you call the .RaiseSalary method on one and then the .Pay method on the other does the payment use the updated salary? Again, this is my summary of a summary, so please be forgiving if I misunderstood anything. Once I get/read the book, I’ll make corrections if necessary and share any other useful information I might learn. See Also: Gregory Brown: Ruby Best Practices Issue #24: Connascence as a Software Design Metric (That link is failing at the time I write this, so I had to go to the Google cache of the page.)

Read the article
MySQL Connector/Net 6.4.6 Maintenance Release has been released

- by fernando

MySQL Connector/Net 6.4.6, a new version of the all-managed .NET driver for MySQL has been released. This is a maintenance release and is recommended for use in production environments. It is appropriate for use with MySQL server versions 5.0-5.6. This is intended to be the final release for Connector/NET 6.4. It is now available in source and binary form from http://dev.mysql.com/downloads/connector/net/#downloads and mirror sites (note that not all mirror sites may be up to date at this point-if you can't find this version on some mirror, please try again later or choose another download site.) The 6.4.6 version of MySQL Connector/Net brings the following fixes: - Fix for List.Contains generates a bunch of ORs instead of more efficient IN clause in LINQ to Entities (Oracle bug #14016344, MySql bug #64934). - Fix for error when trying to change the name of an Index on the Indexes/Keys editor; along with this fix now users can change the Index type of a new Index which could not be done in previous versions, and when changing the Index name the change is reflected on the list view at the left side of the Index/Keys editor (Oracle bug #13613801). - Fix for stored procedure call using only its name with EF code first (MySql bug #64999, Oracle bug #14008699). - Fix for performance issue in generated EF query: .NET StartsWith/Contains/EndsWith produces MySql's locate instead of Like (MySql bug #64935, Oracle bug #14009363). - Fix for script generated for code first contains wrong alter table and wrong declaration for byte[] (MySql bug #64216, Oracle bug #13900091). - Fix for Exception thrown when using cascade delete in an EDM Model-First in Entity Framework (Oracle bug #14008752, MySql bug #64779). - Fix for Session locking issue with MySqlSessionStateStore (MySql bug #63997, Oracble bug #13733054). - Fixed deleting a user profile using Profile provider (MySQL bug #64409, Oracle bug #13790123). - Fix for bug Cannot Create an Entity with a Key of Type String (MySQL bug #65289, Oracle bug #14540202). This fix checks if the type has a FixedLength facet set in order to create a char otherwise should create varchar, mediumtext or longtext types when using a String CLR type in Code First or Model First also tested in Database First. Unit tests added for Code First and ProviderManifest. - Fix for bug "CacheServerProperties can cause 'Packet too large' error" (MySQL Bug #66578 Orabug #14593547). - Fix for handling unnamed parameter in MySQLCommand. This fix allows the mysqlcommand to handle parameters without requiring naming (e.g. INSERT INTO Test (id,name) VALUES (?, ?) ) (MySQL Bug #66060, Oracle bug #14499549). - Fixed inheritance on Entity Framework Code First scenarios. Discriminator column is created using its correct type as varchar(128) (MySql bug #63920 and Oracle bug #13582335). - Fixed "Trying to customize column precision in Code First does not work" (MySql bug #65001, Oracle bug #14469048). - Fixed bug ASP.NET Membership database fails on MySql database UTF32 (MySQL bug #65144, Oracle bug #14495292). - Fix for MySqlCommand.LastInsertedId holding only 32 bit values (MySql bug #65452, Oracle bug #14171960) by changing several internal declaration of lastinsertid from int to long. - Fixed "Decimal type should have digits at right of decimal point", now default is 2, but user's changes in EDM designer are recognized (MySql bug #65127, Oracle bug #14474342). - Fix for NullReferenceException when saving an uninitialized row in Entity Framework (MySql bug #66066, Oracle bug #14479715). - Fix for error when calling RoleProvider.RemoveUserFromRole(): causes an exception due to a wrong table being used (MySql bug #65805, Oracle bug #14405338). - Fix for "Memory Leak on MySql.Data.MySqlClient.MySqlCommand", too many MemoryStream's instances created (MySql bug #65696, Oracle bug #14468204). - Small improvement on MySqlPoolManager CleanIdleConnections for better mysqlpoolmanager idlecleanuptimer at startup (MySql bug #66472 and Oracle bug #14652624). - Fix for bug TIMESTAMP values are mistakenly represented as DateTime with Kind = Local (Mysql bug #66964, Oracle bug #14740705). - Fix for bug Keyword not supported. Parameter name: AttachDbFilename (Mysql bug #66880, Oracle bug #14733472). - Added support to MySql script file to retrieve data when using "SHOW" statements. - Fix for Package Load Failure in Visual Studio 2005 (MySql bug #63073, Oracle bug #13491674). - Fix for bug "Unable to connect using IPv6 connections" (MySQL bug #67253, Oracle bug #14835718). - Added auto-generated values for Guid identity columns (MySql bug #67450, Oracle bug #15834176). - Fix for method FirstOrDefault not supported in some LINQ to Entities queries (MySql bug #67377, Oracle bug #15856964). The release is available to download at http://dev.mysql.com/downloads/connector/net/6.4.html Documentation ------------------------------------- You can view current Connector/Net documentation at http://dev.mysql.com/doc/refman/5.5/en/connector-net.html You can find our team blog at http://blogs.oracle.com/MySQLOnWindows. You can also post questions on our forums at http://forums.mysql.com/. Enjoy and thanks for the support!

Read the article
TFS API-Process Template currently applied to the Team Project

- by Tarun Arora

Download Demo Solution - here In this blog post I’ll show you how to use the TFS API to get the name of the Process Template that is currently applied to the Team Project. You can also download the demo solution attached, I’ve tested this solution against TFS 2010 and TFS 2011. 1. Connecting to TFS Programmatically I have a blog post that shows you from where to download the VS 2010 SP1 SDK and how to connect to TFS programmatically. private TfsTeamProjectCollection _tfs; private string _selectedTeamProject; TeamProjectPicker tfsPP = new TeamProjectPicker(TeamProjectPickerMode.SingleProject, false); tfsPP.ShowDialog(); this._tfs = tfsPP.SelectedTeamProjectCollection; this._selectedTeamProject = tfsPP.SelectedProjects[0].Name; 2. Programmatically get the Process Template details of the selected Team Project I’ll be making use of the VersionControlServer service to get the Team Project details and the ICommonStructureService to get the Project Properties. private ProjectProperty[] GetProcessTemplateDetailsForTheSelectedProject() { var vcs = _tfs.GetService<VersionControlServer>(); var ics = _tfs.GetService<ICommonStructureService>(); ProjectProperty[] ProjectProperties = null; var p = vcs.GetTeamProject(_selectedTeamProject); string ProjectName = string.Empty; string ProjectState = String.Empty; int templateId = 0; ProjectProperties = null; ics.GetProjectProperties(p.ArtifactUri.AbsoluteUri, out ProjectName, out ProjectState, out templateId, out ProjectProperties); return ProjectProperties; } 3. What’s the catch? The ProjectProperties will contain a property “Process Template” which as a value has the name of the process template. So, you will be able to use the below line of code to get the name of the process template. var processTemplateName = processTemplateDetails.Where(pt => pt.Name == "Process Template").Select(pt => pt.Value).FirstOrDefault(); However, if the process template does not contain the property “Process Template” then you will need to add it. So, the question becomes how do i add the Name property to the Process Template. Download the Process Template from the Process Template Manager on your local Once you have downloaded the Process Template to your local machine, navigate to the Classification folder with in the template From the classification folder open Classification.xml Add a new property <property name=”Process Template” value=”MSF for CMMI Process Improvement v5.0” /> 4. Putting it all together… using System; using System.Collections.Generic; using System.ComponentModel; using System.Data; using System.Drawing; using System.Linq; using System.Text; using System.Windows.Forms; using Microsoft.TeamFoundation.Client; using Microsoft.TeamFoundation.VersionControl.Client; using Microsoft.TeamFoundation.Server; using System.Diagnostics; using Microsoft.TeamFoundation.WorkItemTracking.Client; namespace TfsAPIDemoProcessTemplate { public partial class Form1 : Form { public Form1() { InitializeComponent(); } private TfsTeamProjectCollection _tfs; private string _selectedTeamProject; private void btnConnect_Click(object sender, EventArgs e) { TeamProjectPicker tfsPP = new TeamProjectPicker(TeamProjectPickerMode.SingleProject, false); tfsPP.ShowDialog(); this._tfs = tfsPP.SelectedTeamProjectCollection; this._selectedTeamProject = tfsPP.SelectedProjects[0].Name; var processTemplateDetails = GetProcessTemplateDetailsForTheSelectedProject(); listBox1.Items.Clear(); listBox1.Items.Add(String.Format("Team Project Selected => '{0}'", _selectedTeamProject)); listBox1.Items.Add(Environment.NewLine); var processTemplateName = processTemplateDetails.Where(pt => pt.Name == "Process Template") .Select(pt => pt.Value).FirstOrDefault(); if (!string.IsNullOrEmpty(processTemplateName)) { listBox1.Items.Add(Environment.NewLine); listBox1.Items.Add(String.Format("Process Template Name: {0}", processTemplateName)); } else { listBox1.Items.Add(String.Format("The Process Template does not have the 'Name' property set up")); listBox1.Items.Add(String.Format("***TIP: Download the Process Template and in Classification.xml add a new property Name, update the template then you will be able to see the Process Template Name***")); listBox1.Items.Add(String.Format(" - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -")); } } private ProjectProperty[] GetProcessTemplateDetailsForTheSelectedProject() { var vcs = _tfs.GetService<VersionControlServer>(); var ics = _tfs.GetService<ICommonStructureService>(); ProjectProperty[] ProjectProperties = null; var p = vcs.GetTeamProject(_selectedTeamProject); string ProjectName = string.Empty; string ProjectState = String.Empty; int templateId = 0; ProjectProperties = null; ics.GetProjectProperties(p.ArtifactUri.AbsoluteUri, out ProjectName, out ProjectState, out templateId, out ProjectProperties); return ProjectProperties; } } } Thank you for taking the time out and reading this blog post. If you enjoyed the post, remember to subscribe to http://feeds.feedburner.com/TarunArora. Have you come across a better way of doing this, please share your experience here. Questions/Feedback/Suggestions, etc please leave a comment. Thank You! Share this post : CodeProject

Read the article

Search Results

Search found 1226 results on 50 pages for 'improvement'.

Page 39/50 | < Previous Page | 35 36 37 38 39 40 41 42 43 44 45 46 | Next Page >

- by user1066015

- by Xaqron

- by Patrick Cuff

- by jstaab

- by David

- by Rich

- by SpliFF

- by MJ

- by andre

- by Emilio M Bumachar

- by lcbrevard

- by Mario De Schaepmeester

- by gravyface

- by ApPeL

- by gage

- by Christophe Dupupet

- by John Smith

- by John K. Hines

- by Reed

- by pinaldave

- by Robert May

- by Brian

- by Brian Schroer

- by fernando

- by Tarun Arora

< Previous Page | 35 36 37 38 39 40 41 42 43 44 45 46 | Next Page >