Search Results

Search found 4176 results on 168 pages for 'graph algorithms'.

Page 84/168 | < Previous Page | 80 81 82 83 84 85 86 87 88 89 90 91 | Next Page >

F# and the rose-tinted reflection

- by CliveT

We're already seeing increasing use of many cores on client desktops. It is a change that has been long predicted. It is not just a change in architecture, but our notions of efficiency in a program. No longer can we focus on the asymptotic complexity of an algorithm by counting the steps that a single core processor would take to execute it. Instead we'll soon be more concerned about the scalability of the algorithm and how well we can increase the performance as we increase the number of cores. This may even lead us to throw away our most efficient algorithms, and switch to less efficient algorithms that scale better. We might even be willing to waste cycles in order to speculatively execute at the algorithm rather than the hardware level. State is the big headache in this parallel world. At the hardware level, main memory doesn't necessarily contain the definitive value corresponding to a particular address. An update to a location might still be held in a CPU's local cache and it might be some time before the value gets propagated. To get the latest value, and the notion of "latest" takes a lot of defining in this world of rapidly mutating state, the CPUs may well need to communicate to decide who has the definitive value of a particular address in order to avoid lost updates. At the user program level, this means programmers will need to lock objects before modifying them, or attempt to avoid the overhead of locking by understanding the memory models at a very deep level. I think it's this need to avoid statefulness that has led to the recent resurgence of interest in functional languages. In the 1980s, functional languages started getting traction when research was carried out into how programs in such languages could be auto-parallelised. Sadly, the impracticality of some of the languages, the overheads of communication during this parallel execution, and rapid improvements in compiler technology on stock hardware meant that the functional languages fell by the wayside. The one thing that these languages were good at was getting rid of implicit state, and this single idea seems like a solution to the problems we are going to face in the coming years. Whether these languages will catch on is hard to predict. The mindset for writing a program in a functional language is really very different from the way that object-oriented problem decomposition happens - one has to focus on the verbs instead of the nouns, which takes some getting used to. There are a number of hybrid functional/object languages that have been becoming more popular in recent times. These half-way houses make it easy to use functional ideas for some parts of the program while still allowing access to the underlying object-focused platform without a great deal of impedance mismatch. One example is F# running on the CLR which, in Visual Studio 2010, has because a first class member of the pack. Inside Visual Studio 2010, the tooling for F# has improved to the point where it is easy to set breakpoints and watch values change while debugging at the source level. In my opinion, it is the tooling support that will enable the widespread adoption of functional languages - without this support, people will put off any transition into the functional world for as long as they possibly can. Without tool support it will make it hard to learn these languages. One tool that doesn't currently support F# is Reflector. The idea of decompiling IL to a functional language is daunting, but F# is potentially so important I couldn't dismiss the idea. As I'm currently developing Reflector 6.5, I thought it wise to take four days just to see how far I could get in doing so, even if it achieved little more than to be clearer on how much was possible, and how long it might take. You can read what happened here, and of the insights it gave us on ways to improve the tool.

Read the article
Combining template method with strategy

- by Mekswoll

An assignment in my software engineering class is to design an application which can play different forms a particular game. The game in question is Mancala, some of these games are called Wari or Kalah. These games differ in some aspects but for my question it's only important to know that the games could differ in the following: The way in which the result of a move is handled The way in which the end of the game is determined The way in which the winner is determined The first thing that came to my mind to design this was to use the strategy pattern, I have a variation in algorithms (the actual rules of the game). The design could look like this: I then thought to myself that in the game of Mancala and Wari the way the winner is determined is exactly the same and the code would be duplicated. I don't think this is by definition a violation of the 'one rule, one place' or DRY principle seeing as a change in rules for Mancala wouldn't automatically mean that rule should be changed in Wari as well. Nevertheless from the feedback I got from my professor I got the impression to find a different design. I then came up with this: Each game (Mancala, Wari, Kalah, ...) would just have attribute of the type of each rule's interface, i.e. WinnerDeterminer and if there's a Mancala 2.0 version which is the same as Mancala 1.0 except for how the winner is determined it can just use the Mancala versions. I think the implementation of these rules as a strategy pattern is certainly valid. But the real problem comes when I want to design it further. In reading about the template method pattern I immediately thought it could be applied to this problem. The actions that are done when a user makes a move are always the same, and in the same order, namely: deposit stones in holes (this is the same for all games, so would be implemented in the template method itself) determine the result of the move determine if the game has finished because of the previous move if the game has finished, determine who has won Those three last steps are all in my strategy pattern described above. I'm having a lot of trouble combining these two. One possible solution I found would be to abandon the strategy pattern and do the following: I don't really see the design difference between the strategy pattern and this? But I am certain I need to use a template method (although I was just as sure about having to use a strategy pattern). I also can't determine who would be responsible for creating the TurnTemplate object, whereas with the strategy pattern I feel I have families of objects (the three rules) which I could easily create using an abstract factory pattern. I would then have a MancalaRuleFactory, WariRuleFactory, etc. and they would create the correct instances of the rules and hand me back a RuleSet object. Let's say that I use the strategy + abstract factory pattern and I have a RuleSet object which has algorithms for the three rules in it. The only way I feel I can still use the template method pattern with this is to pass this RuleSet object to my TurnTemplate. The 'problem' that then surfaces is that I would never need my concrete implementations of the TurnTemplate, these classes would become obsolete. In my protected methods in the TurnTemplate I could just call ruleSet.determineWinner(). As a consequence, the TurnTemplate class would no longer be abstract but would have to become concrete, is it then still a template method pattern? To summarize, am I thinking in the right way or am I missing something easy? If I'm on the right track, how do I combine a strategy pattern and a template method pattern? This is part of a homework assignment but I'm not looking to be gifted the answer, I have deliberately been very verbose in my question to show that I have thought about it before coming here to ask a question

Read the article
Big Data – How to become a Data Scientist and Learn Data Science? – Day 19 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the analytics in Big Data Story. In this article we will understand how to become a Data Scientist for Big Data Story. Data Scientist is a new buzz word, everyone seems to be wanting to become Data Scientist. Let us go over a few key topics related to Data Scientist in this blog post. First of all we will understand what is a Data Scientist. In the new world of Big Data, I see pretty much everyone wants to become Data Scientist and there are lots of people I have already met who claims that they are Data Scientist. When I ask what is their role, I have got a wide variety of answers. What is Data Scientist? Data scientists are the experts who understand various aspects of the business and know how to strategies data to achieve the business goals. They should have a solid foundation of various data algorithms, modeling and statistics methodology. What do Data Scientists do? Data scientists understand the data very well. They just go beyond the regular data algorithms and builds interesting trends from available data. They innovate and resurrect the entire new meaning from the existing data. They are artists in disguise of computer analyst. They look at the data traditionally as well as explore various new ways to look at the data. Data Scientists do not wait to build their solutions from existing data. They think creatively, they think before the data has entered into the system. Data Scientists are visionary experts who understands the business needs and plan ahead of the time, this tremendously help to build solutions at rapid speed. Besides being data expert, the major quality of Data Scientists is “curiosity”. They always wonder about what more they can get from their existing data and how to get maximum out of future incoming data. Data Scientists do wonders with the data, which goes beyond the job descriptions of Data Analysist or Business Analysist. Skills Required for Data Scientists Here are few of the skills a Data Scientist must have. Expert level skills with statistical tools like SAS, Excel, R etc. Understanding Mathematical Models Hands-on with Visualization Tools like Tableau, PowerPivots, D3. j’s etc. Analytical skills to understand business needs Communication skills On the technology front any Data Scientists should know underlying technologies like (Hadoop, Cloudera) as well as their entire ecosystem (programming language, analysis and visualization tools etc.) . Remember that for becoming a successful Data Scientist one require have par excellent skills, just having a degree in a relevant education field will not suffice. Final Note Data Scientists is indeed very exciting job profile. As per research there are not enough Data Scientists in the world to handle the current data explosion. In near future Data is going to expand exponentially, and the need of the Data Scientists will increase along with it. It is indeed the job one should focus if you like data and science of statistics. Courtesy: emc Tomorrow In tomorrow’s blog post we will discuss about various Big Data Learning resources. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Obfuscation is not a panacea

- by simonc

So, you want to obfuscate your .NET application. My question to you is: Why? What are your aims when your obfuscate your application? To protect your IP & algorithms? Prevent crackers from breaking your licensing? Your boss says you need to? To give you a warm fuzzy feeling inside? Obfuscating code correctly can be tricky, it can break your app if applied incorrectly, it can cause problems down the line. Let me be clear - there are some very good reasons why you would want to obfuscate your .NET application. However, you shouldn't be obfuscating for the sake of obfuscating. Security through Obfuscation? Once your application has been installed on a user’s computer, you no longer control it. If they do not want to pay for your application, then nothing can stop them from cracking it, even if the time cost to them is much greater than the cost of actually paying for it. Some people will not pay for software, even if it takes them a month to crack a $30 app. And once it is cracked, there is nothing stopping them from putting the result up on the internet. There should be nothing suprising about this; there is no software protection available for general-purpose computers that cannot be cracked by a sufficiently determined attacker. Only by completely controlling the entire stack – software, hardware, and the internet connection, can you have even a chance to be uncrackable. And even then, someone somewhere will still have a go, and probably succeed. Even high-end cryptoprocessors have known vulnerabilities that can be exploited by someone with a scanning electron microscope and lots of free time. So, then, why use obfuscation? Well, the primary reason is to protect your IP. What obfuscation is very good at is hiding the overall structure of your program, so that it’s very hard to figure out what exactly the code is doing at any one time, what context it is running in, and how it fits in with the rest of the application; all of which you need to do to understand how the application operates. This is completely different to cracking an application, where you simply have to find a single toggle that determines whether the application is licensed or not, and flip it without the rest of the application noticing. However, again, there are limitations. An obfuscated application still has to run in the same way, and do the same thing, as the original unobfuscated application. This means that some of the protections applied to the obfuscated assembly have to be undone at runtime, else it would not run on the CLR and do the same thing. And, again, since we don’t control the environment the application is run on, there is nothing stopping a user from undoing those protections manually, and reversing some of the obfuscation. It’s a perpetual arms race, and it always will be. We have plenty of ideas lined about new protections, and the new protections added in SA 6.6 (method parent obfuscation and a new control flow obfuscation level) are specifically designed to be harder to reverse and reconstruct the original structure. So then, by all means, obfuscate your application if you want to protect the algorithms and what the application does. That’s what SmartAssembly is designed to do. But make sure you are clear what a .NET obfuscator can and cannot protect you against, and don’t expect your obfuscated application to be uncrackable. Someone, somewhere, will crack your application if they want to and they don’t have anything better to do with their time. The best we can do is dissuade the casual crackers and make it much more difficult for the serious ones. Cross posted from Simple Talk.

Read the article
Unexplained CPU and Disk activity spikes in SQL Server 2005

- by Philip Goh

Before I pose my question, please allow me to describe the situation. I have a database server, with a number of tables. Two of the biggest tables contain over 800k rows each. The majority of rows are less than 10k in size, though roughly 1 in 100 rows will be 1 MB but <4 MB. So out of the 1.6 million rows, about 16000 of them will be these large rows. The reason they are this big is because we're storing zip files binary blobs in the database, but I'm digressing. We have a service that runs constantly in the background, trimming 10 rows from each of these 2 tables. In the performance monitor graph above, these are the little bumps (red for CPU, green for disk queue). Once ever minute we get a large spike of CPU activity together with a jump in disk activity, indicated by the red arrow in the screenshot. I've run the SQL Server profiler, and there is nothing that jumps out as a candidate that would explain this spike. My suspicion is that this spike occurs when one of the large rows gets deleted. I've fed the results of the profiler into the tuning wizard, and I get no optimisation recommendations (i.e. I assume this means my database is indexed correctly for my current workload). I'm not overly worried as the server is coping fine in all circumstances, even under peak load. However, I would like to know if there is anything else I can do to find out what is causing this spike? Update: After investigating this some more, the CPU and disk usage spike was down to SQL server's automatic checkpoint. The database uses the simple recovery model, and this truncates the log file at each checkpoint. We can see this demonstrated in the following graph. As described on MSDN, the checkpoints will occur when the transaction log becomes 70% full and we are using the simple recovery model. This has been enlightening and I've definitely learned something!

Read the article
How should I monitor memory usage/performance in SunOS/Solaris?

- by exhuma

Last week we decided to add some SunOS (uname -a = SunOS bbs-sam-belair 5.10 Generic_127128-11 i86pc i386 i86pc) machines into our running munin instance. First off, the machines are pre-configured appliances, so, I want to avoid touching the system too much without supervision of the service provider. But adding it to munin was fairly easy by writing a small socket-service (if anyone is interested, I put it up on github: https://github.com/munin-monitoring/contrib/tree/master/tools/pypmmn) Yesterday, I implemented/adapted the required plugins for our machines. And here the questions start: First, I have not found a way to determine detailed memory usage values. I get the total memory by running prtconf | grep Memory, and the free memory using vmstat. Fiddling together a munin-plugin, gives me the following graph: This is pretty much uninformative. Compare this to the default plugin for linux nodes which has a lot more detail: Most importantly, this shows me how much memory is actually used by applications. So, first question: Is it possible to get detailed memory information on SunOS with the default system tools (i.e. not using top)? Onto the next puzzle: Seeing the graphs, I noticed activity in the "Paging in/out" graphs, even though the memory graph still has unused memory: Upon further investigation, I found out that df reports that /tmp is mounted on swap. Drilling around on the web, I understood that df will display swap, but in fact, it's mounted as a tmpfs. Now I don't know if this explains the swap activity. The default munin-plugin for solaris uses kstat -p -c misc -m cpu_stat to get these values. I find it already strange that this is using the cpu_stat module. So maybe I simply misinterpret the "paging" graphs? Second question: Do the paging graphs indicate that parts of the memory are paged to disk? Or is the activity caused by file operations in /tmp?

Read the article
Free/opensource application for charting stock prices?

- by Homunculus Reticulli

I am looking for a free or FOSS software application for SIMPLY charting stock prices. I am not interested in any of the other nonsense typically bundled with such packages (technical analysis, back testing, tracking etc, etc). All I want to do is the following: Import file from CSV and plot on the chart Ability to scroll the chart left/right (zoom feature would be nice to) Ability to draw straight line (between 2 points) on the plot Ability to plot the graph for different resolutions (for e.g weekly, monthly - or some other custom resolution that I want) print the displayed graph (I can always use screen capture if printing is too much to ask) Thats all I want to do. I am not interested in anything else. I would have thought I could have found something by now. I would have written my own tool (I still will do that at a later stage), but I am a bit short of time at the moment, so I just want something that will do all of the above. Can anyone recommend a package. Last but not the least, I am running on Linux (and would prefer to do so - BUT if I have to, I can run on ahem - you know, Windows)

Read the article
Unexplained cache RAM drops on Linux machine

- by FunkyChicken

I run a CentOS 5.7 64 machine with 24gb ram and running kernel 2.6.18-274.12.1.el5. This machine runs only Nginx, php-fpm and Xcache as extra applications. Since about 3 weeks my memory behavior on this machine has changed and I cannot explain why. There are no crons running which flush anything like this. There are also no large numbers of files being deleted/changed during these drops. The 'cached' memory gets dropped about every few hours, but it's never a set gap between flushes, this indicates to me that some bottleneck gets reached instead. It also always seems to be when total memory usages gets to about 18GB, but again, not always exactly 18GB. This is a graph of my memory usage: As you can see in the graph the 'buffers' always stay more or less the same, it is mainly the 'cache' that gets dropped. Running vmstat -m I have outputted the memory usage just before and just after a memory drop. The output is here: http://pastebin.com/diff.php?i=hJqZqztm 'old version' being before, 'new version' being after a drop. About 3 weeks ago my server crashed during a heavy DDOS attack, after I rebooted the machine this odd behavior started. I have checked a bunch of logs, restarted the machine again, and cannot find any indication what changed. During these 'cache' memory drops, my iNode usage drops at the same time. Does anyone have any idea what might be causing this behavior? Clearly my RAM isn't full, so I am curious why this could be happening.

Read the article
What is causing sudden freezing during running real-time program?

- by Trevor Boyd Smith

So I run a high intensive (CPU/GPU) real-time program. During normal execution suddenly everything freezes for 1-4 seconds. I opened "Process Explorer" in the background to help gain insight and maybe identify something. Here is what the CPU/GPU graphs looks like when I align them in time: Notice the 4 distinct drops in both the CPU/GPU. You can see that it goes from some sort of positive CPU/GPU usage to almost zero. These drops in the graph align with when the real-time program suddenly freezes. How do I find what is causing these sudden drops? NOTE: When you put your mouse over the graph it tells you the time, accurate to the second, for where your cursor is. Maybe this mouse over feature could be helpful in some way (e.g. what if you had a log of all processes every 100ms). EDIT: The real-time program is a video game and so I can't watch some sort of instrumentation while the video game is running. I need a solution that let's you look back in time somehow to see what was happening when the slow down occurred. EDIT: RE - Recording Data vs using real-time monitor: So the windows performance recorder is for some reason not recording what I expect it to record. So I switched to using "perfmon" and then opening it's "resource monitor". RE - Setting it up so I can view real-time monitor: In the video game I set it to spectate and then put the video game in "windowed" mode so that I can view the real time display that Resource Monitor has. Now that I can get semi-real time (only once per second... how do you get more than once per second?) I started looking at the various real time data readouts. Getting to the cause: I noticed a strong correlation in high disk IO and low CPU usage (which is also seen by having in-game freezing). How do you use resource monitor to find out who is doing all this offending disk IO?

Read the article
Looking for a host based network monitor solution

- by Ole Martin Handeland

Hi all! Problem So, my hosting company has a network usage graph for my dedicated server. It seems that one day earlier this month, my network usage suddenly spiked with several hundred megabytes transferred (usually it's in the tens, not hundreds). It was probably me, but i just can't be sure who or what it was. Question So my question is; does anyone know of any host based solution for monitoring network usage that would tell me the client's IP-address, the port/service he/she used? What I don't want I'm just guessing that someone will suggest i use nagios, munin, zabbix, cacti, mrtg - I've also looked at those, but a graph over network usage will not give me the answers I'm looking for. :-) Almost there I've already looked at a lot of monitoring solutions, and I've tried [ntop][http://www.ntop.org/], [darkstat][http://unix4lyfe.org/darkstat/] and others. Darkstat just didn't give me the answers. Although it listed a lot of statistics, and i could list the clients - it doesn't show me the network usage for a particular period. Ntop is by far the best I've seen so far - but i think it mostly shows current network usage, not the historical part. I could run apt-get upgrade and download a whole bunch of software, but not see it in the log afterwards.

Read the article
How do I `SUM` by multiple columns in Excel

- by dwwilson66

I have a comma delimited file that includes two columns date/time (which imports as Excel's mm/dd/yyyy hh:mm custom format) and status of 1 or 0. The status represents a piece of equipment either being on or off. I'm trying to generate a graph that will show, hours up vs. down by day. CONSIDER: 1/1/2012 00:00, 1 1/1/2012 03:00, 0 1/1/2012 14:00, 1 1/3/2012 00:00, 0 This tells me that the equipment was up for three hours, down for eleven hours, and then up for thirty-four hours (across two calendar days). However, I would like to generate a graph that shows how many hours PER DAY we were up or down. CONSIDER: 1/1 XXXXXXXXXXXXX----------- (up 13, down 11) 1/2 XXXXXXXXXXXXXXXXXXXXXXXX (up 24) To me, it seems that I need to generate a dataset summing HOURS by STATUS by CALENDAR DAY...but I can't seem to find a flavor of pivot table or nested SUM(IF(SUMIF(...))) combination to make it work. Most troubling is accounting for date changes...in my example above, since my uptime starting at 14:00 on 1/1/2012 crosses midnight, I need to know that 10 uptime hours get totalled with 1/1/2012 and 24 uptime hours get totalled with 1/2/2012. I may be able to do something with a calendar list to drive the date summation, but then I need a way to compare 01/01/2012 to 01/01/2012 03:00 as equal. There's got to be a way along the lines of if(INTEGER-PORTIONS-OF-SERIAL-DATES-ARE-EQUAL,TOTAL-HOURS-IF-VALUE-IS_1,0) but nothing's worked so far. Any suggestions? I've been battling this most of the day, and need a fresh perspective. Thanks

Read the article
FreeBSD's ng_nat stopping pass the packets periodically

- by Korjavin Ivan

I have FreeBSD router: #uname 9.1-STABLE FreeBSD 9.1-STABLE #0: Fri Jan 18 16:20:47 YEKT 2013 It's a powerful computer with a lot of memory #top -S last pid: 45076; load averages: 1.54, 1.46, 1.29 up 0+21:13:28 19:23:46 84 processes: 2 running, 81 sleeping, 1 waiting CPU: 3.1% user, 0.0% nice, 32.1% system, 5.3% interrupt, 59.5% idle Mem: 390M Active, 1441M Inact, 785M Wired, 799M Buf, 5008M Free Swap: 8192M Total, 8192M Free PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND 11 root 4 155 ki31 0K 64K RUN 3 71.4H 254.83% idle 13 root 4 -16 - 0K 64K sleep 0 101:52 103.03% ng_queue 0 root 14 -92 0 0K 224K - 2 229:44 16.55% kernel 12 root 17 -84 - 0K 272K WAIT 0 213:32 15.67% intr 40228 root 1 22 0 51060K 25084K select 0 20:27 1.66% snmpd 15052 root 1 52 0 104M 22204K select 2 4:36 0.98% mpd5 19 root 1 16 - 0K 16K syncer 1 0:48 0.20% syncer Its tasks are: NAT via ng_nat and PPPoE server via mpd5. Traffic through - about 300Mbit/s, about 40kpps at peak. Pppoe sessions created - 350 max. ng_nat is configured by by the script: /usr/sbin/ngctl -f- <<-EOF mkpeer ipfw: nat %s out name ipfw:%s %s connect ipfw: %s: %s in msg %s: setaliasaddr 1.1.%s There are 20 such ng_nat nodes, with about 150 clients. Sometimes, the traffic via nat stops. When this happens vmstat reports a lot of FAIL counts vmstat -z | grep -i netgraph ITEM SIZE LIMIT USED FREE REQ FAIL SLEEP NetGraph items: 72, 10266, 1, 376,39178965, 0, 0 NetGraph data items: 72, 10266, 9, 10257,2327948820,2131611,4033 I was tried increase net.graph.maxdata=10240 net.graph.maxalloc=10240 but this doesn't work. It's a new problem (1-2 week). The configuration had been working well for about 5 months and no configuration changes were made leading up to the problems starting. In the last few weeks we have slightly increased traffic (from 270 to 300 mbits) and little more pppoe sessions (300-350). Help me please, how to find and solve my problem?

Read the article
Adding custom service to nagiosgraph

- by ravloony

I have successfully added nagiosgraph to our nagios installation. I also added a memory checker plugin, from here : http://blog.vergiss-blackjack.de/2010/04/nagios-plugin-to-check-memory-consumption/. However I can't seem to get the graph of this service to be output by nagiosgraph. The plugin returns a single line like this: 31% (3785 of 11903 MB) used so i added a rule like this to the map file: /output:(\d+)% $(\d+) of (\d+) MB$ used/ and push @s, ['Mem', ['Percentage', 'GUAGE', $1], ['Used', 'GUAGE', $2], ['Total', 'GUAGE', $3] ]; I have also read this : http://www.mail-archive.com/[email protected]/msg36835.html and made sure that process_performance_data=1 in the nagios conf file. So far I have no graph for the Mem service on any host, and no rrd file either. I am unsure how to proceed to get this working. The documentation is rather difficult to follow and I haven't managed yet to understand it enough to do this. Can anyone point me to a tutorial, or some documentation which explains the steps needed to get a service noticed and graphed by nagiosgraph?

Read the article
Mercurial confusion - commit / push, backouts

- by Madmanguruman

I'm trying to set up a repository on a shared filesystem. I'm using Mercurial 2.1.2 on a Windows-based architecture. I start with an empty folder on the shared filesystem and create a repository in it. After this, I dump in the baseline files, and add them to versioning, then commit the changes. I then clone the repository to my local hard drive. I then make a change in my local repository, commit it, then push back to the shared filesystem repository. The shared repo graph I get in TortoiseHG looks strange (to me). This is the shared repo: This is the local repo: On the shared repo, the working directory always shows up on the top, then the graph goes 'down' to rev. 0 then back 'up' again through various revisions. It looks to me like I have two different branches, even though everything is on the default branch. Also, that 'top' revision always says "* Working Directory * Not a head revision!" I noticed that in my local repository, I don't get that dangling working directory at the top of the list - everything is in one branch. I also noticed that on my local repository, I can back out the tip revision with no problem. On the shared filesystem repository, I cannot, since I get an error ("Cannot backout change on a different branch"). How can this be? Aren't they supposed to be identical to each other? Am I fundamentally doing something wrong?

Read the article
Issues Deploying Functional WAR to Elastic Beanstalk with Tomcat7

- by BFar

I am currently deploying OpenTripPlanner (http://github.com/OpenPlans/OpenTripPlanner.git) to Elastic Beanstalk. I'm able to successfully build and deploy opentripplanner with my own customized settings on an ec2. I have set it up so that the appropriate WAR file can be placed in the Tomcat/Webapps folder, and when Tomcat is started up, it will auto-deploy, and even download open trip planner's graph.obj from an S3. All of that works just fine, except when I try to deploy to Elastic Beanstalk. When I upload to Elastic Beanstalk, the log shows that my WAR file is successfully unpacked & successfully downloads the graph.obj from my S3. The only difference is that then nothing happens and I can't load the site in my browser. The health is RED, and I can't figure out what is going on. I've tried looking into ports and dns issues, but I can't determine what's wrong. Anyone have any ideas? Why would a WAR that works on tomcat7 outside of Beanstalk fail to be accessible?

Read the article
Microsoft Excel not graphing

- by SmartLemon

Im not sure if this is a math question or a su question. The experiment was relating the period of one "bounce" when you hang a weight on a spring and let it bounce. I have this data here, one being mass and one being time. The time is an average of 5 trials, each one being and average of 20 bounces, to minimize human error. t 0.3049s 0.3982s 0.4838s 0.5572s 0.6219s 0.6804s 0.7362s 0.7811s 0.8328s 0.869s The mass is the mass that was used in each trial (they aren't going up in exact differences because each weight has a slight difference, nothing is perfect in the real world) m 50.59g 100.43g 150.25g 200.19g 250.89g 301.16g 351.28g 400.79g 450.43g 499.71g My problem is that I need to find the relationship between them, I know m = (k/4PI^2)*T^2 so I can work out k like that but we need to graph it. I can assume that the relationship is a sqrt relation, not sure on that one. But it appears to be the reverse of a square. Should it be 1/x^2 then? Either way my problem is still present, I have tried 1/x, 1/x^2, sqrt, x^2, none of them produce a straight line. The problem for SU is that when I go to graph the data on Excel I set the y axis data (which is the weights) and then when I go to set the x axis (which is the time) it just replaces the y axis with what I want to be the x axis, this is only happening when I have the sqrt of "m" as the y axis and I try to set the x axis as the time. The problem of math is that, am I even using the right thing? To get a straight line it would need to be x = y^1/2 right? I thought I was doing the right thing, it is what we were told to do. I'm just not getting anything that looks right.

Read the article
Algorithm to get through a maze

- by Sam

Hello, We are currently programming a game (its a pretty unknown language: modula 2), And the problem we encountered is the following: we have a maze (not a perfect maze) in a 17 x 12 grid. The computer has to generate a way from the starting point (9, 12) to the end point (9, 1). I found some algorithms but they dont work when the robot has to go back: xxxxx x => x x xxx or: xxxxx x xxxxxx x x x x x xxxxxx x => x xxxxxxxxx I found a solution for the first example type but then the second type couldn't be solved and the solution I made up for the second type would cause the robot to get stuck in the first type of situation. It's a lot of code so i'll give the idea: WHILE (end destination not reached) DO { try to go right, if nothing blocks you: go right if you encounter a block, try up until you can go right, if you cant go up anymore try going down until you can go right, (starting from the place you first were blocked), if you cant go down anymore, try one step left and fill the spaces you tested with blocks. } This works for the first type of problem ... not for the second one. Now it could be that i started wrong so i am open for better algorithms or solutions specificaly to how i could improve my algorithm. Thanks a lot!!

Read the article
How to code a URL shortener?

- by marco92w

I want to create a URL shortener service where you can write a long URL into an input field and the service shortens the URL to "http://www.example.org/abcdef". Instead of "abcdef" there can be any other string with six characters containing a-z, A-Z and 0-9. That makes 56 trillion possible strings. My approach: I have a database table with three columns: id, integer, auto-increment long, string, the long URL the user entered short, string, the shortened URL (or just the six characters) I would then insert the long URL into the table. Then I would select the auto-increment value for "id" and build a hash of it. This hash should then be inserted as "short". But what sort of hash should I build? Hash algorithms like MD5 create too long strings. I don't use these algorithms, I think. A self-built algorithm will work, too. My idea: For "http://www.google.de/" I get the auto-increment id 239472. Then I do the following steps: short = ''; if divisible by 2, add "a"+the result to short if divisible by 3, add "b"+the result to short ... until I have divisors for a-z and A-Z. That could be repeated until the number isn't divisible any more. Do you think this is a good approach? Do you have a better idea?

Read the article
RESTful API design question - how should one allow users to create new resource instances?

- by Tamás

I'm working in a research group where we intend to publish implementations of some of the algorithms we develop on the web via a RESTful API. Most of these algorithms work on small to medium size datasets, and in many cases, a user of our services might want to run multiple queries (with different parameters) on the same dataset, so for me it seems reasonable to allow users to upload their datasets in advance and refer to them in their queries later. In this sense, a dataset could be a resource in my API, and an algorithm could be another. My question is: how should I let the users upload their own datasets? I cannot simply let users upload their data to /dataset/dataset_id as letting the users invent their own dataset_ids might result in ID collision and users overwriting each other's datasets by accident. (I believe one of the most frequently used dataset ID would be test). I think an ideal way would be to have a dedicated URL (like /dataset/upload) where users can POST their datasets and the response would contain a unique ID under which the dataset was stored, but I'm not sure that it does not violate the basic principles of REST. What is the preferred way of dealing with such scenarios?

Read the article
Fuzzy Search on Material Descriptions including numerical sizes & general descriptions of material t

- by Kyle

We're looking to provide a fuzzy search on an electrical materials database (i.e. conduit, cable, etc.). The problem is that, because of a lack of consistency across all material types, we could not split sizes into separate fields from the text description because some materials are rated by things other than size. I've attempted a combination of a full text search & a SQL CLR implementation of the Levenshtein search algorithm (for assistance in ranking), but my results are a little funky (i.e. they are not sorting correctly due to improper ranking). For example, if the search term is "3/4" ABCD Conduit", I'll might get back several irrelevant results in the following order: 1/2" Conduit 1/4" X 3/4" Cable 1/4" Cable Ties 3/4" DFC Conduit Tees 3/4" ABCD Conduit 3/4" Conduit I believe I've nailed the problem down to the fact that these two search algorithms do not factor in the relevance of punctuation & numeric. That is, in such a search, I'd expect the size to take precedence over any fuzzy match on the rest of the description, but my results don't reflect that. My question is: Can anyone recommend better search algorithms or different approaches that may be better suited for searching a combination of alphanumerics & punctuation characters?

Read the article
Getting started with massive data

- by Max

I'm a math guy and occasionally do some statistics/machine learning analysis consulting projects on the side. The data I have access to are usually on the smaller side, at most a couple hundred of megabytes (and almost always far less), but I want to learn more about handling and analyzing data on the gigabyte/terabyte scale. What do I need to know and what are some good resources to learn from? Hadoop/MapReduce is one obvious start. Is there a particular programming language I should pick up? (I primarily work now in Python, Ruby, R, and occasionally Java, but it seems like C and Clojure are often used for large-scale data analysis?) I'm not really familiar with the whole NoSQL movement, except that it's associated with big data. What's a good place to learn about it, and is there a particular implementation (Cassandra, CouchDB, etc.) I should get familiar with? Where can I learn about applying machine learning algorithms to huge amounts of data? My math background is mostly on the theory side, definitely not on the numerical or approximation side, and I'm guessing most of the standard ML algorithms don't really scale. Any other suggestions on things to learn would be great!

Read the article
Code bacteria: evolving mathematical behavior

- by Stefano Borini

It would not be my intention to put a link on my blog, but I don't have any other method to clarify what I really mean. The article is quite long, and it's in three parts (1,2,3), but if you are curious, it's worth the reading. A long time ago (5 years, at least) I programmed a python program which generated "mathematical bacteria". These bacteria are python objects with a simple opcode-based genetic code. You can feed them with a number and they return a number, according to the execution of their code. I generate their genetic codes at random, and apply an environmental selection to those objects producing a result similar to a predefined expected value. Then I let them duplicate, introduce mutations, and evolve them. The result is quite interesting, as their genetic code basically learns how to solve simple equations, even for values different for the training dataset. Now, this thing is just a toy. I had time to waste and I wanted to satisfy my curiosity. however, I assume that something, in terms of research, has been made... I am reinventing the wheel here, I hope. Are you aware of more serious attempts at creating in-silico bacteria like the one I programmed? Please note that this is not really "genetic algorithms". Genetic algorithms is when you use evolution/selection to improve a vector of parameters against a given scoring function. This is kind of different. I optimize the code, not the parameters, against a given scoring function.

Read the article
What should a self-taught programmer with no degree learn/read?

- by sjbotha

I am a self-taught programmer and I do do not have any degrees. I started pretty young and I've got about 7 years of actual programming work experience. I believe I'm a pretty good programmer, but I admit that I have not played much with algorithms or delved into any really low-level aspects of programming such as how compilers work. I have worked with other programmers with and without degrees. Some were good and some not; having a degree didn't seem to make any difference as to which pot they fell into. Since then I've come to realize that it does depend on the school where the degree is obtained. Some people suggest that you really should get a degree; that there are things you'll learn in the process that you won't learn in the real world. Of course there is personal growth and discipline learned from completing a task of that magnitude, but let's just concentrate on the technical knowledge. What would I have been taught in a GOOD CS course that would aid me today and what can I read to fill the gap? I've heard the book "Algorithms" mentioned and I plan on reading that. What other books would you recommend? Edit: Clarification on 'actual work experience': Have worked for 2 small companies on teams with fewer than 5 people. About 2 years experience with Perl, Python, PHP, C, C++. About 5 years experience in Java, Applets, RMI, T-SQL, PL/SQL, VB6. 7 years experience in HTML, Javascript, bash, SQL. Most recently in Java designed and helped build an N-tier Java app with web frontend and RMI.

Read the article
Optimizing processing and management of large Java data arrays

- by mikera

I'm writing some pretty CPU-intensive, concurrent numerical code that will process large amounts of data stored in Java arrays (e.g. lots of double[100000]s). Some of the algorithms might run millions of times over several days so getting maximum steady-state performance is a high priority. In essence, each algorithm is a Java object that has an method API something like: public double[] runMyAlgorithm(double[] inputData); or alternatively a reference could be passed to the array to store the output data: public runMyAlgorithm(double[] inputData, double[] outputData); Given this requirement, I'm trying to determine the optimal strategy for allocating / managing array space. Frequently the algorithms will need large amounts of temporary storage space. They will also take large arrays as input and create large arrays as output. Among the options I am considering are: Always allocate new arrays as local variables whenever they are needed (e.g. new double[100000]). Probably the simplest approach, but will produce a lot of garbage. Pre-allocate temporary arrays and store them as final fields in the algorithm object - big downside would be that this would mean that only one thread could run the algorithm at any one time. Keep pre-allocated temporary arrays in ThreadLocal storage, so that a thread can use a fixed amount of temporary array space whenever it needs it. ThreadLocal would be required since multiple threads will be running the same algorithm simultaneously. Pass around lots of arrays as parameters (including the temporary arrays for the algorithm to use). Not good since it will make the algorithm API extremely ugly if the caller has to be responsible for providing temporary array space.... Allocate extremely large arrays (e.g. double[10000000]) but also provide the algorithm with offsets into the array so that different threads will use a different area of the array independently. Will obviously require some code to manage the offsets and allocation of the array ranges. Any thoughts on which approach would be best (and why)?

Read the article
Compressibility Example

- by user285726

From my algorithms textbook: The annual county horse race is bringing in three thoroughbreds who have never competed against one another. Excited, you study their past 200 races and summarize these as prob- ability distributions over four outcomes: first (“first place”), second, third, and other. Outcome Aurora Whirlwind Phantasm 0.15 0.30 0.20 first 0.10 0.05 0.30 second 0.70 0.25 0.30 third 0.05 0.40 0.20 other Which horse is the most predictable? One quantitative approach to this question is to look at compressibility. Write down the history of each horse as a string of 200 values (first, second, third, other). The total number of bits needed to encode these track- record strings can then be computed using Huffman’s algorithm. This works out to 290 bits for Aurora, 380 for Whirlwind, and 420 for Phantasm (check it!). Aurora has the shortest encoding and is therefore in a strong sense the most predictable. How did they get 420 for Phantasm? I keep getting 400 bytes, as so: Combine first, other = 0.4, combine second, third = 0.6. End up with 2 bits encoding each position. Is there something I've misunderstood about the Huffman encoding algorithm? Textbook available here: http://www.cs.berkeley.edu/~vazirani/algorithms.html (page 156).

Read the article

Search Results

Search found 4176 results on 168 pages for 'graph algorithms'.

Page 84/168 | < Previous Page | 80 81 82 83 84 85 86 87 88 89 90 91 | Next Page >

- by CliveT

- by Mekswoll

- by Pinal Dave

- by simonc

- by Philip Goh

- by exhuma

- by Homunculus Reticulli

- by FunkyChicken

- by Trevor Boyd Smith

- by Ole Martin Handeland

- by dwwilson66

- by Korjavin Ivan

- by ravloony

- by Madmanguruman

- by BFar

- by SmartLemon

- by Sam

- by marco92w

- by Tamás

- by Kyle

- by Max

- by Stefano Borini

- by sjbotha

- by mikera

- by user285726

< Previous Page | 80 81 82 83 84 85 86 87 88 89 90 91 | Next Page >