Search Results

Search found 7220 results on 289 pages for 'graph algorithm'.


  • Trouble with copying dictionaries and using deepcopy on an SQLAlchemy ORM object

    - by Az
    Hi there, I'm doing a Simulated Annealing algorithm to optimise a given allocation of students and projects. This is language-agnostic pseudocode from Wikipedia:
      s ← s0; e ← E(s)                              // Initial state, energy.
      sbest ← s; ebest ← e                          // Initial "best" solution
      k ← 0                                         // Energy evaluation count.
      while k < kmax and e > emax                   // While time left & not good enough:
        snew ← neighbour(s)                         // Pick some neighbour.
        enew ← E(snew)                              // Compute its energy.
        if enew < ebest then                        // Is this a new best?
          sbest ← snew; ebest ← enew                // Save 'new neighbour' to 'best found'.
        if P(e, enew, temp(k/kmax)) > random() then // Should we move to it?
          s ← snew; e ← enew                        // Yes, change state.
        k ← k + 1                                   // One more evaluation done
      return sbest                                  // Return the best solution found.
    The following is an adaptation of the technique. My supervisor said the idea is fine in theory. First I pick up some allocation (i.e. an entire dictionary of students and their allocated projects, including the ranks for the projects) from the entire set of randomised allocations, copy it and pass it to my function. Let's call this allocation aOld (it is a dictionary). aOld has a weight related to it called wOld. The weighting is described below. The function does the following:
      1. Let this allocation, aOld, be the best_node
      2. From all the students, pick a random number of students and stick them in a list
      3. Strip (DEALLOCATE) them of their projects, and reflect the changes for projects (allocated parameter is now False) and lecturers (free up slots if one or more of their projects are no longer allocated)
      4. Randomise that list
      5. Try assigning (REALLOCATE) everyone in that list projects again
      6. Calculate the weight (add up ranks, rank 1 = 1, rank 2 = 2... and no project rank = 101)
    For this new allocation aNew, if the weight wNew is smaller than the allocation weight wOld I picked up at the beginning, then this is the best_node (as defined by the Simulated Annealing algorithm above). Apply the algorithm to aNew and continue. If wOld < wNew, then apply the algorithm to aOld again and continue. The allocations/data-points are expressed as "nodes" such that a node = (weight, allocation_dict, projects_dict, lecturers_dict). Right now, I can only perform this algorithm once, but I'll need to try for a number N (denoted by kmax in the Wikipedia snippet) and make sure I always have with me the previous node and the best_node. So that I don't modify my original dictionaries (which I might want to reset to), I've done a shallow copy of the dictionaries. From what I've read in the docs, it seems that a shallow copy only copies the references, and since my dictionaries contain objects, changing the copied dictionary ends up changing the objects anyway. So I tried to use copy.deepcopy(). These dictionaries refer to objects that have been mapped with SQLA (SQLAlchemy). Questions: I've been given some solutions to the problems faced, but due to my über green-ness with using Python, they all sound rather cryptic to me. Deepcopy isn't playing nicely with SQLA. I've been told that deepcopy on ORM objects probably has issues that prevent it from working as you'd expect. Apparently I'd be better off "building copy constructors, i.e. def copy(self): return FooBar(....)." Can someone please explain what that means? I checked and found out that deepcopy has issues because SQLAlchemy places extra information on your objects, i.e. an _sa_instance_state attribute, that I wouldn't want in the copy but is necessary for the object to have.
I've been told: "There are ways to manually blow away the old _sa_instance_state and put a new one on the object, but the most straightforward is to make a new object with __init__() and set up the attributes that are significant, instead of doing a full deep copy." What exactly does that mean? Do I create a new, unmapped class similar to the old, mapped one? An alternate solution is that I'd have to "implement __deepcopy__() on your objects and ensure that a new _sa_instance_state is set up, there are functions in sqlalchemy.orm.attributes which can help with that." Once again this is beyond me so could someone kindly explain what it means? A more general question: given the above information are there any suggestions on how I can maintain the information/state for the best_node (which must always persist through my while loop) and the previous_node, if my actual objects (referenced by the dictionaries, therefore the nodes) are changing due to the deallocation/reallocation taking place? That is, without using copy?
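    To make the "copy constructor" suggestion concrete, here is a minimal Python sketch of what it could look like for a hypothetical mapped Student class (the class, columns and helper names are invented for illustration; they are not from the question's code). Instead of deep-copying a mapped instance together with its _sa_instance_state, you build a fresh instance via __init__ and copy over only the attributes that matter:

      from sqlalchemy import Column, Integer, String
      from sqlalchemy.ext.declarative import declarative_base

      Base = declarative_base()

      class Student(Base):
          __tablename__ = "students"
          id = Column(Integer, primary_key=True)
          name = Column(String)
          rank = Column(Integer)

          def copy(self):
              # A "copy constructor": a brand-new instance gets its own fresh
              # _sa_instance_state from __init__, so nothing from the original
              # session or identity map is dragged along. The primary key is
              # deliberately not copied.
              return Student(name=self.name, rank=self.rank)

      # Usage sketch: shallow-copy the dictionary, but replace each value with a
      # detached copy so mutating the copy never touches the session-bound originals.
      def copy_allocation(allocation):
          return dict((key, student.copy()) for key, student in allocation.items())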

    Read the article

  • Online ALTER TABLE in MySQL 5.6

    - by Marko Mäkelä
    This is the low-level view of data definition language (DDL) operations in the InnoDB storage engine in MySQL 5.6. John Russell gave a more high-level view in his blog post April 2012 Labs Release – Online DDL Improvements.
    MySQL before the InnoDB Plugin
    Traditionally, the MySQL storage engine interface has taken a minimalistic approach to data definition language. The only natively supported operations were CREATE TABLE, DROP TABLE and RENAME TABLE. Consider the following example:
      CREATE TABLE t(a INT);
      INSERT INTO t VALUES (1),(2),(3);
      CREATE INDEX a ON t(a);
      DROP TABLE t;
    The CREATE INDEX statement would be executed roughly as follows:
      CREATE TABLE temp(a INT, INDEX(a));
      INSERT INTO temp SELECT * FROM t;
      RENAME TABLE t TO temp2;
      RENAME TABLE temp TO t;
      DROP TABLE temp2;
    You could imagine that the database could crash when copying all rows from the original table to the new one. For example, it could run out of file space. Then, on restart, InnoDB would roll back the huge INSERT transaction. To fix things a little, a hack was added to ha_innobase::write_row for committing the transaction every 10,000 rows. Still, it was frustrating that even a simple DROP INDEX would make the table unavailable for modifications for a long time.
    Fast Index Creation in the InnoDB Plugin of MySQL 5.1
    MySQL 5.1 introduced a new interface for CREATE INDEX and DROP INDEX. The old table-copying approach can still be forced by SET old_alter_table=1. This interface is used in MySQL 5.5 and in the InnoDB Plugin for MySQL 5.1. Apart from the ability to do a quick DROP INDEX, the main advantage is that InnoDB will execute a merge-sort algorithm before inserting the index records into each index that is being created. This should speed up the insert into the secondary index B-trees and potentially result in a better B-tree fill factor. The 5.1 ALTER TABLE interface was not perfect. For example, DROP FOREIGN KEY still invoked the table copy. Renaming columns could conflict with InnoDB foreign key constraints. Combining ADD KEY and DROP KEY in ALTER TABLE was problematic and not atomic inside the storage engine.
    The ALTER TABLE interface in MySQL 5.6
    The ALTER TABLE storage engine interface was completely rewritten in MySQL 5.6. Instead of introducing a method call for every conceivable operation, MySQL 5.6 introduced a handful of methods, and data structures that keep track of the requested changes. In MySQL 5.6, an online ALTER TABLE operation can be requested by specifying LOCK=NONE. Also LOCK=SHARED and LOCK=EXCLUSIVE are available. The old-style table copying can be requested by ALGORITHM=COPY. That one will require at least LOCK=SHARED. From the InnoDB point of view, anything that is possible with LOCK=EXCLUSIVE is also possible with LOCK=SHARED. Most ALGORITHM=INPLACE operations inside InnoDB can be executed online (LOCK=NONE). InnoDB will always require an exclusive table lock in two phases of the operation. The execution phases are tied to a number of methods: handler::check_if_supported_inplace_alter Checks if the storage engine can perform all requested operations, and if so, what kind of locking is needed. handler::prepare_inplace_alter_table InnoDB uses this method to set up the data dictionary cache for the upcoming CREATE INDEX operation. We need stubs for the new indexes, so that we can keep track of changes to the table during online index creation. Also, crash recovery would drop any indexes that were incomplete at the time of the crash. 
handler::inplace_alter_table In InnoDB, this method is used for creating secondary indexes or for rebuilding the table. This is the ‘main’ phase that can be executed online (with concurrent writes to the table). handler::commit_inplace_alter_table This is where the operation is committed or rolled back. Here, InnoDB would drop any indexes, rename any columns, drop or add foreign keys, and finalize a table rebuild or index creation. It would also discard any logs that were set up for online index creation or table rebuild. The prepare and commit phases require an exclusive lock, blocking all access to the table. If MySQL times out while upgrading the table meta-data lock for the commit phase, it will roll back the ALTER TABLE operation. In MySQL 5.6, data definition language operations are still not fully atomic, because the data dictionary is split. Part of it is inside InnoDB data dictionary tables. Part of the information is only available in the *.frm file, which is not covered by any crash recovery log. But, there is a single commit phase inside the storage engine. Online Secondary Index Creation It may occur that an index needs to be created on a new column to speed up queries. But, it may be unacceptable to block modifications on the table while creating the index. It turns out that it is conceptually not so hard to support online index creation. All we need is some more execution phases: Set up a stub for the index, for logging changes. Scan the table for index records. Sort the index records. Bulk load the index records. Apply the logged changes. Replace the stub with the actual index. Threads that modify the table will log the operations to the logs of each index that is being created. Errors, such as log overflow or uniqueness violations, will only be flagged by the ALTER TABLE thread. The log is conceptually similar to the InnoDB change buffer. The bulk load of index records will bypass record locking. We still generate redo log for writing the index pages. It would suffice to log page allocations only, and to flush the index pages from the buffer pool to the file system upon completion. Native ALTER TABLE Starting with MySQL 5.6, InnoDB supports most ALTER TABLE operations natively. The notable exceptions are changes to the column type, ADD FOREIGN KEY except when foreign_key_checks=0, and changes to tables that contain FULLTEXT indexes. The keyword ALGORITHM=INPLACE is somewhat misleading, because certain operations cannot be performed in-place. For example, changing the ROW_FORMAT of a table requires a rebuild. Online operation (LOCK=NONE) is not allowed in the following cases: when adding an AUTO_INCREMENT column, when the table contains FULLTEXT indexes or a hidden FTS_DOC_ID column, or when there are FOREIGN KEY constraints referring to the table, with ON…CASCADE or ON…SET NULL option. The FOREIGN KEY limitations are needed, because MySQL does not acquire meta-data locks on the child or parent tables when executing SQL statements. Theoretically, InnoDB could support operations like ADD COLUMN and DROP COLUMN in-place, by lazily converting the table to a newer format. This would require that the data dictionary keep multiple versions of the table definition. For simplicity, we will copy the entire table, even for DROP COLUMN. The bulk copying of the table will bypass record locking and undo logging. For facilitating online operation, a temporary log will be associated with the clustered index of table. Threads that modify the table will also write the changes to the log. 
When altering the table, we skip all records that have been marked for deletion. In this way, we can simply discard any undo log records that were not yet purged from the original table. Off-page columns, or BLOBs, are an important consideration. We suspend the purge of delete-marked records if it would free any off-page columns from the old table. This is because the BLOBs can be needed when applying changes from the log. We have special logging for handling the ROLLBACK of an INSERT that inserted new off-page columns. This is because the columns will be freed at rollback.
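    As a small illustration of the clauses discussed above, here is a hedged Python sketch (using mysql-connector-python; the connection details and the table and column names are invented) that requests an online secondary index build, plus an explicit table-copying rebuild for contrast:

      # Illustrative only: schema and credentials are hypothetical.
      import mysql.connector

      conn = mysql.connector.connect(host="localhost", user="app",
                                     password="secret", database="test")
      cur = conn.cursor()

      # Online secondary index creation: concurrent DML stays allowed (LOCK=NONE)
      # and the work happens inside InnoDB (ALGORITHM=INPLACE).
      cur.execute("ALTER TABLE orders ADD INDEX idx_created (created_at), "
                  "ALGORITHM=INPLACE, LOCK=NONE")

      # Old-style rebuild for comparison: the table is copied and at least a
      # shared lock is taken, so concurrent writes are blocked.
      cur.execute("ALTER TABLE orders ENGINE=InnoDB, ALGORITHM=COPY, LOCK=SHARED")

      conn.close()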

    Read the article

  • Organization & Architecture UNISA Studies – Chap 4

    - by MarkPearl
    Learning Outcomes: Explain the characteristics of memory systems; Describe the memory hierarchy; Discuss cache memory principles; Discuss issues relevant to cache design; Describe the cache organization of the Pentium
    Computer Memory Systems
    There are key characteristics of memory… Location – internal or external; Capacity – expressed in terms of bytes; Unit of Transfer – the number of bits read out of or written into memory at a time; Access Method – sequential, direct, random or associative. From a user's perspective the two most important characteristics of memory are… Capacity, and Performance – access time, memory cycle time, transfer rate. The trade-off for memory happens along three axes… Faster access time, greater cost per bit; Greater capacity, smaller cost per bit; Greater capacity, slower access time. This leads to people using a tiered approach in their use of memory. As one goes down the hierarchy, the following occurs… (1) Decreasing cost per bit; (2) Increasing capacity; (3) Increasing access time; (4) Decreasing frequency of access of the memory by the processor. The use of two levels of memory to reduce average access time works in principle, but only if conditions 1 to 4 apply. A variety of technologies exist that allow us to accomplish this. Thus it is possible to organize data across the hierarchy such that the percentage of accesses to each successively lower level is substantially less than that of the level above. A portion of main memory can be used as a buffer to hold data temporarily that is to be read out to disk. This is sometimes referred to as a disk cache and improves performance in two ways… Disk writes are clustered. Instead of many small transfers of data, we have a few large transfers of data. This improves disk performance and minimizes processor involvement. Some data designed for write-out may be referenced by a program before the next dump to disk. In that case the data is retrieved rapidly from the software cache rather than slowly from disk.
    Cache Memory Principles
    Cache memory is substantially faster than main memory. A caching system works as follows… When a processor attempts to read a word of memory, a check is made to see if this is in cache memory. If it is, the data is supplied. If it is not in the cache, a block of main memory, consisting of a fixed number of words, is loaded to the cache. Because of the phenomenon of locality of reference, when a block of data is fetched into the cache, it is likely that there will be future references to that same memory location or to other words in the block.
    Elements of Cache Design
    While there are a large number of cache implementations, there are a few basic design elements that serve to classify and differentiate cache architectures… Cache Addresses, Cache Size, Mapping Function, Replacement Algorithm, Write Policy, Line Size, Number of Caches.
    Cache Addresses
    Almost all non-embedded processors support virtual memory. Virtual memory in essence allows a program to address memory from a logical point of view without needing to worry about the amount of physical memory available. When virtual addresses are used the designer may choose to place the cache between the MMU (memory management unit) and the processor or between the MMU and main memory. 
    The disadvantage of virtual memory is that most virtual memory systems supply each application with the same virtual memory address space (each application sees virtual memory starting at memory address 0), which means the cache memory must be completely flushed with each application context switch or extra bits must be added to each line of the cache to identify which virtual address space the address refers to.
    Cache Size
    We would like the size of the cache to be small enough so that the overall average cost per bit is close to that of main memory alone, and large enough so that the overall average access time is close to that of the cache alone. Also, larger caches are slightly slower than smaller ones.
    Mapping Function
    Because there are fewer cache lines than main memory blocks, an algorithm is needed for mapping main memory blocks into cache lines. The choice of mapping function dictates how the cache is organized. Three techniques can be used… Direct – the simplest technique, maps each block of main memory into only one possible cache line; Associative – allows each main memory block to be loaded into any line of the cache; Set Associative – exhibits the strengths of both the direct and associative approaches while reducing their disadvantages. For detailed explanations of each approach – read the text book (page 148 – 154).
    Replacement Algorithm
    For associative and set associative mapping a replacement algorithm is needed to determine which of the existing blocks in the cache must be replaced by a new block. There are four common approaches… LRU (Least recently used), FIFO (First in first out), LFU (Least frequently used), Random selection.
    Write Policy
    When a block resident in the cache is to be replaced, there are two cases to consider. If no writes to that block have happened in the cache – discard it. If a write has occurred, a process needs to be initiated where the changes in the cache are propagated back to the main memory. There are several approaches to achieve this including… Write Through – all writes to the cache are done to the main memory as well at the point of the change; Write Back – when a block is replaced, it is written back to main memory if its dirty bit is set. The problem is complicated when we have multiple caches; there are techniques to accommodate this but I have not summarized them.
    Line Size
    When a block of data is retrieved and placed in the cache, not only the desired word but also some number of adjacent words are retrieved. As the block size increases from very small to larger sizes, the hit ratio will at first increase because of the principle of locality, which states that the data in the vicinity of a referenced word are likely to be referenced in the near future. As the block size increases, more useful data are brought into cache. The hit ratio will begin to decrease, however, as the block becomes even bigger and the probability of using the newly fetched information becomes less than the probability of reusing the information that has to be replaced. Two specific effects come into play… Larger blocks reduce the number of blocks that fit into a cache; because each block fetch overwrites older cache contents, a small number of blocks results in data being overwritten shortly after they are fetched. As a block becomes larger, each additional word is farther from the requested word and therefore less likely to be needed in the near future. The relationship between block size and hit ratio is complex, and no set approach is judged to be the best in all circumstances.
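    As a toy illustration of the mapping functions above, here is a short Python sketch (the cache geometry and the address are made up for the example) showing how a direct-mapped cache derives a line number from a block address, and how a set-associative cache narrows the choice to a set instead:

      # Toy numbers, chosen only to make the arithmetic visible.
      NUM_LINES = 128          # total lines in the cache
      LINES_PER_SET = 4        # 4-way set associative
      NUM_SETS = NUM_LINES // LINES_PER_SET

      def direct_mapped_line(block_address):
          # Direct mapping: each memory block can live in exactly one line.
          return block_address % NUM_LINES

      def set_associative_set(block_address):
          # Set-associative mapping: the block may go into any line of one set;
          # a replacement algorithm (LRU, FIFO, LFU, random) picks the victim.
          return block_address % NUM_SETS

      print(direct_mapped_line(0x2A7F))   # the one line this block must use
      print(set_associative_set(0x2A7F))  # the set this block must fall into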
    Pentium 4 and ARM cache organizations
    The processor core consists of four major components:
      Fetch/decode unit – fetches program instructions in order from the L2 cache, decodes these into a series of micro-operations, and stores the results in the L1 instruction cache
      Out-of-order execution logic – schedules execution of the micro-operations subject to data dependencies and resource availability – thus micro-operations may be scheduled for execution in a different order than they were fetched from the instruction stream. As time permits, this unit schedules speculative execution of micro-operations that may be required in the future
      Execution units – these units execute micro-operations, fetching the required data from the L1 data cache and temporarily storing results in registers
      Memory subsystem – this unit includes the L2 and L3 caches and the system bus, which is used to access main memory when the L1 and L2 caches have a cache miss and to access the system I/O resources

    Read the article

  • F# and the rose-tinted reflection

    - by CliveT
    We're already seeing increasing use of many cores on client desktops. It is a change that has been long predicted. It is not just a change in architecture, but our notions of efficiency in a program. No longer can we focus on the asymptotic complexity of an algorithm by counting the steps that a single core processor would take to execute it. Instead we'll soon be more concerned about the scalability of the algorithm and how well we can increase the performance as we increase the number of cores. This may even lead us to throw away our most efficient algorithms, and switch to less efficient algorithms that scale better. We might even be willing to waste cycles in order to speculatively execute at the algorithm rather than the hardware level. State is the big headache in this parallel world. At the hardware level, main memory doesn't necessarily contain the definitive value corresponding to a particular address. An update to a location might still be held in a CPU's local cache and it might be some time before the value gets propagated. To get the latest value, and the notion of "latest" takes a lot of defining in this world of rapidly mutating state, the CPUs may well need to communicate to decide who has the definitive value of a particular address in order to avoid lost updates. At the user program level, this means programmers will need to lock objects before modifying them, or attempt to avoid the overhead of locking by understanding the memory models at a very deep level. I think it's this need to avoid statefulness that has led to the recent resurgence of interest in functional languages. In the 1980s, functional languages started getting traction when research was carried out into how programs in such languages could be auto-parallelised. Sadly, the impracticality of some of the languages, the overheads of communication during this parallel execution, and rapid improvements in compiler technology on stock hardware meant that the functional languages fell by the wayside. The one thing that these languages were good at was getting rid of implicit state, and this single idea seems like a solution to the problems we are going to face in the coming years. Whether these languages will catch on is hard to predict. The mindset for writing a program in a functional language is really very different from the way that object-oriented problem decomposition happens - one has to focus on the verbs instead of the nouns, which takes some getting used to. There are a number of hybrid functional/object languages that have been becoming more popular in recent times. These half-way houses make it easy to use functional ideas for some parts of the program while still allowing access to the underlying object-focused platform without a great deal of impedance mismatch. One example is F# running on the CLR which, in Visual Studio 2010, has because a first class member of the pack. Inside Visual Studio 2010, the tooling for F# has improved to the point where it is easy to set breakpoints and watch values change while debugging at the source level. In my opinion, it is the tooling support that will enable the widespread adoption of functional languages - without this support, people will put off any transition into the functional world for as long as they possibly can. Without tool support it will make it hard to learn these languages. One tool that doesn't currently support F# is Reflector. 
The idea of decompiling IL to a functional language is daunting, but F# is potentially so important I couldn't dismiss the idea. As I'm currently developing Reflector 6.5, I thought it wise to take four days just to see how far I could get in doing so, even if it achieved little more than to be clearer on how much was possible, and how long it might take. You can read what happened here, and of the insights it gave us on ways to improve the tool.

    Read the article

  • Speeding up procedural texture generation

    - by FalconNL
    Recently I've begun working on a game that takes place in a procedurally generated solar system. After a bit of a learning curve (having neither worked with Scala, OpenGL 2 ES or Libgdx before), I have a basic tech demo going where you spin around a single procedurally textured planet: The problem I'm running into is the performance of the texture generation. A quick overview of what I'm doing: a planet is a cube that has been deformed to a sphere. To each side, a n x n (e.g. 256 x 256) texture is applied, which are bundled in one 8n x n texture that is sent to the fragment shader. The last two spaces are not used, they're only there to make sure the width is a power of 2. The texture is currently generated on the CPU, using the updated 2012 version of the simplex noise algorithm linked to in the paper 'Simplex noise demystified'. The scene I'm using to test the algorithm contains two spheres: the planet and the background. Both use a greyscale texture consisting of six octaves of 3D simplex noise, so for example if we choose 128x128 as the texture size there are 128 x 128 x 6 x 2 x 6 = about 1.2 million calls to the noise function. The closest you will get to the planet is about what's shown in the screenshot and since the game's target resolution is 1280x720 that means I'd prefer to use 512x512 textures. Combine that with the fact the actual textures will of course be more complicated than basic noise (There will be a day and night texture, blended in the fragment shader based on sunlight, and a specular mask. I need noise for continents, terrain color variation, clouds, city lights, etc.) and we're looking at something like 512 x 512 x 6 x 3 x 15 = 70 million noise calls for the planet alone. In the final game, there will be activities when traveling between planets, so a wait of 5 or 10 seconds, possibly 20, would be acceptable since I can calculate the texture in the background while traveling, though obviously the faster the better. Getting back to our test scene, performance on my PC isn't too terrible, though still too slow considering the final result is going to be about 60 times worse: 128x128 : 0.1s 256x256 : 0.4s 512x512 : 1.7s This is after I moved all performance-critical code to Java, since trying to do so in Scala was a lot worse. Running this on my phone (a Samsung Galaxy S3), however, produces a more problematic result: 128x128 : 2s 256x256 : 7s 512x512 : 29s Already far too long, and that's not even factoring in the fact that it'll be minutes instead of seconds in the final version. Clearly something needs to be done. Personally, I see a few potential avenues, though I'm not particularly keen on any of them yet: Don't precalculate the textures, but let the fragment shader calculate everything. Probably not feasible, because at one point I had the background as a fullscreen quad with a pixel shader and I got about 1 fps on my phone. Use the GPU to render the texture once, store it and use the stored texture from then on. Upside: might be faster than doing it on the CPU since the GPU is supposed to be faster at floating point calculations. Downside: effects that cannot (easily) be expressed as functions of simplex noise (e.g. gas planet vortices, moon craters, etc.) are a lot more difficult to code in GLSL than in Scala/Java. Calculate a large amount of noise textures and ship them with the application. I'd like to avoid this if at all possible. Lower the resolution. Buys me a 4x performance gain, which isn't really enough plus I lose a lot of quality. 
Find a faster noise algorithm. If anyone has one I'm all ears, but simplex is already supposed to be faster than perlin. Adopt a pixel art style, allowing for lower resolution textures and fewer noise octaves. While I originally envisioned the game in this style, I've come to prefer the realistic approach. I'm doing something wrong and the performance should already be one or two orders of magnitude better. If this is the case, please let me know. If anyone has any suggestions, tips, workarounds, or other comments regarding this problem I'd love to hear them.
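    For reference, the call counts above come from a plain fractal ("octave") sum; this is a generic Python sketch of that loop, not the poster's Scala/Java code, and snoise3 is a stand-in for whatever simplex implementation is actually used:

      def snoise3(x, y, z):
          """Placeholder for a 3D simplex noise function (e.g. the one from
          'Simplex noise demystified'), returning a value roughly in [-1, 1]."""
          raise NotImplementedError

      def fbm(x, y, z, octaves=6, lacunarity=2.0, gain=0.5):
          # Six octaves means six snoise3 calls per texel, which is why a 512x512
          # texture on six cube faces multiplies out to tens of millions of calls.
          total, amplitude, frequency = 0.0, 1.0, 1.0
          for _ in range(octaves):
              total += amplitude * snoise3(x * frequency, y * frequency, z * frequency)
              frequency *= lacunarity
              amplitude *= gain
          return total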

    Read the article

  • Skewed: a rotating camera in a simple CPU-based voxel raycaster/raytracer

    - by voxelizr
    TL;DR -- in my first simple software voxel raycaster, I cannot get camera rotations to work, seemingly correct matrices notwithstanding. The result is skewed: like a flat rendering, correctly rotated, however distorted and without depth. (While axis-aligned ie. unrotated, depth and parallax are as expected.) I'm trying to write a simple voxel raycaster as a learning exercise. This is purely CPU based for now until I figure out how things work exactly -- fow now, OpenGL is just (ab)used to blit the generated bitmap to the screen as often as possible. Now I have gotten to the point where a perspective-projection camera can move through the world and I can render (mostly, minus some artifacts that need investigation) perspective-correct 3-dimensional views of the "world", which is basically empty but contains a voxel cube of the Stanford Bunny. So I have a camera that I can move up and down, strafe left and right and "walk forward/backward" -- all axis-aligned so far, no camera rotations. Herein lies my problem. Screenshot #1: correct depth when the camera is still strictly axis-aligned, ie. un-rotated. Now I have for a few days been trying to get rotation to work. The basic logic and theory behind matrices and 3D rotations, in theory, is very clear to me. Yet I have only ever achieved a "2.5 rendering" when the camera rotates... fish-eyey, bit like in Google Streetview: even though I have a volumetric world representation, it seems --no matter what I try-- like I would first create a rendering from the "front view", then rotate that flat rendering according to camera rotation. Needless to say, I'm by now aware that rotating rays is not particularly necessary and error-prone. Still, in my most recent setup, with the most simplified raycast ray-position-and-direction algorithm possible, my rotation still produces the same fish-eyey flat-render-rotated style looks: Screenshot #2: camera "rotated to the right by 39 degrees" -- note how the blue-shaded left-hand side of the cube from screen #2 is not visible in this rotation, yet by now "it really should"! Now of course I'm aware of this: in a simple axis-aligned-no-rotation-setup like I had in the beginning, the ray simply traverses in small steps the positive z-direction, diverging to the left or right and top or bottom only depending on pixel position and projection matrix. As I "rotate the camera to the right or left" -- ie I rotate it around the Y-axis -- those very steps should be simply transformed by the proper rotation matrix, right? So for forward-traversal the Z-step gets a bit smaller the more the cam rotates, offset by an "increase" in the X-step. Yet for the pixel-position-based horizontal+vertical-divergence, increasing fractions of the x-step need to be "added" to the z-step. Somehow, none of my many matrices that I experimented with, nor my experiments with matrix-less hardcoded verbose sin/cos calculations really get this part right. 
    Here's my basic per-ray pre-traversal algorithm -- syntax in Go, but take it as pseudocode:
      fx and fy: pixel positions x and y
      rayPos: vec3 for the ray starting position in world-space (calculated as below)
      rayDir: vec3 for the xyz-steps to be added to rayPos in each step during ray traversal
      rayStep: a temporary vec3
      camPos: vec3 for the camera position in world space
      camRad: vec3 for camera rotation in radians
      pmat: typical perspective projection matrix
    The algorithm / pseudocode:
      // 1: rayPos is for now "this pixel, as a vector on the view plane in 3d, at The Origin"
      rayPos.X, rayPos.Y, rayPos.Z = ((fx / width) - 0.5), ((fy / height) - 0.5), 0
      // 2: rotate around Y axis depending on cam rotation. No prob since view plane still at Origin 0,0,0
      rayPos.MultMat(num.NewDmat4RotationY(camRad.Y))
      // 3: a temp vec3. planeDist is -0.15 or some such -- fov-based dist of view plane from eye and also
      // the non-normalized, "in axis-aligned world" traversal step size "forward into the screen"
      rayStep.X, rayStep.Y, rayStep.Z = 0, 0, planeDist
      // 4: rotate this too -- 0,zstep should become some meaningful xzstep,xzstep
      rayStep.MultMat(num.NewDmat4RotationY(CamRad.Y))
      // set up direction vector from still-origin-based-ray-position-off-rotated-view-plane plus rotated-zstep-vector
      rayDir.X, rayDir.Y, rayDir.Z = -rayPos.X - me.rayStep.X, -rayPos.Y, rayPos.Z + rayStep.Z
      // perspective projection
      rayDir.Normalize()
      rayDir.MultMat(pmat)
      // before traversal, the ray starting position has to be transformed from origin-relative to campos-relative
      rayPos.Add(camPos)
    I'm skipping the traversal and sampling parts -- as per screens #1 through #3, those are "basically mostly correct" (though not pretty) -- when axis-aligned / unrotated.
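    For comparison, here is a minimal Python sketch of the conventional per-pixel setup for a rotating pinhole camera (the numpy-based math, field of view and axis conventions are my assumptions, not the poster's code): the ray origin is simply the camera position, and only the direction built in camera space is rotated into world space, so there is no separate step vector to rotate.

      import math
      import numpy as np

      def rotation_y(angle):
          # World-from-camera rotation about the Y axis.
          c, s = math.cos(angle), math.sin(angle)
          return np.array([[  c, 0.0,   s],
                           [0.0, 1.0, 0.0],
                           [ -s, 0.0,   c]])

      def primary_ray(fx, fy, width, height, cam_pos, cam_yaw, fov_deg=60.0):
          # Direction in camera space: x/y from the pixel, z from the field of view.
          aspect = width / height
          half_h = math.tan(math.radians(fov_deg) * 0.5)
          x = (2.0 * (fx + 0.5) / width - 1.0) * half_h * aspect
          y = (1.0 - 2.0 * (fy + 0.5) / height) * half_h
          dir_cam = np.array([x, y, -1.0])            # view plane one unit ahead
          dir_world = rotation_y(cam_yaw) @ dir_cam   # rotate the direction only
          dir_world /= np.linalg.norm(dir_world)
          return cam_pos, dir_world                   # origin stays at the camera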

    Read the article

  • Fastest pathfinding for static node matrix

    - by Sean Martin
    I'm programming a route finding routine in VB.NET for an online game I play, and I'm searching for the fastest route finding algorithm for my map type. The game takes place in space, with thousands of solar systems connected by jump gates. The game devs have provided a DB dump containing a list of every system and the systems it can jump to. The map isn't quite a node tree, since some branches can jump to other branches - more of a matrix. What I need is a fast pathfinding algorithm. I have already implemented an A* routine and a Dijkstra's; both find the best path but are too slow for my purposes - a search that considers about 5000 nodes takes over 20 seconds to compute. A similar program on a website can do the same search in less than a second. This website claims to use D*, which I have looked into. That algorithm seems more appropriate for dynamic maps rather than one that does not change - unless I misunderstand its premise. So is there something faster I can use for a map that is not your typical tile/polygon based one? GBFS? Perhaps a DFS? Or have I likely got some problem with my A* - maybe poorly chosen heuristics or movement cost? Currently my movement cost is the length of the jump (the DB dump has solar system coordinates as well), and the heuristic is a quick euclidean calculation from the node to the goal. In case anyone has some optimizations for my A*, here is the routine that consumes about 60% of my processing time, according to my profiler. The coordinateData table contains a list of every system's coordinates, and neighborNode.distance is the distance of the jump.
      Private Function findDistance(ByVal startSystem As Integer, ByVal endSystem As Integer) As Integer
          'hCount += 1
          'If hCount Mod 0 = 0 Then
          'Return hCache
          'End If
          'Initialize variables to be filled
          Dim x1, x2, y1, y2, z1, z2 As Integer
          'LINQ queries for solar system data
          Dim systemFromData = From result In jumpDataDB.coordinateDatas Where result.systemId = startSystem Select result.x, result.y, result.z
          Dim systemToData = From result In jumpDataDB.coordinateDatas Where result.systemId = endSystem Select result.x, result.y, result.z
          'LINQ execute
          'Fill variables with solar system data for from and to system
          For Each solarSystem In systemFromData
              x1 = (solarSystem.x)
              y1 = (solarSystem.y)
              z1 = (solarSystem.z)
          Next
          For Each solarSystem In systemToData
              x2 = (solarSystem.x)
              y2 = (solarSystem.y)
              z2 = (solarSystem.z)
          Next
          Dim x3 = Math.Abs(x1 - x2)
          Dim y3 = Math.Abs(y1 - y2)
          Dim z3 = Math.Abs(z1 - z2)
          'Calculate distance and round
          'Dim distance = Math.Round(Math.Sqrt(Math.Abs((x1 - x2) ^ 2) + Math.Abs((y1 - y2) ^ 2) + Math.Abs((z1 - z2) ^ 2)))
          Dim distance = firstConstant * Math.Min(secondConstant * (x3 + y3 + z3), Math.Max(x3, Math.Max(y3, z3)))
          'Dim distance = Math.Abs(x1 - x2) + Math.Abs(z1 - z2) + Math.Abs(y1 - y2)
          'hCache = distance
          Return distance
      End Function
    And the main loop, the other 30%:
      'Begin search
      While openList.Count() != 0
          'Set current system and move node to closed
          currentNode = lowestF()
          move(currentNode.id)
          For Each neighborNode In neighborNodes
              If Not onList(neighborNode.toSystem, 0) Then
                  If Not onList(neighborNode.toSystem, 1) Then
                      Dim newNode As New nodeData()
                      newNode.id = neighborNode.toSystem
                      newNode.parent = currentNode.id
                      newNode.g = currentNode.g + neighborNode.distance
                      newNode.h = findDistance(newNode.id, endSystem)
                      newNode.f = newNode.g + newNode.h
                      newNode.security = neighborNode.security
                      openList.Add(newNode)
                      shortOpenList(OLindex) = newNode.id
                      OLindex += 1
                  Else
                      Dim proposedG As Integer = currentNode.g + neighborNode.distance
                      If proposedG < gValue(neighborNode.toSystem) Then
                          changeParent(neighborNode.toSystem, currentNode.id, proposedG)
                      End If
                  End If
              End If
          Next
          'Check to see if done
          If currentNode.id = endSystem Then
              Exit While
          End If
      End While
    If clarification is needed on my spaghetti code, I'll try to explain.
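    Two generic observations follow from the profile above, sketched in Python rather than VB.NET (all names are illustrative, none come from the poster's project): the heuristic can read coordinates from a dictionary loaded once instead of issuing a LINQ/database query on every call, and the open list is usually kept as a binary heap so extracting the lowest f costs O(log n) instead of a linear scan.

      import heapq
      import math

      def heuristic(coords, a, b):
          # Euclidean distance from an in-memory dict - no database round trip per call.
          (x1, y1, z1), (x2, y2, z2) = coords[a], coords[b]
          return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)

      def a_star(start, goal, coords, neighbours):
          # coords: {system_id: (x, y, z)} loaded once from the DB dump.
          # neighbours: {system_id: [(next_id, jump_cost), ...]}, also preloaded.
          open_heap = [(heuristic(coords, start, goal), 0.0, start)]
          g_score, parent, closed = {start: 0.0}, {start: None}, set()
          while open_heap:
              _f, g, current = heapq.heappop(open_heap)   # cheapest f in O(log n)
              if current == goal:
                  return g, parent                        # cost plus tree to rebuild the path
              if current in closed:
                  continue
              closed.add(current)
              for nxt, cost in neighbours.get(current, ()):
                  tentative = g + cost
                  if tentative < g_score.get(nxt, float("inf")):
                      g_score[nxt], parent[nxt] = tentative, current
                      heapq.heappush(open_heap,
                                     (tentative + heuristic(coords, nxt, goal), tentative, nxt))
          return None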

    Read the article

  • Clever memory usage through the years

    - by Ben Emmett
    A friend and I were recently talking about the really clever tricks people have used to get the most out of memory. I thought I’d share my favorites, and would love to hear yours too! Interleaving on drum memory Back in the ye olde days before I’d been born (we’re talking the 50s / 60s here), working memory commonly took the form of rotating magnetic drums. These would spin at a constant speed, and a fixed head would read from memory when the correct part of the drum passed it by, a bit like a primitive platter disk. Because each revolution took a few milliseconds, programmers took to manually arranging information non-sequentially on the drum, timing when an instruction or memory address would need to be accessed, then spacing information accordingly around the edge of the drum, thus reducing the access delay. Similar techniques were still used on hard disks and floppy disks into the 90s, but have become irrelevant with modern disk technologies. The Hashlife algorithm Conway’s Game of Life has attracted numerous implementations over the years, but Bill Gosper’s Hashlife algorithm is particularly impressive. Taking advantage of the repetitive nature of many cellular automata, it uses a quadtree structure to store the hashes of pieces of the overall grid. Over time there are fewer and fewer new structures which need to be evaluated, so it starts to run faster with larger grids, drastically outperforming other algorithms both in terms of speed and the size of grid which can be simulated. The actual amount of memory used is huge, but it’s used in a clever way, so makes the list . Elite’s procedural generation Ok, so this isn’t exactly a memory optimization – more a storage optimization – but it gets an honorable mention anyway. When writing Elite, David Braben and Ian Bell wanted to build a rich world which gamers could explore, but their 22K memory was something of a limitation (for comparison that’s about the size of my avatar picture at the top of this page). They procedurally generated all the characteristics of the 2048 planets in their virtual universe, including the names, which were stitched together using a lookup table of parts of names. In fact the original plans were for 2^52 planets, but it was decided that that was probably too many. Oh, and they did that all in assembly language. Other games of the time used similar techniques too – The Sentinel’s landscape generation algorithm being another example. Modern Garbage Collectors Garbage collection in managed languages like Java and .NET ensures that most of the time, developers stop needing to care about how they use and clean up memory as the garbage collector handles it automatically. Achieving this without killing performance is a near-miraculous feet of software engineering. Much like when learning chemistry, you find that every time you think you understand how the garbage collector works, it turns out to be a mere simplification; that there are yet more complexities and heuristics to help it run efficiently. Of course introducing memory problems is still possible (and there are tools like our memory profiler to help if that happens to you) but they’re much, much rarer. A cautionary note In the examples above, there were good and well understood reasons for the optimizations, but cunningly optimized code has usually had to trade away readability and maintainability to achieve its gains. Trying to optimize memory usage without being pretty confident that there’s actually a problem is doing it wrong. So what have I missed? 
Tell me about the ingenious (or stupid) tricks you’ve seen people use. Ben

    Read the article

  • NDepend Evaluation: Part 3

    - by Anthony Trudeau
    NDepend is a Visual Studio add-in designed for intense code analysis with the goal of high code quality. NDepend uses a number of metrics and aggregates the data in pleasing static and active visual reports. My evaluation of NDepend will be broken up into several different parts. In the first part of the evaluation I looked at installing the add-in.  And in the last part I went over my first impressions including an overview of the features.  In this installment I provide a little more detail on a few of the features that I really like. Dependency Matrix The dependency matrix is one of the rich visual components provided with NDepend.  At a glance it lets you know where you have coupling problems including cycles.  It does this with number indicating the weight of the dependency and a color-coding that indicates the nature of the dependency. Green and blue cells are direct dependencies (with the difference being whether the relationship is from row-to-column or column-to-row).  Black cells are the ones that you really want to know about.  These indicate that you have a cycle.  That is, type A refers to type B and type B also refers to Type A. But, that’s not the end of the story.  A handy pop-up appears when you hover over the cell in question.  It explains the color, the dependency, and provides several interesting links that will teach you more than you want to know about the dependency. You can double-click the problem cells to explode the dependency.  That will show the dependencies on a method-by-method basis allowing you to more easily target and fix the problem.  When you’re done you can click the back button on the toolbar. Dependency Graph The dependency graph is another component provided.  It’s complementary to the dependency matrix, but it isn’t as easy to identify dependency issues using the window. On a positive note, it does provide more information than the matrix. My biggest issue with the dependency graph is determining what is shown.  This was not readily obvious.  I ended up using the navigation buttons to get an acceptable view.  I would have liked to choose what I see. Once you see the types you want you can get a decent idea of coupling strength based on the width of the dependency lines.  Double-arrowed lines are problematic and are shown in red.  The size of the boxes will be related to the metric being displayed.  This is controlled using the Box Size drop-down in the toolbar.  Personally, I don’t find the size of the box to be helpful, so I change it to Constant Font. One nice thing about the display is that you can see the entire path of dependencies when you hover over a type.  This is done by color-coding the dependencies and dependants.  It would be nice if selecting the box for the type would lock the highlighting in place. I did find a perhaps unintended work-around to the color-coding.  You can lock the color-coding in by hovering over the type, right-clicking, and then clicking on the canvas area to clear the pop-up menu.  You can then do whatever with it including saving it to an image file with the color-coding. CQL NDepend uses a code query language (CQL) to work with your code just like it was a database.  CQL cannot be confused with the robustness of T-SQL or even LINQ, but it represents an impressive attempt at providing an expressive way to enumerate and interrogate your code. There are two main windows you’ll use when working with CQL.  
The CQL Query Explorer allows you to define what queries (rules) are run as part of a report – I immediately unselected rules that I don’t want in my results.  The CQL Query Edit window is where you can view or author your own rules.  The explorer window is pretty self-explanatory, so I won’t mention it further other than to say that any queries you author will appear in the custom group. Authoring your own queries is really hard to screw-up.  The Intellisense-like pop-ups tell you what you can do while making composition easy.  I was able to create a query within two minutes of playing with the editor.  My query warns if any types that are interfaces don’t start with an “I”. WARN IF Count > 0 IN SELECT TYPES WHERE IsInterface AND !NameLike “I” The results from the CQL Query Edit window are immediate. That fact makes it useful for ad hoc querying.  It’s worth mentioning two things that could make the experience smoother.  First, out of habit from using Visual Studio I expect to be able to scroll and press Tab to select an item in the list (like Intellisense).  You have to press Enter when you scroll to the item you want.  Second, the commands are case-sensitive.  I don’t see a really good reason to enforce that. CQL has a lot of potential not just in enforcing code quality, but also enforcing architectural constraints that your enterprise has defined. Up Next My next update will be the final part of the evaluation.  I will summarize my experience and provide my conclusions on the NDepend add-in. ** View Part 1 of the Evaluation ** ** View Part 2 of the Evaluation ** Disclaimer: Patrick Smacchia contacted me about reviewing NDepend. I received a free license in return for sharing my experiences and talking about the capabilities of the add-in on this site. There is no expectation of a positive review elicited from the author of NDepend.

    Read the article

  • Observing flow control idle time in TCP

    - by user12820842
    Previously I described how to observe congestion control strategies during transmission, and here I talked about TCP's sliding window approach for handling flow control on the receive side. A neat trick would now be to put the pieces together and ask the following question - how often is TCP transmission blocked by congestion control (send-side flow control) versus a zero-sized send window (which is the receiver saying it cannot process any more data)? So in effect we are asking whether the size of the receive window of the peer or the congestion control strategy may be sub-optimal. The result of such a problem would be that we have TCP data that we could be transmitting but we are not, potentially affecting throughput. So flow control is in effect:
      when the congestion window is less than or equal to the amount of bytes outstanding on the connection. We can derive this from args[3]->tcps_snxt - args[3]->tcps_suna, i.e. the difference between the next sequence number to send and the lowest unacknowledged sequence number; and
      when the window in the TCP segment received is advertised as 0
    We time from these events until we send new data (i.e. args[4]->tcp_seq >= the snxt value recorded when the window closed). Here's the script:
      #!/usr/sbin/dtrace -s

      #pragma D option quiet

      tcp:::send
      / (args[3]->tcps_snxt - args[3]->tcps_suna) >= args[3]->tcps_cwnd /
      {
          cwndclosed[args[1]->cs_cid] = timestamp;
          cwndsnxt[args[1]->cs_cid] = args[3]->tcps_snxt;
          @numclosed["cwnd", args[2]->ip_daddr, args[4]->tcp_dport] = count();
      }

      tcp:::send
      / cwndclosed[args[1]->cs_cid] && args[4]->tcp_seq >= cwndsnxt[args[1]->cs_cid] /
      {
          @meantimeclosed["cwnd", args[2]->ip_daddr, args[4]->tcp_dport] = avg(timestamp - cwndclosed[args[1]->cs_cid]);
          @stddevtimeclosed["cwnd", args[2]->ip_daddr, args[4]->tcp_dport] = stddev(timestamp - cwndclosed[args[1]->cs_cid]);
          @numclosed["cwnd", args[2]->ip_daddr, args[4]->tcp_dport] = count();
          cwndclosed[args[1]->cs_cid] = 0;
          cwndsnxt[args[1]->cs_cid] = 0;
      }

      tcp:::receive
      / args[4]->tcp_window == 0 && (args[4]->tcp_flags & (TH_SYN|TH_RST|TH_FIN)) == 0 /
      {
          swndclosed[args[1]->cs_cid] = timestamp;
          swndsnxt[args[1]->cs_cid] = args[3]->tcps_snxt;
          @numclosed["swnd", args[2]->ip_saddr, args[4]->tcp_dport] = count();
      }

      tcp:::send
      / swndclosed[args[1]->cs_cid] && args[4]->tcp_seq >= swndsnxt[args[1]->cs_cid] /
      {
          @meantimeclosed["swnd", args[2]->ip_daddr, args[4]->tcp_sport] = avg(timestamp - swndclosed[args[1]->cs_cid]);
          @stddevtimeclosed["swnd", args[2]->ip_daddr, args[4]->tcp_sport] = stddev(timestamp - swndclosed[args[1]->cs_cid]);
          swndclosed[args[1]->cs_cid] = 0;
          swndsnxt[args[1]->cs_cid] = 0;
      }

      END
      {
          printf("%-6s %-20s %-8s %-25s %-8s %-8s\n", "Window", "Remote host", "Port", "TCP Avg WndClosed(ns)", "StdDev", "Num");
          printa("%-6s %-20s %-8d %@-25d %@-8d %@-8d\n", @meantimeclosed, @stddevtimeclosed, @numclosed);
      }
    So this script will show us whether the peer's receive window size is preventing flow ("swnd" events) or whether congestion control is limiting flow ("cwnd" events). As an example I traced on a server with a large file transfer in progress via a webserver and with an active ssh connection running "find / -depth -print". Here is the output:
      ^C
      Window Remote host          Port     TCP Avg WndClosed(ns)     StdDev    Num
      cwnd   10.175.96.92         80       86064329                  77311705  125
      cwnd   10.175.96.92         22       122068522                 151039669 81
    So we see in this case, the congestion window closes 125 times for port 80 connections and 81 times for ssh. The average time the window is closed is 0.086sec for port 80 and 0.12sec for port 22. 
    So if you wish to change the congestion control algorithm in Oracle Solaris 11, a useful step may be to see if congestion really is an issue on your network. Scripts like the one posted above can help assess this, but it's worth reiterating that if congestion control is occurring, that's not necessarily a problem that needs fixing. Recall that congestion control is about controlling flow to prevent large-scale drops, so looking at congestion events in isolation doesn't tell us the whole story. For example, are we seeing more congestion events with one control algorithm, but more drops/retransmission with another? As always, it's best to start with measures of throughput and latency before arriving at a specific hypothesis such as "my congestion control algorithm is sub-optimal".

    Read the article

  • WMemoryProfiler is Released

    - by Alois Kraus
    What is it? WMemoryProfiler is a managed profiling Api to aid integration testing. This free library can get managed heap statistics and memory usage for your own process (remember testing) and other processes as well. The best thing is that it does work from .NET 2.0 up to .NET 4.5 in x86 and x64. To make it more interesting it can attach to any running .NET process. The reason why I do mention this is that commercial profilers support this functionality only in their professional editions, and normally only since .NET 4.0, since only then does the profiling API support attaching to a running process. This thing does differ in many aspects from “normal” profilers because while profiling yourself you can get all objects from all managed heaps back as an object array. If you ever wanted to change the state of an object which only exists as a method local in another thread, you can get your hands on it now … Enough theory. Show me some code:
      /// <summary>
      /// Show feature to not only get statistics out of a process but also the newly allocated
      /// instances since the last call to MarkCurrentObjects.
      /// GetNewObjects does return the newly allocated objects as object array
      /// </summary>
      static void InstanceTracking()
      {
          using (var dumper = new MemoryDumper())  // if you have problems use new MemoryDumper(true,true) to see the debugger windows
          {
              dumper.MarkCurrentObjects();
              Allocate();
              ILookup<Type, object> newObjects = dumper.GetNewObjects()
                                                       .ToLookup( x => x.GetType() );
              Console.WriteLine("New Strings:");
              foreach (var newStr in newObjects[typeof(string)] )
              {
                  Console.WriteLine("Str: {0}", newStr);
              }
          }
      }
      …
      New Strings:
      Str: qqd
      Str: String data:
      Str: String data: 0
      Str: String data: 1
      …
    This is really hot stuff. Not only can you get heap statistics, but you can directly examine the new objects and make queries upon them. When I do find more time I can reconstruct the object root graph from it from my own process. Is this cool or what? You can also peek into the Finalization Queue to check if you did accidentally forget to dispose a whole bunch of objects …
      /// <summary>
      /// .NET 4.0 or above only. Get all finalizable objects which are ready for finalization and have no other object roots anymore.
      /// </summary>
      static void NotYetFinalizedObjects()
      {
          using (var dumper = new MemoryDumper())
          {
              object[] finalizable = dumper.GetObjectsReadyForFinalization();
              Console.WriteLine("Currently {0} objects of types {1} are ready for finalization. Consider disposing them before.",
                                finalizable.Length,
                                String.Join(",", finalizable.ToLookup( x=> x.GetType() )
                                                            .Select( x=> x.Key.Name)) );
          }
      }
    How does it work? The W of WMemoryProfiler is a good hint. It does employ Windbg and SOS dll to do the heavy lifting and concentrates on an easy to use Api which hides Windbg completely. If you do not want to see Windbg you will never see it. In my experience the most complex thing is actually to download Windbg from the Windows 8 Standalone SDK. This is described in the Readme and the exception you are greeted with if it is missing in much greater detail. So I will not go into this here. What Next? Depending on the feedback I do get I can imagine some features which might be useful as well: Calculate first order GC Roots from the actual object graph; Identify global statics in Types in object graph; Support read out of finalization queue of .NET 2.0 as well. 
Support Memory Dump analysis (again a feature only supported by commercial profilers in their professional editions if it is supported at all) Deserialize objects from a memory dump into a live process back (this would need some more investigation but it is doable) The last item needs some explanation. Why on earth would you want to do that? The basic idea is to store in your live process some logging/tracing data which can become quite big but since it is never written to it is very fast to generate. When your process crashes with a memory dump you could transfer this data structure back into a live viewer which can then nicely display your program state at the point it did crash. This is an advanced trouble shooting technique I have not seen anywhere yet but it could be quite useful. You can have here a look at the current feature list of WMemoryProfiler with some examples.   How To Get Started? First I would download the released source package (it is tiny). And compile the complete project. Then you can compile the Example project (it has this name) and uncomment in the main method the scenario you want to check out. If you are greeted with an exception it is time to install the Windows 8 Standalone SDK which is described in great detail in the exception text. Thats it for the first round. I have seen something more limited in the Java world some years ago (now I cannot find the link anymore) but anyway. Now we have something much better.

    Read the article

  • Modular Reduction of Polynomials in NTRUEncrypt

    - by Neville
    Hello everyone. I'm implementing the NTRUEncrypt algorithm. According to an NTRU tutorial, a polynomial f has an inverse g such that f*g = 1 mod x; basically, the polynomial multiplied by its inverse, reduced modulo x, gives 1. I get the concept, but in an example they provide, a polynomial f = -1 + X + X^2 - X^4 + X^6 + X^9 - X^10, which we will represent as the array [-1,1,1,0,-1,0,1,0,0,1,-1], has an inverse g of [1,2,0,2,2,1,0,2,1,2,0], so that when we multiply them and reduce the result modulo 3 we get 1. However, when I use the NTRU algorithm for multiplying and reducing them I get -2. Here is my algorithm for multiplying them, written in Java:
      public static int[] PolMulFun(int a[], int b[], int c[], int N, int M) {
          for (int k = N - 1; k >= 0; k--) {
              c[k] = 0;
              int j = k + 1;
              for (int i = N - 1; i >= 0; i--) {
                  if (j == N) {
                      j = 0;
                  }
                  if (a[i] != 0 && b[j] != 0) {
                      c[k] = (c[k] + (a[i] * b[j])) % M;
                  }
                  j = j + 1;
              }
          }
          return c;
      }
    It basically takes in polynomial a and multiplies it by b, returning the result in c; N specifies the degree of the polynomials plus 1 (in the example above N=11), and M is the reduction modulus (in the example above, 3). Why am I getting -2 and not 1?
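    For comparison, here is a small Python sketch of the same star multiplication (cyclic convolution) followed by reduction into the range 0..M-1; the function name is mine, not from the question. One practical difference from the Java version is that the coefficients are normalised to be non-negative at the end, which matters because Java's % operator can return negative values for negative operands, and -2 and 1 are the same residue modulo 3.

      def convolution_mod(a, b, M):
          """Cyclic (star) multiplication of two NTRU polynomials given as
          coefficient lists of length N, with every result coefficient in 0..M-1."""
          N = len(a)
          c = [0] * N
          for k in range(N):
              for i in range(N):
                  # X^i * X^k contributes to the coefficient of X^((i + k) mod N).
                  c[(i + k) % N] += a[i] * b[k]
          return [coeff % M for coeff in c]   # Python's % is always non-negative here

      f = [-1, 1, 1, 0, -1, 0, 1, 0, 0, 1, -1]
      g = [1, 2, 0, 2, 2, 1, 0, 2, 1, 2, 0]
      print(convolution_mod(f, g, 3))         # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]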

    Read the article

  • Progress Bar design patterns?

    - by shoosh
    The application I'm writing performs a lengthy algorithm which usually takes a few minutes to finish. During this time I'd like to show the user a progress bar which indicates how much of the algorithm is done as precisely as possible. The algorithm is divided into several steps, each with its own typical timing. For instance: initialization (500 milli-sec) reading inputs (5 sec) step 1 (30 sec) step 2 (3 minutes) writing outputs (7 sec) shutting down (10 milli-sec) Each step can report its progress quite easily by setting the range it's working on, say [0 to 150], and then reporting the value it completed in its main loop. What I currently have set up is a scheme of nested progress monitors which form a sort of implicit tree of progress reporting. All progress monitors inherit from an interface IProgressMonitor: class IProgressMonitor { public: virtual void setRange(int from, int to) = 0; virtual void setValue(int v) = 0; }; The root of the tree is the ProgressMonitor which is connected to the actual GUI interface: class GUIBarProgressMonitor : public IProgressMonitor { GUIBarProgressMonitor(ProgressBarWidget *); }; Any other node in the tree is a monitor which takes control of a piece of the parent progress: class SubProgressMonitor : public IProgressMonitor { SubProgressMonitor(IProgressMonitor *parent, int parentFrom, int parentLength) ... }; A SubProgressMonitor takes control of the range [parentFrom, parentFrom+parentLength] of its parent. With this scheme I am able to statically divide the top level progress according to the expected relative portion of each step in the global timing. Each step can then be further subdivided into pieces, etc. The main disadvantage of this is that the division is static, and it gets painful to make changes according to variables which are discovered at run time. So the question: are there any known design patterns for progress monitoring which solve this issue?
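    (A minimal sketch, not the poster's actual code, of how the SubProgressMonitor mapping described above could look: the child's own [from, to] range is linearly mapped onto the slice [parentFrom, parentFrom + parentLength] of the parent. The interface is repeated so the sketch stands alone.)

        // Interface as declared in the question, repeated for self-containment.
        class IProgressMonitor {
        public:
            virtual ~IProgressMonitor() {}
            virtual void setRange(int from, int to) = 0;
            virtual void setValue(int v) = 0;
        };

        class SubProgressMonitor : public IProgressMonitor {
        public:
            SubProgressMonitor(IProgressMonitor *parent, int parentFrom, int parentLength)
                : parent_(parent), parentFrom_(parentFrom), parentLength_(parentLength),
                  from_(0), to_(100) {}

            virtual void setRange(int from, int to) { from_ = from; to_ = to; }

            virtual void setValue(int v) {
                int span = to_ - from_;
                if (span <= 0) return;                       // guard against an empty range
                // Linear interpolation of v into the parent's slice.
                parent_->setValue(parentFrom_ + (v - from_) * parentLength_ / span);
            }

        private:
            IProgressMonitor *parent_;
            int parentFrom_, parentLength_;
            int from_, to_;
        };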

    Read the article

  • Updating a Minimum spanning tree when a new edge is inserted

    - by Lynette
    Hello, I've been presented the following problem in University: Let G = (V, E) be an (undirected) graph with costs ce ≥ 0 on the edges e ∈ E. Assume you are given a minimum-cost spanning tree T in G. Now assume that a new edge is added to G, connecting two nodes v, w ∈ V with cost c. a) Give an efficient algorithm to test if T remains the minimum-cost spanning tree with the new edge added to G (but not to the tree T). Make your algorithm run in time O(|E|). Can you do it in O(|V|) time? Please note any assumptions you make about what data structure is used to represent the tree T and the graph G. b) Suppose T is no longer the minimum-cost spanning tree. Give a linear-time algorithm (time O(|E|)) to update the tree T to the new minimum-cost spanning tree. This is the solution I found: Let e1 = (a,b) be the new edge added. Find in T the path from a to b (BFS). If e1 is the most expensive edge in the resulting cycle, then T remains the MST; else T is not the MST. It seems to work, but I can easily make this run in O(|V|) time, while the problem asks for O(|E|) time. Am I missing something? By the way we are authorized to ask for help from anyone, so I'm not cheating :D Thanks in advance
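    (A sketch of the check from the proposed solution, assuming T is stored as an adjacency list; not from the post. The new edge (a, b) closes exactly one cycle, namely the unique a-b path in T plus the new edge, so by the cycle property T stays optimal exactly when c is not smaller than the heaviest edge on that path.)

        #include <algorithm>
        #include <queue>
        #include <vector>

        struct Edge { int to; int cost; };

        // T: adjacency list of the spanning tree with n vertices.
        // Returns true if T remains a minimum spanning tree after the
        // edge (a, b) with cost c is added to the graph.
        bool stillMinimum(const std::vector<std::vector<Edge> >& T,
                          int n, int a, int b, int c)
        {
            std::vector<int> parent(n, -1), parentCost(n, 0);
            std::vector<bool> seen(n, false);
            std::queue<int> q;
            q.push(a);
            seen[a] = true;
            while (!q.empty()) {                          // BFS over the tree: O(|V|) edges
                int u = q.front(); q.pop();
                for (std::size_t k = 0; k < T[u].size(); ++k) {
                    const Edge& e = T[u][k];
                    if (!seen[e.to]) {
                        seen[e.to] = true;
                        parent[e.to] = u;
                        parentCost[e.to] = e.cost;
                        q.push(e.to);
                    }
                }
            }
            int maxOnPath = 0;
            for (int v = b; v != a; v = parent[v])        // walk the unique a-b path in T
                maxOnPath = std::max(maxOnPath, parentCost[v]);
            return c >= maxOnPath;                        // cycle property
        }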

    Read the article

  • Python MD5 Hash Faster Calculation

    - by balgan
    Hi everyone. I will try my best to explain my problem and my line of thought on how I think I can solve it. I use this code: for root, dirs, files in os.walk(downloaddir): for infile in files: f = open(os.path.join(root,infile),'rb') filehash = hashlib.md5() while True: data = f.read(10240) if len(data) == 0: break filehash.update(data) print "FILENAME: " , infile print "FILE HASH: " , filehash.hexdigest() and using start = time.time() elapsed = time.time() - start I measure how long it takes to calculate a hash. Pointing my code at a 653 MB file, this is the result: root@Mars:/home/tiago# python algorithm-timer.py FILENAME: freebsd.iso FILE HASH: ace0afedfa7c6e0ad12c77b6652b02ab 12.624 root@Mars:/home/tiago# python algorithm-timer.py FILENAME: freebsd.iso FILE HASH: ace0afedfa7c6e0ad12c77b6652b02ab 12.373 root@Mars:/home/tiago# python algorithm-timer.py FILENAME: freebsd.iso FILE HASH: ace0afedfa7c6e0ad12c77b6652b02ab 12.540 OK, roughly 12 seconds for a 653 MB file. My problem is that I intend to use this code in a program that will run through multiple files, some of them might be 4/5/6 GB, and it will take way longer to calculate. What I am wondering is whether there is a faster way for me to calculate the hash of a file. Maybe by doing some multithreading? I used another script to check the use of the CPU second by second and I see that my code is only using 1 out of my 2 CPUs, and only at 25% max. Any way I can change this? Thank you all in advance for the given help.

    Read the article

  • Finding the left-most and right-most points of a list. std::find_if the right way to go?

    - by Tom
    Hi, I have a list of Point objects (each one with x, y properties) and would like to find the left-most and right-most points. I've been trying to do it with find_if, but I'm not sure it's the way to go, because I can't seem to pass a comparator instance. Is find_if the way to go? Seems not. So, is there an algorithm in <algorithm> to achieve this? Thanks in advance. #include <iostream> #include <list> #include <algorithm> using namespace std; typedef struct Point{ float x; float y; } Point; bool left(Point& p1,Point& p2) { return p1.x < p2.x; } int main(){ Point p1 ={-1,0}; Point p2 ={1,0}; Point p3 ={5,0}; Point p4 ={7,0}; list <Point> points; points.push_back(p1); points.push_back(p2); points.push_back(p3); points.push_back(p4); //Should return an iterator to p1. find_if(points.begin(),points.end(),left); return 0; }
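    (A sketch of what the question appears to be after: <algorithm> already provides std::min_element and std::max_element, which take a binary comparator, whereas find_if expects a unary predicate. The Point/left idea from the question is reused below.)

        #include <algorithm>
        #include <list>

        struct Point { float x; float y; };

        static bool leftOf(const Point& p1, const Point& p2) { return p1.x < p2.x; }

        int main() {
            Point p1 = {-1, 0}, p2 = {1, 0}, p3 = {5, 0}, p4 = {7, 0};
            std::list<Point> points;
            points.push_back(p1); points.push_back(p2);
            points.push_back(p3); points.push_back(p4);

            // Left-most and right-most points by x coordinate.
            std::list<Point>::iterator leftmost =
                std::min_element(points.begin(), points.end(), leftOf);
            std::list<Point>::iterator rightmost =
                std::max_element(points.begin(), points.end(), leftOf);
            // leftmost now refers to p1 (x == -1), rightmost to p4 (x == 7).
            return 0;
        }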

    Read the article

  • Boost lambda: Invoke method on object

    - by ckarras
    I'm looking at boost::lambda as a way to make a generic algorithm that can work with any "getter" method of any class. The algorithm is used to detect duplicate values of a property, and I would like for it to work for any property of any class. In C#, I would do something like this: class Dummy { public String GetId() ... public String GetName() ... } IEnumerable<String> FindNonUniqueValues<ClassT> (Func<ClassT,String> propertyGetter) { ... } Example use of the method: var duplicateIds = FindNonUniqueValues<Dummy>(d => d.GetId()); var duplicateNames = FindNonUniqueValues<Dummy>(d => d.GetName()); I can get the "for any class" part to work, using either interfaces or template methods, but have not yet found how to make the "for any method" part work. Is there a way to do something similar to the "d => d.GetId()" lambda in C++ (either with or without Boost)? Alternative, more C++-idiomatic solutions to make the algorithm generic are welcome too. I'm using C++/CLI with VS2008, so I can't use C++0x lambdas.
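    (One pre-C++0x option, sketched as an assumption rather than a recommendation: make the getter a pointer-to-const-member-function template parameter, which needs no Boost and works in VS2008 for native classes; boost::bind(&Dummy::GetId, _1) would be the Boost.Bind equivalent. The Dummy usage in the trailing comment is hypothetical.)

        #include <map>
        #include <string>
        #include <vector>

        // Collects the getter values that occur more than once across the items.
        template <class T, class PropT>
        std::vector<PropT> FindNonUniqueValues(const std::vector<T>& items,
                                               PropT (T::*getter)() const)
        {
            std::map<PropT, int> counts;
            for (std::size_t i = 0; i < items.size(); ++i)
                ++counts[(items[i].*getter)()];          // invoke the member getter

            std::vector<PropT> duplicates;
            for (typename std::map<PropT, int>::const_iterator it = counts.begin();
                 it != counts.end(); ++it)
                if (it->second > 1)
                    duplicates.push_back(it->first);
            return duplicates;
        }

        // Hypothetical usage, mirroring the C# example in the question:
        //   std::vector<std::string> dupIds   = FindNonUniqueValues(dummies, &Dummy::GetId);
        //   std::vector<std::string> dupNames = FindNonUniqueValues(dummies, &Dummy::GetName);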

    Read the article

  • Password hashing, salt and storage of hashed values

    - by Jonathan Leffler
    Suppose you were at liberty to decide how hashed passwords were to be stored in a DBMS. Are there obvious weaknesses in a scheme like this one? To create the hash value stored in the DBMS, take: A value that is unique to the DBMS server instance as part of the salt, And the username as a second part of the salt, And create the concatenation of the salt with the actual password, And hash the whole string using the SHA-256 algorithm, And store the result in the DBMS. This would mean that anyone wanting to come up with a collision should have to do the work separately for each user name and each DBMS server instance separately. I'd plan to keep the actual hash mechanism somewhat flexible to allow for the use of the new NIST standard hash algorithm (SHA-3) that is still being worked on. The 'value that is unique to the DBMS server instance' need not be secret - though it wouldn't be divulged casually. The intention is to ensure that if someone uses the same password in different DBMS server instances, the recorded hashes would be different. Likewise, the user name would not be secret - just the password proper. Would there be any advantage to having the password first and the user name and 'unique value' second, or any other permutation of the three sources of data? Or what about interleaving the strings? Do I need to add (and record) a random salt value (per password) as well as the information above? (Advantage: the user can re-use a password and still, probably, get a different hash recorded in the database. Disadvantage: the salt has to be recorded. I suspect the advantage considerably outweighs the disadvantage.) There are quite a lot of related SO questions - this list is unlikely to be comprehensive: Encrypting/Hashing plain text passwords in database Secure hash and salt for PHP passwords The necessity of hiding the salt for a hash Clients-side MD5 hash with time salt Simple password encryption Salt generation and Open Source software I think that the answers to these questions support my algorithm (though if you simply use a random salt, then the 'unique value per server' and username components are less important).
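    (A sketch of the concatenate-then-hash step described above, assuming OpenSSL's SHA-256 is available; the field separator is an extra precaution added here so that different (salt, username, password) splits cannot yield the same concatenated string. This is an illustration of the scheme in the question, not a vetted implementation.)

        #include <openssl/sha.h>
        #include <cstdio>
        #include <string>

        // SHA-256( serverUniqueValue | username | password ), returned as lowercase hex.
        std::string HashPassword(const std::string& serverUniqueValue,
                                 const std::string& username,
                                 const std::string& password)
        {
            // Unambiguous separator so "ab"+"c" and "a"+"bc" cannot collide.
            const std::string salted =
                serverUniqueValue + '\x1F' + username + '\x1F' + password;

            unsigned char digest[SHA256_DIGEST_LENGTH];
            SHA256(reinterpret_cast<const unsigned char*>(salted.data()),
                   salted.size(), digest);

            char hex[2 * SHA256_DIGEST_LENGTH + 1];
            for (int i = 0; i < SHA256_DIGEST_LENGTH; ++i)
                std::sprintf(hex + 2 * i, "%02x", digest[i]);
            return std::string(hex, 2 * SHA256_DIGEST_LENGTH);
        }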

    Read the article

  • Deterministic and non uniform long string generation from seed

    - by Limonup
    I had this weird idea for an encryption that I wanted to try out; it may be bad, and it may have been done before, but I'm just doing it for fun. The short version of the question is: Is it possible to generate a long, deterministic and non-uniformly distributed string/sequence of numbers from a small seed? Long(er) version: I was thinking of encrypting a text by changing its encoding. The new encoding would be generated via the Huffman algorithm. To work well, the Huffman algorithm would need a fairly long text with a non-uniform distribution. Then characters can have different bit-lengths, which would be the primary strength of this encryption. The problem is that it's impractical to enter/remember a long text each time you want to decrypt the text. So I was wondering if it was possible to generate a text from a password seed? It doesn't matter what the text is, as long as it has a non-uniform distribution of characters and the exact same sequence can be recreated each time you give it the same seed. Preferably, are there any functions/extensions in Python that can do this? EDIT: To expand on the "strength" of varying bit length: if I have a string "test", ASCII values 116, 101, 115, 116, which gives bit values of 1110100 1100101 1110011 1110100 Then, say my Huffman algorithm generates encoding like t = 101 e = 1100111 s = 10001 The final string is 101 1100111 10001 101; if we encode this back to ASCII, we get 1011100 1111000 1101000, which is 3 entirely different characters. Obviously it's impossible to perform any kind of frequency analysis or something like that on this.
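    (A sketch of the general idea, in C++ rather than Python and entirely my own illustration: seed a PRNG whose output is fixed by the language standard, std::mt19937, from the password, and draw characters from a deliberately skewed alphabet, so the same seed always reproduces the same non-uniform text. The alphabet and weights below are arbitrary placeholders.)

        #include <iostream>
        #include <random>
        #include <string>

        std::string generateSkewedText(const std::string& seed, std::size_t length) {
            std::seed_seq seq(seed.begin(), seed.end());
            std::mt19937 gen(seq);       // output sequence is fully specified by the standard

            // Hand-rolled weighted alphabet: a few characters are made far more
            // frequent than the rest, so a Huffman coder built on the output
            // assigns them short codes.
            const std::string alphabet  = "etaoinsrhld ";
            const unsigned    weights[] = {30, 22, 14, 10, 8, 6, 4, 3, 2, 2, 2, 20};
            unsigned total = 0;
            for (unsigned w : weights) total += w;

            std::string out;
            out.reserve(length);
            for (std::size_t i = 0; i < length; ++i) {
                unsigned r = static_cast<unsigned>(gen() % total);  // small modulo bias, fine here
                for (std::size_t k = 0; k < alphabet.size(); ++k) {
                    if (r < weights[k]) { out += alphabet[k]; break; }
                    r -= weights[k];
                }
            }
            return out;
        }

        int main() {
            // Same seed, same skewed text, every run.
            std::cout << generateSkewedText("my password seed", 60) << '\n';
        }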

    Read the article

  • Which design pattern is most appropriate?

    - by Anon
    Hello, I want to create a class that can use one of four algorithms (and the algorithm to use is only known at run-time). I was thinking that the Strategy design pattern sounds appropriate, but my problem is that each algorithm requires slightly different parameters. Would it be a bad design to use Strategy, but pass the relevant parameters into the constructor? Here is an example (for simplicity, let's say there are only two possible algorithms) ... class Foo { private: // At run-time the correct algorithm is used, e.g. a = new Algorithm1(1); AlgorithmInterface* a; }; class AlgorithmInterface { public: virtual void DoSomething() = 0; }; class Algorithm1 : public AlgorithmInterface { public: Algorithm1( int i ) : value(i) {} virtual void DoSomething(){ // Does something with int value }; int value; }; class Algorithm2 : public AlgorithmInterface { public: Algorithm2( bool b ) : value(b) {} virtual void DoSomething(){ // Do something with bool value }; bool value; };
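    (A sketch, assuming nothing beyond the classes in the question: the differing constructor parameters stay out of Foo's way if a small factory is the only place that knows about them. The class bodies are repeated in trimmed form so the example stands alone; the parameter values are placeholders.)

        #include <string>

        class AlgorithmInterface {
        public:
            virtual ~AlgorithmInterface() {}
            virtual void DoSomething() = 0;
        };

        class Algorithm1 : public AlgorithmInterface {
        public:
            explicit Algorithm1(int i) : value(i) {}
            virtual void DoSomething() { /* uses the int */ }
        private:
            int value;
        };

        class Algorithm2 : public AlgorithmInterface {
        public:
            explicit Algorithm2(bool b) : value(b) {}
            virtual void DoSomething() { /* uses the bool */ }
        private:
            bool value;
        };

        // The only place that knows which strategy needs which parameters.
        AlgorithmInterface* MakeAlgorithm(const std::string& name) {
            if (name == "algo1") return new Algorithm1(42);
            if (name == "algo2") return new Algorithm2(true);
            return 0;
        }

        class Foo {
        public:
            explicit Foo(const std::string& algoName) : a(MakeAlgorithm(algoName)) {}
            ~Foo() { delete a; }
            void Run() { if (a) a->DoSomething(); }
        private:
            Foo(const Foo&);              // copying not handled in this sketch
            Foo& operator=(const Foo&);
            AlgorithmInterface* a;
        };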

    Read the article

  • NHibernate correct way to reattach cached entity to different session

    - by Chris Marisic
    I'm using NHibernate to query a list of objects from my database. After I get the list of objects, I iterate over it and apply a distance-approximation algorithm to find the nearest object. I consider this function of getting the list of objects and applying the algorithm over it to be a heavy operation, so I cache the object which I find from the algorithm in HttpRuntime.Cache. After this point, whenever I'm given the supplied input again I can just pull the object directly from the cache instead of having to hit the database and traverse the list. My object is a complex object that has collections attached to it; inside the query where I return the full list of objects I don't bring back any of the sub-collections eagerly, so when I read my cached object I need lazy loading to work correctly to be able to display the object fully. Originally I tried using this to re-associate my cached object with a new session: _session.Lock(obj, LockMode.None); However, when accessing the page concurrently from another instance I get the error Illegal attempt to associate a collection with two open sessions I then tried something different with _session.Merge(obj); However, watching the output of this in NHProf shows that it is deleting and re-associating my object's contained collections with my object, which is not what I want, although it seems to work fine. What is the correct way to do this? Neither of these seem to be right.

    Read the article

  • Calculating distance from latitude, longitude and height using a geocentric co-ordinate system

    - by Sarge
    I've implemented this method in Javascript and I'm roughly 2.5% out and I'd like to understand why. My input data is an array of points represented as latitude, longitude and the height above the WGS84 ellipsoid. These points are taken from data collected from a wrist-mounted GPS device during a marathon race. My algorithm was to convert each point to cartesian geocentric co-ordinates and then compute the Euclidean distance (cf. Pythagoras). Cartesian geocentric is also known as Earth Centred Earth Fixed, i.e. it's an X, Y, Z co-ordinate system which rotates with the earth. My test data was the data from a marathon, and so the distance should be very close to 42.26 km. However, the distance comes to about 43.4 km. I've tried various approaches and nothing changes the result by more than a metre, e.g. I replaced the height data with data from the NASA SRTM mission, I've set the height to zero, etc. Using Google, I found two points in the literature where lat, lon, height had been transformed, and my transformation algorithm matches. What could explain this? Am I expecting too much from Javascript's double representation? (The X, Y, Z numbers are very big but the differences between two points are very small.) My alternative is to move to computing the geodesic across the WGS84 ellipsoid using Vincenty's algorithm (or similar) and then calculating the Euclidean distance with the two heights, but this seems inaccurate. Thanks in advance for your help!
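    (For reference, a sketch of the standard WGS84 geodetic-to-ECEF conversion described above, with the Euclidean chord distance between two converted points; the constants come from the WGS84 definition, inputs are degrees and metres. Note that summing many short chords between noisy GPS fixes is itself a common source of over-estimated track length.)

        #include <cmath>

        struct Xyz { double x, y, z; };

        // Geodetic latitude/longitude (degrees) and height above the WGS84
        // ellipsoid (metres) to Earth-Centred Earth-Fixed coordinates.
        Xyz geodeticToEcef(double latDeg, double lonDeg, double h) {
            const double pi = 3.14159265358979323846;
            const double a  = 6378137.0;               // semi-major axis (m)
            const double f  = 1.0 / 298.257223563;     // flattening
            const double e2 = f * (2.0 - f);           // first eccentricity squared

            const double lat = latDeg * pi / 180.0;
            const double lon = lonDeg * pi / 180.0;
            const double sinLat = std::sin(lat);
            const double N = a / std::sqrt(1.0 - e2 * sinLat * sinLat);  // prime vertical radius

            Xyz p;
            p.x = (N + h) * std::cos(lat) * std::cos(lon);
            p.y = (N + h) * std::cos(lat) * std::sin(lon);
            p.z = (N * (1.0 - e2) + h) * sinLat;
            return p;
        }

        double chordDistance(const Xyz& p, const Xyz& q) {
            const double dx = p.x - q.x, dy = p.y - q.y, dz = p.z - q.z;
            return std::sqrt(dx * dx + dy * dy + dz * dz);
        }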

    Read the article

  • CakePHP: Interaction between different files/classes

    - by Alexx Hardt
    Hey, I'm cloning a commercial student management system. Students use the frontend to apply for lectures, uni staff can modify events (time, room, etc). The core of the app will be the algorithm which distributes the seats to students. I already asked about it here: How to implement a seat distribution algorithm for uni lectures Now, I found a class for that algorithm here: http://www.phpclasses.org/browse/file/10779.html I put the 'class GA' into app/vendors. I need to write a 'class Solution', which represents one object (a child, and later a parent for the evolutionary process). I'll also have to write the functions mutate(), crossover() and fitness(). fitness() calculates a score for a solution, based on whether there are overbooked courses etc.; crossover() is the crazy monkey sex function which produces a child from two parents, and mutate() modifies a child after crossover. Now, the fitness() function needs to access a few related models, and their find() functions. It evaluates a solution's fitness by checking, e.g., if there are overbooked courses or unfulfilled wishes, and penalizes that. Where would I put ga.php, solution.php and the three functions? ga.php has to access the functions, but the functions have to access the models. I also don't want to call any App::import()'s from within the fitness() function, because it gets called many thousands of times when the algorithm runs. Hope someone can help me. Thanks in advance =)

    Read the article

  • Largest triangle from a set of points

    - by Faken
    I have a set of random points from which I want to find the largest triangle by area whose vertices each lie on one of those points. So far I have figured out that the largest triangle's vertices will only lie on the outside points of the cloud of points (the convex hull), so I have programmed a function to do just that (using Graham scan in n log n time). However, that's where I'm stuck. The only way I can figure out how to find the largest triangle from these points is to use brute force at n^3 time, which is still acceptable in an average case as the convex hull algorithm usually kicks out the vast majority of points. However, in a worst case scenario where the points are on a circle, this method would fail miserably. Does anyone know an algorithm to do this more efficiently? Note: I know that CGAL has this algorithm but they do not go into any details on how it's done. I don't want to use libraries; I want to learn this and program it myself (and also allow me to tweak it to exactly the way I want it to operate, just like the Graham scan, in which other implementations pick up collinear points that I don't want).
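    (A sketch of the O(h^3) brute force over the h hull vertices mentioned above, not taken from the post: twice the triangle area falls out of a single cross product, so the inner loop needs no square roots or trigonometry.)

        #include <algorithm>
        #include <cmath>
        #include <vector>

        struct Pt { double x, y; };

        // |cross(b - a, c - a)| equals twice the area of triangle abc.
        double twiceArea(const Pt& a, const Pt& b, const Pt& c) {
            return std::fabs((b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x));
        }

        // hull: vertices of the convex hull (e.g. produced by the Graham scan).
        double largestTriangleArea(const std::vector<Pt>& hull) {
            double best = 0.0;
            for (std::size_t i = 0; i < hull.size(); ++i)
                for (std::size_t j = i + 1; j < hull.size(); ++j)
                    for (std::size_t k = j + 1; k < hull.size(); ++k)
                        best = std::max(best, twiceArea(hull[i], hull[j], hull[k]));
            return best / 2.0;
        }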

    Read the article

  • Is memory allocation in linux non-blocking?

    - by Mark
    I am curious to know whether allocating memory using the default new operator is a non-blocking operation. e.g. struct Node { int a,b; }; ... Node *foo = new Node(); If multiple threads tried to create a new Node and if one of them was suspended by the OS in the middle of allocation, would it block other threads from making progress? The reason why I ask is because I had a concurrent data structure that created new nodes. I then modified the algorithm to recycle the nodes. The throughput performance of the two algorithms was virtually identical on a 24 core machine. However, I then created an interference program that ran on all the system cores in order to create as much OS pre-emption as possible. The throughput performance of the algorithm that created new nodes decreased by a factor of 5 relative to the algorithm that recycled nodes. I'm curious to know why this would occur. Thanks. *Edit: Pointing me to the code for the C++ memory allocator for Linux would be helpful as well. I tried looking before posting this question, but had trouble finding it.

    Read the article

< Previous Page | 151 152 153 154 155 156 157 158 159 160 161 162  | Next Page >