per pixel - Page 58 - Developer IT

Convert png sequence to x264 with ffmpeg

- by Thucydides411

I am trying to convert a series of pngs into an mp4 video. I am using ffmpeg, and want to encode the video with the x264 codec. Using the command ffmpeg -y -r 30 -b 1800k -i _tmp%04d.png -vcodec libx264 out.mp4 I get the following warning message Incompatible pixel format 'bgra' for codec 'libx264', auto-selecting format 'yuv420p' My understanding is that there is an alpha channel in the pngs, which the x264 encoder cannot handle. Is there a way to get around this problem? Is there, for example, a way to get the encoder to ignore the alpha channel (my pngs don't actually have any transparent elements)? I'm aware that I could batch convert the pngs beforehand to strip the alpha channel, but the sequence of images is produced by another program, and having to preprocess the images each time I make a video would be less than optimal. Edit: After stripping the alpha channel from each frame using the command convert in.png -background white -flatten +matte out.png ffmpeg gives the warning message Incompatible pixel format 'pal8' for codec 'libx264', auto-selecting format 'yuv420p' so still no dice.

Read the article

Mail Enabled Sharepoint 2010 Loop Detected

- by vlannoob

I have setup a small Sharepoint 2010 deployment and it is working fine, for now. I have run through one of the more popular step by step guides to mail enable the install and what I have is internal and external mail going to my mail enabled list hitting my Exchange 2010 server (on another Win2k8R2 box) and sitting in the submissions queue with a Loop Detected error and they progres no further. Everything appears OK as per the guide. I have setup an SMTP role on the Sharepoint box, as per the guide. I have setup a new Send Conenctor on the Exchange 2010 server, as per the guide. Any ideas on troubleshooting here?

Read the article

ffmpeg open webcam using YUYV but i want MJPEG

- by Pavel

I need ffmpeg to open webcam (logitech c910) in MJPEG mode, because the webcam can give ~24 using MJPEG "protocol" and only ~10 fps using the YUYV. Can i choose between them using ffmpeg command line? xx@(none) ~ $ v4l2-ctl --list-formats ioctl: VIDIOC_ENUM_FMT Index : 0 Type : Video Capture Pixel Format: 'YUYV' Name : YUV 4:2:2 (YUYV) Index : 1 Type : Video Capture Pixel Format: 'MJPG' (compressed) Name : MJPEG My current command line: ffmpeg -y -f alsa -i hw:3,0 -f video4linux2 -r 20 -s 1280x720 -i /dev/video0 -acodec libfaac -ab 128k -vcodec libx264 /tmp/web.avi ffmpeg produces corrupted h264 stream when i record from webcam, but normal h264 strem when i record from x11grab. Another codecs (mjpeg, mpeg4) works well with webcam... But this is another story.

Read the article

Good default for XDG_RUNTIME_DIR?

- by cadrian

The XDG Base Directory Specification is a very interesting spec for user directories. It also provides good default values, except for XDG_RUNTIME_DIR. Now I am writing a software that needs to create named pipes. It is a per-user client-server framework (there is a FIFO for the server and a FIFO per client). If XDG_RUNTIME_DIR is not defined, I am currently using a per-user subdirectory in /tmp — but it does not ensure all the specified conditions (viz. the paragraph starting with "The lifetime of the directory MUST be bound to the user being logged in…") Is /tmp/myserver-$USER good enough? Edit I saw elsewhere a few suggestions: . is quite unsatisfactory (at least because it is not an absolute path). I also saw /var/run/user/$USER — not bad, but that directory does not exist (at least on my box running a Debian testing)

Read the article

Media Information for Constant and Variable bit rate of Video files

- by cpx

What is this Maximum bit rate for a .mp4 format file whose bit rate mode is Constant? Media information displayed for MP4 (Using MediaInfo Tool) ID : 1 Format : AVC Format/Info : Advanced Video Codec Format profile : [email protected] Format settings, CABAC : No Format settings, ReFrames : 1 frame Codec ID : avc1 Codec ID/Info : Advanced Video Coding Bit rate mode : Constant Bit rate : 1 500 Kbps Maximum bit rate : 3 961 Kbps Display aspect ratio : 4:3 Frame rate mode : Constant Frame rate : 29.970 fps Color space : YUV Chroma subsampling : 4:2:0 Bit depth : 8 bits Scan type : Progressive Bits/(Pixel*Frame) : 0.163 In this case where the bit rate mode is set to Variable, is the Bit rate field where the value is displayed as 309 is its average bit rate? Media information displayed for M4V (Using MediaInfo Tool) ID : 1 Format : AVC Format/Info : Advanced Video Codec Format profile : [email protected] Format settings, CABAC : No Format settings, ReFrames : 1 frame Codec ID : avc1 Codec ID/Info : Advanced Video Coding Bit rate mode : Variable Bit rate : 309 Kbps Display aspect ratio : 16:9 Frame rate mode : Variable Frame rate : 23.976 fps Minimum frame rate : 23.810 fps Maximum frame rate : 24.390 fps Color space : YUV Chroma subsampling : 4:2:0 Bit depth : 8 bits Scan type : Progressive Bits/(Pixel*Frame) : 0.229 Writing library : x264 core 120

Read the article

AS2 Server Software Costs

- by CandyCo

We're currently using Cleo LexiCom as our server software for receiving EDI transmissions via the AS2 protocol. We have 7 trading partners per year, and this runs us about $800/year for support from Cleo. We need to expand from 7 trading partners to 10 or so, and Cleo charges roughly $600 per new host, plus an expanded yearly support fee. My question(s) are: Does anyone know of a cheaper developer of AS2 server software, and perhaps one that doesn't charge per new host? Does anyone have any clue why we are being charged an upfront fee for new hosts, and if this is a standard practice for AS2 software providers? It seems really odd that we are required to pay upfront costs for this. I could completely understand an increase in the yearly support, however.

Read the article

NetFlow Storage Calculator

- by javano

I am planning to deploy a NetFlow server (using NfSen/NfDump) for harvesting data from Cisco devices; Are there standard calculations or guidelines I can use to calculate my server requirements, specifically I need to plan for storage. Is there a way of knowing how much data I will collect per day for example, given N flows? Lets say one device has 10k flows per day, this is typically XYZ MBs, so I can scale this up? If not, how many flows are you guys and girls recording per day, and how much data is this generating? Hopefully we can generate an estimate from everyone else's figures! P.S. If it makes a difference, I'll be collecting from <= 50 devices max (non more than 50Mbps each).

Read the article

Extracting the layer transparency into an editable layer mask in Photoshop

- by last-child

Is there any simple way to extract the "baked in" transparency in a layer and turn it into a layer mask in Photoshop? To take a simple example: Let's say that I paint a few strokes with a semi-transparent brush, or paste in a .png-file with an alpha channel. The rgb color values and the alpha value for each pixel are now all contained in the layer-image itself. I would like to be able to edit the alpha values as a layer mask, so that the layer image is solid and contains only the RGB values for each pixel. Is this possible, and in that case how? Thanks. EDIT: To clarify - I'm not really after the transparency values in themselves, but in the separation of rgb values and alpha values. That means that the layer must become a solid, opaque image with a mask.

Read the article

Recommended service account setup for MS SQL Server 2005/2008

- by boxerbucks

We have a number of MS SQL servers in our environment running either SQL Server 2005 standard/enterprise or SQL server 2008 enterprise. Currently the SQL services are running as local service or network service and the MS recommended best practice is to run as a domain account which is what we are trying to move towards. Is the best practice with regards to domain accounts to have a separate domain account per service per server? So if we have 4 SQL services we want to run per server and we have 50 servers, we would create 50 * 4 = 200 accounts in AD? This seems excessive to me and I was wondering if anyone has any real experience with this type of setup and it's management.

Read the article

High-res icon in Windows Vista alt-tab thumbnail preview?

- by netvope

I have customized my alt-tab screen with the following: [HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Explorer\AltTab] "OverlayIconPx"=dword:00000040 "MaxThumbSizePx"=dword:00000100 "MinThumbSizePcent"=dword:00000064 It works great: the thumbnail becomes 256 pixel wide and the icon at the corner of the thumbnail becomes 64x64 pixels. However, Windows doesn't load the high-res icons from the programs; instead, it uses the 16x16 pixel icon and scaled it up by nearest-neighbor. I'm sure the programs has high-res icons because I saw them with in "Extra Large Icon" view in Explorer. So the question is: How can I force Windows to load the high-res icons for the alt-tab thumbnail preview? (Perhaps a registry key, or a .dll hack/injection?)

Read the article

Auto log off windows machine after certain number of minutes

- by Marty Heath

I have three machines on my network, two are windows xp and one is windows 7. i would like to have all three machines log a user off if they are on for more than 60 minutes. And I would like this to be applied to the machine not on a per user basis, because I do not want this policy to apply to those users on any other machine. I have installed winexit.scr on one of the machines but the problem is that I cannot change the default value of 10 minutes for the screensaver because that is controlled through group policy, and I cannot seem to find where to change that through group policy on a per machine basis NOT on a per user basis. If I have left out any details I apologize please let me know anything that is needed

Read the article

Why is USB-sticks so much slower than Solid State Drives?

- by Jonas

From what I understand, USB flash memory and Solid State Drives are based on similar technologies, NAND flash memory. But USB-sticks is usually quite slow with a read and write speed of 5-10MB per second while Solid State Drives usually is very fast, usually 100-570MB per second. Why are Solid State Drives so much faster than USB-sticks? And why isn't USB-sticks faster than 5-10MB per second? Is it simply that SSD-drives uses parallel access to the NAND flash memory or are there other reasons?

Read the article

How to put 1000 lightweight server applications in the cloud

- by Dan Bird

The company I work for sells a commercial desktop/server app that runs on any non dedicated Windows PC or server and uses Tomcat for all interactions with the application. Customers are asking that we host their instance of the application so they don't have to run it locally on their own servers. The app is lightweight and an average server, in theory, could handle 25-50 instances before users would notice a slowdown. However only 1 instance can run per Windows instance (because the application writes to a common registry branch) so we'd need something like VMWare to create 25-50 Windows instances. We know we eventually need to reprogram to make it truly cloud-worthy but what would you recommend for a server farm or whatever for this? We don't have the setup to purchase our own servers so we must use a 3rd party. We have budgeted $500 - $1000 per year per customer for this service. Thanks in advance for your suggestions, experiences and guidance.

Read the article

How do I analyze an Apache Bench result?

- by Alan Hoffmeister

I need some help with analyzing a log from Apache Bench: Benchmarking texteli.com (be patient) Completed 100 requests Completed 200 requests Completed 300 requests Completed 400 requests Completed 500 requests Completed 600 requests Completed 700 requests Completed 800 requests Completed 900 requests Completed 1000 requests Finished 1000 requests Server Software: Server Hostname: texteli.com Server Port: 80 Document Path: /4f84b59c557eb79321000dfa Document Length: 13400 bytes Concurrency Level: 200 Time taken for tests: 37.030 seconds Complete requests: 1000 Failed requests: 0 Write errors: 0 Total transferred: 13524000 bytes HTML transferred: 13400000 bytes Requests per second: 27.01 [#/sec] (mean) Time per request: 7406.024 [ms] (mean) Time per request: 37.030 [ms] (mean, across all concurrent requests) Transfer rate: 356.66 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 27 37 19.5 34 319 Processing: 80 6273 1673.7 6907 8987 Waiting: 47 3436 2085.2 3345 8856 Total: 115 6310 1675.8 6940 9022 Percentage of the requests served within a certain time (ms) 50% 6940 66% 6968 75% 6988 80% 7007 90% 7025 95% 7078 98% 8410 99% 8876 100% 9022 (longest request) What this results can tell me? Isn't 27 rps too slow?

Read the article

apache - subdomains are slower?

- by matthewsteiner

Using apache benchmark, I ran the exact same php application at my root domain and a subdomain. Even with multiple tests and high request numbers, the requests per second perform extremely differently. I mean something like this: example.com - 1200 requests per second bench.example.com - 50 requests per second What could be affecting this? These aren't using databases or anything, just mainly getting a simple page displaying. But it's the same app for both of them, and I'm wondering why they perform so differently. Ideas?

Read the article

Can I make Firefox ignore/interpret font sizes specified in pixels?

- by Andy

Hi all, I have an 11.1" notebook display with 1366x768 resolution, which gives it a DPI of 141. I'm running GNOME and have configured the DPI. Everything works OK except web browsing - far too many websites specify their font sizes in pixels, which ends up with very small text on a high DPI display. My ideal solution would be for Firefox to interpret an absolute pixel size in terms of normal DPI and display it appropriately for my DPI (eg scale it by 141/96). Obviously this would cause problems on the occasion where graphics had been pixel-aligned with fonts in some way, but I imagine that would cause me far less of a headache than either reading minute text, or scaling the text manually each time. Any suggestions? TIA, Andy

Read the article

High Lock Wait ratio in MySQL

- by FunkyChicken

on my site I log every pageview (date,ip,referrer,page,etc) in a simple mysql table. This table gets very little selects (3 per minute), but a lot of inserts. (about 100 per second) Today I changed this table from an InnoDB table to a MEMORY table, this made sense to me to prevent unnecessary hard disk IO. I also prune this table once per minute, to make sure it never get's too big. -- Performance wise, things are running fine. But I noticed that while running tuning-primer, that my Current Lock Wait ratio is quite high. Current Lock Wait ratio = 1 : 561 My question: Should I worry about this Lock Wait Ratio? And is there something I can change in my my.cnf to improve things so that the lock wait ratio isn't so high?

Read the article

Do I use the FV function in Excel correctly?

- by John

My task: Create a table: Calculate what the revenues of e-trading will be after five years at 15 percent interest rate if we now have 15 000 EUR. Use the FV function from the Financial Group in Excel. My resolution: =FV( 15%; 5; 0; -15000). My question: Is it correct? I know the task lacks information whether the interest rate is per month or per year. I calculate it as 'per year'. My question is orientated more on the usage of the FV function. I, for example, do not understand why '-15000' and not '15000'. Also why the third parameter has to be 0? Maybe I do it wrong. Please help me solve it! Thanks in advance.

Read the article

Win 8 start screen resolution

- by Abhijith

My screen resolution is 1280x1024 running Win 8 RP I formatted my computer and reinstalled Win 8 CP because I had too many BSODs. When I installed Win 8 CP and created a local account. I had 5(or 6) tiles per column. But once I switched to the Microsoft account to get my synced wallpaper and lock-screen, the Start screen resolution changed and I got a max 3 tiles per column. The size of all metro apps including the Settings app changed and became awkwardly bigger. Is there a way to get back 5 tiles per column? Essentially changing the resolution of the start screen?

Read the article

AS2 Server Software Costs

- by CandyCo

We're currently using Cleo LexiCom as our server software for receiving EDI transmissions via the AS2 protocol. We have 7 trading partners per year, and this runs us about $800/year for support from Cleo. We need to expand from 7 trading partners to 10 or so, and Cleo charges roughly $600 per new host, plus an expanded yearly support fee. My question(s) are: Does anyone know of a cheaper developer of AS2 server software, and perhaps one that doesn't charge per new host? Does anyone have any clue why we are being charged an upfront fee for new hosts, and if this is a standard practice for AS2 software providers? It seems really odd that we are required to pay upfront costs for this. I could completely understand an increase in the yearly support, however.

Read the article

HAProxy overload protection

- by user2050516

using the HAProxy, would it be possible to configure an overload protection, to limit the amount of requests sent to the backing http server(s) to a given rate (z.B 100 Request per second ). If the threshold is exceeded requests should be answered with a default response. I am interested in requests per second not connections per second as a connection can have many requests. And yes to improve the servers is not an option here. If yes a configuration example to achieve that would be excellent. Thank you in advance.

Read the article

Java: how to do fast copy of a BufferedImage's pixels? (include unit test)

- by WizardOfOdds

I want to do a copy (of a rectangle area) of the ARGB values from a source BufferedImage into a destination BufferedImage. No compositing should be done: if I copy a pixel with an ARGB value of 0x8000BE50 (alpha value at 128), then the destination pixel must be exactly 0x8000BE50, totally overriding the destination pixel. I've got a very precise question and I made a unit test to show what I need. The unit test is fully functional and self-contained and is passing fine and is doing precisely what I want. However, I want a faster and more memory efficient method to replace copySrcIntoDstAt(...). That's the whole point of my question: I'm not after how to "fill" the image in a faster way (what I did is just an example to have a unit test). All I want is to know what would be a fast and memory efficient way to do it (ie fast and not creating needless objects). The proof-of-concept implementation I've made is obviously very memory efficient, but it is slow (doing one getRGB and one setRGB for every pixel). Schematically, I've got this: (where A indicates corresponding pixels from the destination image before the copy) AAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAA And I want to have this: AAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAABBBBAAA AAAAAAAAAAAAABBBBAAA AAAAAAAAAAAAAAAAAAAA where 'B' represents the pixels from the src image. I'm looking for an exact replacement of the method, not for an API link/quote. import org.junit.Test; import java.awt.image.BufferedImage; import static org.junit.Assert.*; public class TestCopy { private static final int COL1 = 0x8000BE50; // alpha at 128 private static final int COL2 = 0x1732FE87; // alpha at 23 @Test public void testPixelsCopy() { final BufferedImage src = new BufferedImage( 5, 5, BufferedImage.TYPE_INT_ARGB ); final BufferedImage dst = new BufferedImage( 20, 20, BufferedImage.TYPE_INT_ARGB ); convenienceFill( src, COL1 ); convenienceFill( dst, COL2 ); copySrcIntoDstAt( src, dst, 3, 4 ); for (int x = 0; x < dst.getWidth(); x++) { for (int y = 0; y < dst.getHeight(); y++) { if ( x >= 3 && x <= 7 && y >= 4 && y <= 8 ) { assertEquals( COL1, dst.getRGB(x,y) ); } else { assertEquals( COL2, dst.getRGB(x,y) ); } } } } // clipping is unnecessary private static void copySrcIntoDstAt( final BufferedImage src, final BufferedImage dst, final int dx, final int dy ) { // TODO: replace this by a much more efficient method for (int x = 0; x < src.getWidth(); x++) { for (int y = 0; y < src.getHeight(); y++) { dst.setRGB( dx + x, dy + y, src.getRGB(x,y) ); } } } // This method is just a convenience method, there's // no point in optimizing this method, this is not what // this question is about private static void convenienceFill( final BufferedImage bi, final int color ) { for (int x = 0; x < bi.getWidth(); x++) { for (int y = 0; y < bi.getHeight(); y++) { bi.setRGB( x, y, color ); } } } }

Read the article

Different cursor formats in IOFrameBufferShared

- by Thomi

Hi, I'm reading the moust cursor pixmap data from the StdFBShmem_t structure, as defined in the IOFrameBufferShared API. Everything works fine, 90% of the time. However, I have noticed that some applications on the mac set a cursor in a different format. According to the documentation for the data structures, the cursor pixmap format should always be in the same format as the frame buffer. My frame buffer is 32BPP. I expect the pixmap data to be in the format 0xAARRGGBB, which is it. However, in some cases, I'm reading data that looks like a mask. Specifically, the pixel will either be 0x00FFFFFF or `0x00000000. This looks to me to be a mask for separate pixel data stored somewhere else. As far as I can tell, the only application that uses this cursor pixel format is Qt Creator, but I need to work with all applications, so I'd like to sort this out. The code I'm using to read the cursor pixmap data is: NSAutoreleasePool *autoReleasePool = [[NSAutoreleasePool alloc] init]; NSPoint mouseLocation = [NSEvent mouseLocation]; NSArray *allScreens = [NSScreen screens]; NSEnumerator *screensEnum = [allScreens objectEnumerator]; NSScreen *screen; NSDictionary *screenDesc = nil; while ((screen = [screensEnum nextObject])) { NSRect screenFrame = [screen frame]; screenDesc = [screen deviceDescription]; if (NSMouseInRect(mouseLocation, screenFrame, NO)) break; } if (screen) { kern_return_t err; CGDirectDisplayID displayID = (CGDirectDisplayID) [[screenDesc objectForKey:@"NSScreenNumber"] pointerValue]; task_port_t taskPort = mach_task_self(); io_service_t displayServicePort = CGDisplayIOServicePort(displayID); io_connect_t displayConnection =0; err = IOFramebufferOpen(displayServicePort, taskPort, kIOFBSharedConnectType, &displayConnection); if (KERN_SUCCESS == err) { union { vm_address_t vm_ptr; StdFBShmem_t *fbshmem; } cursorInfo; vm_size_t size; err = IOConnectMapMemory(displayConnection, kIOFBCursorMemory, taskPort, &cursorInfo.vm_ptr, &size, kIOMapAnywhere | kIOMapDefaultCache | kIOMapReadOnly); if (KERN_SUCCESS == err) { // for some reason, cursor data is not always in the same format as the frame buffer. For this reason, we need // some way to detect which structure we should be reading. QByteArray pixData((const char*)cursorInfo.fbshmem->cursor.rgb24.image[currentFrame], m_mouseInfo.currentSize.width() * m_mouseInfo.currentSize.height() * 4); IOConnectUnmapMemory(displayConnection, kIOFBCursorMemory, taskPort, cursorInfo.vm_ptr); } // IOConnectMapMemory else qDebug() << "IOConnectMapMemory Failed:" << err; IOServiceClose(displayConnection); } // IOServiceOpen else qDebug() << "IOFramebufferOpen Failed:" << err; }// if screen [autoReleasePool release]; My question is: How can I detect if the cursor is a different format from the framebuffer? Where can I read the actual pixel data? the bm18Cursor structure contains a mask section, but it's not in the right place for me to be reading it using the code above. Cheers,

Read the article

Parallelism in .NET – Part 2, Simple Imperative Data Parallelism

- by Reed

In my discussion of Decomposition of the problem space, I mentioned that Data Decomposition is often the simplest abstraction to use when trying to parallelize a routine. If a problem can be decomposed based off the data, we will often want to use what MSDN refers to as Data Parallelism as our strategy for implementing our routine. The Task Parallel Library in .NET 4 makes implementing Data Parallelism, for most cases, very simple. Data Parallelism is the main technique we use to parallelize a routine which can be decomposed based off data. Data Parallelism refers to taking a single collection of data, and having a single operation be performed concurrently on elements in the collection. One side note here: Data Parallelism is also sometimes referred to as the Loop Parallelism Pattern or Loop-level Parallelism. In general, for this series, I will try to use the terminology used in the MSDN Documentation for the Task Parallel Library. This should make it easier to investigate these topics in more detail. Once we’ve determined we have a problem that, potentially, can be decomposed based on data, implementation using Data Parallelism in the TPL is quite simple. Let’s take our example from the Data Decomposition discussion – a simple contrast stretching filter. Here, we have a collection of data (pixels), and we need to run a simple operation on each element of the pixel. Once we know the minimum and maximum values, we most likely would have some simple code like the following: for (int row=0; row < pixelData.GetUpperBound(0); ++row) { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This simple routine loops through a two dimensional array of pixelData, and calls the AdjustContrast routine on each pixel. As I mentioned, when you’re decomposing a problem space, most iteration statements are potentially candidates for data decomposition. Here, we’re using two for loops – one looping through rows in the image, and a second nested loop iterating through the columns. We then perform one, independent operation on each element based on those loop positions. This is a prime candidate – we have no shared data, no dependencies on anything but the pixel which we want to change. Since we’re using a for loop, we can easily parallelize this using the Parallel.For method in the TPL: Parallel.For(0, pixelData.GetUpperBound(0), row => { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } }); Here, by simply changing our first for loop to a call to Parallel.For, we can parallelize this portion of our routine. Parallel.For works, as do many methods in the TPL, by creating a delegate and using it as an argument to a method. In this case, our for loop iteration block becomes a delegate creating via a lambda expression. This lets you write code that, superficially, looks similar to the familiar for loop, but functions quite differently at runtime. We could easily do this to our second for loop as well, but that may not be a good idea. There is a balance to be struck when writing parallel code. We want to have enough work items to keep all of our processors busy, but the more we partition our data, the more overhead we introduce. In this case, we have an image of data – most likely hundreds of pixels in both dimensions. By just parallelizing our first loop, each row of pixels can be run as a single task. With hundreds of rows of data, we are providing fine enough granularity to keep all of our processors busy. If we parallelize both loops, we’re potentially creating millions of independent tasks. This introduces extra overhead with no extra gain, and will actually reduce our overall performance. This leads to my first guideline when writing parallel code: Partition your problem into enough tasks to keep each processor busy throughout the operation, but not more than necessary to keep each processor busy. Also note that I parallelized the outer loop. I could have just as easily partitioned the inner loop. However, partitioning the inner loop would have led to many more discrete work items, each with a smaller amount of work (operate on one pixel instead of one row of pixels). My second guideline when writing parallel code reflects this: Partition your problem in a way to place the most work possible into each task. This typically means, in practice, that you will want to parallelize the routine at the “highest” point possible in the routine, typically the outermost loop. If you’re looking at parallelizing methods which call other methods, you’ll want to try to partition your work high up in the stack – as you get into lower level methods, the performance impact of parallelizing your routines may not overcome the overhead introduced. Parallel.For works great for situations where we know the number of elements we’re going to process in advance. If we’re iterating through an IList<T> or an array, this is a typical approach. However, there are other iteration statements common in C#. In many situations, we’ll use foreach instead of a for loop. This can be more understandable and easier to read, but also has the advantage of working with collections which only implement IEnumerable<T>, where we do not know the number of elements involved in advance. As an example, lets take the following situation. Say we have a collection of Customers, and we want to iterate through each customer, check some information about the customer, and if a certain case is met, send an email to the customer and update our instance to reflect this change. Normally, this might look something like: foreach(var customer in customers) { // Run some process that takes some time... DateTime lastContact = theStore.GetLastContact(customer); TimeSpan timeSinceContact = DateTime.Now - lastContact; // If it's been more than two weeks, send an email, and update... if (timeSinceContact.Days > 14) { theStore.EmailCustomer(customer); customer.LastEmailContact = DateTime.Now; } } Here, we’re doing a fair amount of work for each customer in our collection, but we don’t know how many customers exist. If we assume that theStore.GetLastContact(customer) and theStore.EmailCustomer(customer) are both side-effect free, thread safe operations, we could parallelize this using Parallel.ForEach: Parallel.ForEach(customers, customer => { // Run some process that takes some time... DateTime lastContact = theStore.GetLastContact(customer); TimeSpan timeSinceContact = DateTime.Now - lastContact; // If it's been more than two weeks, send an email, and update... if (timeSinceContact.Days > 14) { theStore.EmailCustomer(customer); customer.LastEmailContact = DateTime.Now; } }); Just like Parallel.For, we rework our loop into a method call accepting a delegate created via a lambda expression. This keeps our new code very similar to our original iteration statement, however, this will now execute in parallel. The same guidelines apply with Parallel.ForEach as with Parallel.For. The other iteration statements, do and while, do not have direct equivalents in the Task Parallel Library. These, however, are very easy to implement using Parallel.ForEach and the yield keyword. Most applications can benefit from implementing some form of Data Parallelism. Iterating through collections and performing “work” is a very common pattern in nearly every application. When the problem can be decomposed by data, we often can parallelize the workload by merely changing foreach statements to Parallel.ForEach method calls, and for loops to Parallel.For method calls. Any time your program operates on a collection, and does a set of work on each item in the collection where that work is not dependent on other information, you very likely have an opportunity to parallelize your routine.

Read the article

Parallelism in .NET – Part 4, Imperative Data Parallelism: Aggregation

- by Reed

In the article on simple data parallelism, I described how to perform an operation on an entire collection of elements in parallel. Often, this is not adequate, as the parallel operation is going to be performing some form of aggregation. Simple examples of this might include taking the sum of the results of processing a function on each element in the collection, or finding the minimum of the collection given some criteria. This can be done using the techniques described in simple data parallelism, however, special care needs to be taken into account to synchronize the shared data appropriately. The Task Parallel Library has tools to assist in this synchronization. The main issue with aggregation when parallelizing a routine is that you need to handle synchronization of data. Since multiple threads will need to write to a shared portion of data. Suppose, for example, that we wanted to parallelize a simple loop that looked for the minimum value within a dataset: double min = double.MaxValue; foreach(var item in collection) { double value = item.PerformComputation(); min = System.Math.Min(min, value); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This seems like a good candidate for parallelization, but there is a problem here. If we just wrap this into a call to Parallel.ForEach, we’ll introduce a critical race condition, and get the wrong answer. Let’s look at what happens here: // Buggy code! Do not use! double min = double.MaxValue; Parallel.ForEach(collection, item => { double value = item.PerformComputation(); min = System.Math.Min(min, value); }); This code has a fatal flaw: min will be checked, then set, by multiple threads simultaneously. Two threads may perform the check at the same time, and set the wrong value for min. Say we get a value of 1 in thread 1, and a value of 2 in thread 2, and these two elements are the first two to run. If both hit the min check line at the same time, both will determine that min should change, to 1 and 2 respectively. If element 1 happens to set the variable first, then element 2 sets the min variable, we’ll detect a min value of 2 instead of 1. This can lead to wrong answers. Unfortunately, fixing this, with the Parallel.ForEach call we’re using, would require adding locking. We would need to rewrite this like: // Safe, but slow double min = double.MaxValue; // Make a "lock" object object syncObject = new object(); Parallel.ForEach(collection, item => { double value = item.PerformComputation(); lock(syncObject) min = System.Math.Min(min, value); }); This will potentially add a huge amount of overhead to our calculation. Since we can potentially block while waiting on the lock for every single iteration, we will most likely slow this down to where it is actually quite a bit slower than our serial implementation. The problem is the lock statement – any time you use lock(object), you’re almost assuring reduced performance in a parallel situation. This leads to two observations I’ll make: When parallelizing a routine, try to avoid locks. That being said: Always add any and all required synchronization to avoid race conditions. These two observations tend to be opposing forces – we often need to synchronize our algorithms, but we also want to avoid the synchronization when possible. Looking at our routine, there is no way to directly avoid this lock, since each element is potentially being run on a separate thread, and this lock is necessary in order for our routine to function correctly every time. However, this isn’t the only way to design this routine to implement this algorithm. Realize that, although our collection may have thousands or even millions of elements, we have a limited number of Processing Elements (PE). Processing Element is the standard term for a hardware element which can process and execute instructions. This typically is a core in your processor, but many modern systems have multiple hardware execution threads per core. The Task Parallel Library will not execute the work for each item in the collection as a separate work item. Instead, when Parallel.ForEach executes, it will partition the collection into larger “chunks” which get processed on different threads via the ThreadPool. This helps reduce the threading overhead, and help the overall speed. In general, the Parallel class will only use one thread per PE in the system. Given the fact that there are typically fewer threads than work items, we can rethink our algorithm design. We can parallelize our algorithm more effectively by approaching it differently. Because the basic aggregation we are doing here (Min) is communitive, we do not need to perform this in a given order. We knew this to be true already – otherwise, we wouldn’t have been able to parallelize this routine in the first place. With this in mind, we can treat each thread’s work independently, allowing each thread to serially process many elements with no locking, then, after all the threads are complete, “merge” together the results. This can be accomplished via a different set of overloads in the Parallel class: Parallel.ForEach<TSource,TLocal>. The idea behind these overloads is to allow each thread to begin by initializing some local state (TLocal). The thread will then process an entire set of items in the source collection, providing that state to the delegate which processes an individual item. Finally, at the end, a separate delegate is run which allows you to handle merging that local state into your final results. To rewriting our routine using Parallel.ForEach<TSource,TLocal>, we need to provide three delegates instead of one. The most basic version of this function is declared as: public static ParallelLoopResult ForEach<TSource, TLocal>( IEnumerable<TSource> source, Func<TLocal> localInit, Func<TSource, ParallelLoopState, TLocal, TLocal> body, Action<TLocal> localFinally ) The first delegate (the localInit argument) is defined as Func<TLocal>. This delegate initializes our local state. It should return some object we can use to track the results of a single thread’s operations. The second delegate (the body argument) is where our main processing occurs, although now, instead of being an Action<T>, we actually provide a Func<TSource, ParallelLoopState, TLocal, TLocal> delegate. This delegate will receive three arguments: our original element from the collection (TSource), a ParallelLoopState which we can use for early termination, and the instance of our local state we created (TLocal). It should do whatever processing you wish to occur per element, then return the value of the local state after processing is completed. The third delegate (the localFinally argument) is defined as Action<TLocal>. This delegate is passed our local state after it’s been processed by all of the elements this thread will handle. This is where you can merge your final results together. This may require synchronization, but now, instead of synchronizing once per element (potentially millions of times), you’ll only have to synchronize once per thread, which is an ideal situation. Now that I’ve explained how this works, lets look at the code: // Safe, and fast! double min = double.MaxValue; // Make a "lock" object object syncObject = new object(); Parallel.ForEach( collection, // First, we provide a local state initialization delegate. () => double.MaxValue, // Next, we supply the body, which takes the original item, loop state, // and local state, and returns a new local state (item, loopState, localState) => { double value = item.PerformComputation(); return System.Math.Min(localState, value); }, // Finally, we provide an Action<TLocal>, to "merge" results together localState => { // This requires locking, but it's only once per used thread lock(syncObj) min = System.Math.Min(min, localState); } ); Although this is a bit more complicated than the previous version, it is now both thread-safe, and has minimal locking. This same approach can be used by Parallel.For, although now, it’s Parallel.For<TLocal>. When working with Parallel.For<TLocal>, you use the same triplet of delegates, with the same purpose and results. Also, many times, you can completely avoid locking by using a method of the Interlocked class to perform the final aggregation in an atomic operation. The MSDN example demonstrating this same technique using Parallel.For uses the Interlocked class instead of a lock, since they are doing a sum operation on a long variable, which is possible via Interlocked.Add. By taking advantage of local state, we can use the Parallel class methods to parallelize algorithms such as aggregation, which, at first, may seem like poor candidates for parallelization. Doing so requires careful consideration, and often requires a slight redesign of the algorithm, but the performance gains can be significant if handled in a way to avoid excessive synchronization.

Search Results

Search found 10098 results on 404 pages for 'per pixel'.

Page 58/404 | < Previous Page | 54 55 56 57 58 59 60 61 62 63 64 65 | Next Page >

- by Thucydides411

- by vlannoob

- by Pavel

- by cadrian

- by cpx

- by CandyCo

- by javano

- by last-child

- by boxerbucks

- by netvope

- by Marty Heath

- by Jonas

- by Dan Bird

- by Alan Hoffmeister

- by matthewsteiner

- by Andy

- by FunkyChicken

- by John

- by Abhijith

- by CandyCo

- by user2050516

- by WizardOfOdds

- by Thomi

- by Reed

- by Reed

< Previous Page | 54 55 56 57 58 59 60 61 62 63 64 65 | Next Page >