Search Results

Search found 9 results on 1 pages for 'undecidable'.

Page 1/1 | 1 

  • Why doesn't Haskell have type-level lambda abstractions?

    - by Petr Pudlák
    Are there some theoretical reasons for that (like that the type checking or type inference would become undecidable), or practical reasons (too difficult to implement properly)? Currently, we can wrap things into newtype like newtype Pair a = Pair (a, a) and then have Pair :: * -> * but we cannot do something like ?(a:*). (a,a). (There are some languages that have them, for example, Scala does.)

    Read the article

  • "continue" and "break" for static analysis

    - by B. VB.
    I know there have been a number of discussions of whether break and continue should be considered harmful generally (with the bottom line being - more or less - that it depends; in some cases they enhance clarity and readability, but in other cases they do not). Suppose a new project is starting development, with plans for nightly builds including a run through a static analyzer. Should it be part of the coding guidelines for the project to avoid (or strongly discourage) the use of continue and break, even if it can sacrifice a little readability and require excessive indentation? I'm most interested in how this applies to C code. Essentially, can the use of these control operators significantly complicate the static analysis of the code possibly resulting in additional false negatives, that would otherwise register a potential fault if break or continue were not used? (Of course a complete static analysis proving the correctness of an aribtrary program is an undecidable proposition, so please keep responses about any hands-on experience with this you have, and not on theoretical impossibilities) Thanks in advance!

    Read the article

  • Programming Language, Turing Completeness and Turing Machine

    - by Amumu
    A programming language is said to be Turing Completeness if it can successfully simulate a universal TM. Let's take functional programming language for example. In functional programming, function has highest priority over anything. You can pass functions around like any primitives or objects. This is called first class function. In functional programming, your function does not produce side effect i.e. output strings onto screen, change the state of variables outside of its scope. Each function has a copy of its own objects if the objects are passed from the outside, and the copied objects are returned once the function finishes its job. Each function written purely in functional style is completely independent to anything outside of it. Thus, the complexity of the overall system is reduced. This is referred as referential transparency. In functional programming, each function can have its local variables kept its values even after the function exits. This is done by the garbage collector. The value can be reused the next time the function is called again. This is called memoization. A function usually should solve only one thing. It should model only one algorithm to answer a problem. Do you think that a function in a functional language with above properties simulate a Turing Machines? Functions (= algorithms = Turing Machines) are able to be passed around as input and returned as output. TM also accepts and simulate other TMs Memoization models the set of states of a Turing Machine. The memorized variables can be used to determine states of a TM (i.e. which lines to execute, what behavior should it take in a give state ...). Also, you can use memoization to simulate your internal tape storage. In language like C/C++, when a function exits, you lose all of its internal data (unless you store it elsewhere outside of its scope). The set of symbols are the set of all strings in a programming language, which is the higher level and human-readable version of machine code (opcode) Start state is the beginning of the function. However, with memoization, start state can be determined by memoization or if you want, switch/if-else statement in imperative programming language. But then, you can't Final accepting state when the function returns a value, or rejects if an exception happens. Thus, the function (= algorithm = TM) is decidable. Otherwise, it's undecidable. I'm not sure about this. What do you think? Is my thinking true on all of this? The reason I bring function in functional programming because I think it's closer to the idea of TM. What experience with other programming languages do you have which make you feel the idea of TM and the ideas of Computer Science in general? Can you specify how you think?

    Read the article

  • Ruby on rails generates tests for you. Do those give a false sense of a safety net?

    - by Hamish Grubijan
    Disclaimer: I have not used RoR, and I have not generated tests. But, I will still dare to post this question. Quality Assurance is theoretically impossible to get 100% right in general (Undecidable problem ;), and it is hard in practice. So many developers do not understand that writing good automated tests is an art, and it is hard. When I hear that RoR generates the tests for you, I get very skeptical. It cannot be that easy. Testing is a general concept; it applies across languages. So does the concept of code contracts, it is similar for languages that support it. Code contracts do not generate themselves. The programmer must add the requirements and the promises manually, after doing some thinking about the algorithm / function. If a human gets it wrong, then the tools will propagate the error. Similarly with testing - it takes human judgement about what should happen. Tests do not write themselves, and we are far from the day when a business analyst can just have a conversation with a computer and tell it informally what the requirements are and have the computer do all the work. There is no magic ... how can RoR generate good tests for you? Please shed some light on this. Opinions are ok, for this is a community wiki. Thanks!

    Read the article

  • Why doesn't C# do "simple" type inference on generics?

    - by Ken Birman
    Just curious: sure, we all know that the general case of type inference for generics is undecidable. And so C# won't do any kind of subtyping at all: if Foo<T> is a generic, Foo<int> isn't a subtype of Foo<T>, or Foo<Object> or of anything else you might cook up. And sure, we all hack around this with ugly interface or abstract class definitions. But... if you can't beat the general problem, why not just limit the solution to cases that are easy. For example, in my list above, it is OBVIOUS that Foo<int> is a subtype of Foo<T> and it would be trivial to check. Same for checking against Foo<Object>. So is there some other deep horror that would creep forth from the abyss if they were to just say, aw shucks, we'll do what we can? Or is this just some sort of religious purity on the part of the language guys at Microsoft?

    Read the article

  • College Courses through distance learning

    - by Matt
    I realize this isn't really a programming question, but didn't really know where to post this in the stackexchange and because I am a computer science major i thought id ask here. This is pretty unique to the programmer community since my degree is about 95% programming. I have 1 semester left, but i work full time. I would like to finish up in December, but to make things easier i like to take online classes whenever I can. So, my question is does anyone know of any colleges that offer distance learning courses for computer science? I have been searching around and found a few potential classes, but not sure yet. I would like to gather some classes and see what i can get approval for. Class I need: Only need one C SC 437 Geometric Algorithms C SC 445 Algorithms C SC 473 Automata Only need one C SC 452 Operating Systems C SC 453 Compilers/Systems Software While i only need of each of the above courses i still need to take two more electives. These also have to be upper 400 level classes. So i can take multiple in each category. Some other classes I can take are: CSC 447 - Green Computing CSC 425 - Computer Networking CSC 460 - Database Design CSC 466 - Computer Security I hoping to take one or two of these courses over the summer. If not, then online over the regular semester would be ok too. Any help in helping find these classes would be awesome. Maybe you went to a college that offered distance learning. Some of these classes may be considered to be graduate courses too. Descriptions are listed below if you need. Thanks! Descriptions Computer Security This is an introductory course covering the fundamentals of computer security. In particular, the course will cover basic concepts of computer security such as threat models and security policies, and will show how these concepts apply to specific areas such as communication security, software security, operating systems security, network security, web security, and hardware-based security. Computer Networking Theory and practice of computer networks, emphasizing the principles underlying the design of network software and the role of the communications system in distributed computing. Topics include routing, flow and congestion control, end-to-end protocols, and multicast. Database Design Functions of a database system. Data modeling and logical database design. Query languages and query optimization. Efficient data storage and access. Database access through standalone and web applications. Green Computing This course covers fundamental principles of energy management faced by designers of hardware, operating systems, and data centers. We will explore basic energy management option in individual components such as CPUs, network interfaces, hard drives, memory. We will further present the energy management policies at the operating system level that consider performance vs. energy saving tradeoffs. Finally we will consider large scale data centers where energy management is done at multiple layers from individual components in the system to shutting down entries subset of machines. We will also discuss energy generation and delivery and well as cooling issues in large data centers. Compilers/Systems Software Basic concepts of compilation and related systems software. Topics include lexical analysis, parsing, semantic analysis, code generation; assemblers, loaders, linkers; debuggers. Operating Systems Concepts of modern operating systems; concurrent processes; process synchronization and communication; resource allocation; kernels; deadlock; memory management; file systems. Algorithms Introduction to the design and analysis of algorithms: basic analysis techniques (asymptotics, sums, recurrences); basic design techniques (divide and conquer, dynamic programming, greedy, amortization); acquiring an algorithm repertoire (sorting, median finding, strong components, spanning trees, shortest paths, maximum flow, string matching); and handling intractability (approximation algorithms, branch and bound). Automata Introduction to models of computation (finite automata, pushdown automata, Turing machines), representations of languages (regular expressions, context-free grammars), and the basic hierarchy of languages (regular, context-free, decidable, and undecidable languages). Geometric Algorithms The study of algorithms for geometric objects, using a computational geometry approach, with an emphasis on applications for graphics, VLSI, GIS, robotics, and sensor networks. Topics may include the representation and overlaying of maps, finding nearest neighbors, solving linear programming problems, and searching geometric databases.

    Read the article

  • Toorcon 15 (2013)

    - by danx
    The Toorcon gang (senior staff): h1kari (founder), nfiltr8, and Geo Introduction to Toorcon 15 (2013) A Tale of One Software Bypass of MS Windows 8 Secure Boot Breaching SSL, One Byte at a Time Running at 99%: Surviving an Application DoS Security Response in the Age of Mass Customized Attacks x86 Rewriting: Defeating RoP and other Shinanighans Clowntown Express: interesting bugs and running a bug bounty program Active Fingerprinting of Encrypted VPNs Making Attacks Go Backwards Mask Your Checksums—The Gorry Details Adventures with weird machines thirty years after "Reflections on Trusting Trust" Introduction to Toorcon 15 (2013) Toorcon 15 is the 15th annual security conference held in San Diego. I've attended about a third of them and blogged about previous conferences I attended here starting in 2003. As always, I've only summarized the talks I attended and interested me enough to write about them. Be aware that I may have misrepresented the speaker's remarks and that they are not my remarks or opinion, or those of my employer, so don't quote me or them. Those seeking further details may contact the speakers directly or use The Google. For some talks, I have a URL for further information. A Tale of One Software Bypass of MS Windows 8 Secure Boot Andrew Furtak and Oleksandr Bazhaniuk Yuri Bulygin, Oleksandr ("Alex") Bazhaniuk, and (not present) Andrew Furtak Yuri and Alex talked about UEFI and Bootkits and bypassing MS Windows 8 Secure Boot, with vendor recommendations. They previously gave this talk at the BlackHat 2013 conference. MS Windows 8 Secure Boot Overview UEFI (Unified Extensible Firmware Interface) is interface between hardware and OS. UEFI is processor and architecture independent. Malware can replace bootloader (bootx64.efi, bootmgfw.efi). Once replaced can modify kernel. Trivial to replace bootloader. Today many legacy bootkits—UEFI replaces them most of them. MS Windows 8 Secure Boot verifies everything you load, either through signatures or hashes. UEFI firmware relies on secure update (with signed update). You would think Secure Boot would rely on ROM (such as used for phones0, but you can't do that for PCs—PCs use writable memory with signatures DXE core verifies the UEFI boat loader(s) OS Loader (winload.efi, winresume.efi) verifies the OS kernel A chain of trust is established with a root key (Platform Key, PK), which is a cert belonging to the platform vendor. Key Exchange Keys (KEKs) verify an "authorized" database (db), and "forbidden" database (dbx). X.509 certs with SHA-1/SHA-256 hashes. Keys are stored in non-volatile (NV) flash-based NVRAM. Boot Services (BS) allow adding/deleting keys (can't be accessed once OS starts—which uses Run-Time (RT)). Root cert uses RSA-2048 public keys and PKCS#7 format signatures. SecureBoot — enable disable image signature checks SetupMode — update keys, self-signed keys, and secure boot variables CustomMode — allows updating keys Secure Boot policy settings are: always execute, never execute, allow execute on security violation, defer execute on security violation, deny execute on security violation, query user on security violation Attacking MS Windows 8 Secure Boot Secure Boot does NOT protect from physical access. Can disable from console. Each BIOS vendor implements Secure Boot differently. There are several platform and BIOS vendors. It becomes a "zoo" of implementations—which can be taken advantage of. Secure Boot is secure only when all vendors implement it correctly. Allow only UEFI firmware signed updates protect UEFI firmware from direct modification in flash memory protect FW update components program SPI controller securely protect secure boot policy settings in nvram protect runtime api disable compatibility support module which allows unsigned legacy Can corrupt the Platform Key (PK) EFI root certificate variable in SPI flash. If PK is not found, FW enters setup mode wich secure boot turned off. Can also exploit TPM in a similar manner. One is not supposed to be able to directly modify the PK in SPI flash from the OS though. But they found a bug that they can exploit from User Mode (undisclosed) and demoed the exploit. It loaded and ran their own bootkit. The exploit requires a reboot. Multiple vendors are vulnerable. They will disclose this exploit to vendors in the future. Recommendations: allow only signed updates protect UEFI fw in ROM protect EFI variable store in ROM Breaching SSL, One Byte at a Time Yoel Gluck and Angelo Prado Angelo Prado and Yoel Gluck, Salesforce.com CRIME is software that performs a "compression oracle attack." This is possible because the SSL protocol doesn't hide length, and because SSL compresses the header. CRIME requests with every possible character and measures the ciphertext length. Look for the plaintext which compresses the most and looks for the cookie one byte-at-a-time. SSL Compression uses LZ77 to reduce redundancy. Huffman coding replaces common byte sequences with shorter codes. US CERT thinks the SSL compression problem is fixed, but it isn't. They convinced CERT that it wasn't fixed and they issued a CVE. BREACH, breachattrack.com BREACH exploits the SSL response body (Accept-Encoding response, Content-Encoding). It takes advantage of the fact that the response is not compressed. BREACH uses gzip and needs fairly "stable" pages that are static for ~30 seconds. It needs attacker-supplied content (say from a web form or added to a URL parameter). BREACH listens to a session's requests and responses, then inserts extra requests and responses. Eventually, BREACH guesses a session's secret key. Can use compression to guess contents one byte at-a-time. For example, "Supersecret SupersecreX" (a wrong guess) compresses 10 bytes, and "Supersecret Supersecret" (a correct guess) compresses 11 bytes, so it can find each character by guessing every character. To start the guess, BREACH needs at least three known initial characters in the response sequence. Compression length then "leaks" information. Some roadblocks include no winners (all guesses wrong) or too many winners (multiple possibilities that compress the same). The solutions include: lookahead (guess 2 or 3 characters at-a-time instead of 1 character). Expensive rollback to last known conflict check compression ratio can brute-force first 3 "bootstrap" characters, if needed (expensive) block ciphers hide exact plain text length. Solution is to align response in advance to block size Mitigations length: use variable padding secrets: dynamic CSRF tokens per request secret: change over time separate secret to input-less servlets Future work eiter understand DEFLATE/GZIP HTTPS extensions Running at 99%: Surviving an Application DoS Ryan Huber Ryan Huber, Risk I/O Ryan first discussed various ways to do a denial of service (DoS) attack against web services. One usual method is to find a slow web page and do several wgets. Or download large files. Apache is not well suited at handling a large number of connections, but one can put something in front of it Can use Apache alternatives, such as nginx How to identify malicious hosts short, sudden web requests user-agent is obvious (curl, python) same url requested repeatedly no web page referer (not normal) hidden links. hide a link and see if a bot gets it restricted access if not your geo IP (unless the website is global) missing common headers in request regular timing first seen IP at beginning of attack count requests per hosts (usually a very large number) Use of captcha can mitigate attacks, but you'll lose a lot of genuine users. Bouncer, goo.gl/c2vyEc and www.github.com/rawdigits/Bouncer Bouncer is software written by Ryan in netflow. Bouncer has a small, unobtrusive footprint and detects DoS attempts. It closes blacklisted sockets immediately (not nice about it, no proper close connection). Aggregator collects requests and controls your web proxies. Need NTP on the front end web servers for clean data for use by bouncer. Bouncer is also useful for a popularity storm ("Slashdotting") and scraper storms. Future features: gzip collection data, documentation, consumer library, multitask, logging destroyed connections. Takeaways: DoS mitigation is easier with a complete picture Bouncer designed to make it easier to detect and defend DoS—not a complete cure Security Response in the Age of Mass Customized Attacks Peleus Uhley and Karthik Raman Peleus Uhley and Karthik Raman, Adobe ASSET, blogs.adobe.com/asset/ Peleus and Karthik talked about response to mass-customized exploits. Attackers behave much like a business. "Mass customization" refers to concept discussed in the book Future Perfect by Stan Davis of Harvard Business School. Mass customization is differentiating a product for an individual customer, but at a mass production price. For example, the same individual with a debit card receives basically the same customized ATM experience around the world. Or designing your own PC from commodity parts. Exploit kits are another example of mass customization. The kits support multiple browsers and plugins, allows new modules. Exploit kits are cheap and customizable. Organized gangs use exploit kits. A group at Berkeley looked at 77,000 malicious websites (Grier et al., "Manufacturing Compromise: The Emergence of Exploit-as-a-Service", 2012). They found 10,000 distinct binaries among them, but derived from only a dozen or so exploit kits. Characteristics of Mass Malware: potent, resilient, relatively low cost Technical characteristics: multiple OS, multipe payloads, multiple scenarios, multiple languages, obfuscation Response time for 0-day exploits has gone down from ~40 days 5 years ago to about ~10 days now. So the drive with malware is towards mass customized exploits, to avoid detection There's plenty of evicence that exploit development has Project Manager bureaucracy. They infer from the malware edicts to: support all versions of reader support all versions of windows support all versions of flash support all browsers write large complex, difficult to main code (8750 lines of JavaScript for example Exploits have "loose coupling" of multipe versions of software (adobe), OS, and browser. This allows specific attacks against specific versions of multiple pieces of software. Also allows exploits of more obscure software/OS/browsers and obscure versions. Gave examples of exploits that exploited 2, 3, 6, or 14 separate bugs. However, these complete exploits are more likely to be buggy or fragile in themselves and easier to defeat. Future research includes normalizing malware and Javascript. Conclusion: The coming trend is that mass-malware with mass zero-day attacks will result in mass customization of attacks. x86 Rewriting: Defeating RoP and other Shinanighans Richard Wartell Richard Wartell The attack vector we are addressing here is: First some malware causes a buffer overflow. The malware has no program access, but input access and buffer overflow code onto stack Later the stack became non-executable. The workaround malware used was to write a bogus return address to the stack jumping to malware Later came ASLR (Address Space Layout Randomization) to randomize memory layout and make addresses non-deterministic. The workaround malware used was to jump t existing code segments in the program that can be used in bad ways "RoP" is Return-oriented Programming attacks. RoP attacks use your own code and write return address on stack to (existing) expoitable code found in program ("gadgets"). Pinkie Pie was paid $60K last year for a RoP attack. One solution is using anti-RoP compilers that compile source code with NO return instructions. ASLR does not randomize address space, just "gadgets". IPR/ILR ("Instruction Location Randomization") randomizes each instruction with a virtual machine. Richard's goal was to randomize a binary with no source code access. He created "STIR" (Self-Transofrming Instruction Relocation). STIR disassembles binary and operates on "basic blocks" of code. The STIR disassembler is conservative in what to disassemble. Each basic block is moved to a random location in memory. Next, STIR writes new code sections with copies of "basic blocks" of code in randomized locations. The old code is copied and rewritten with jumps to new code. the original code sections in the file is marked non-executible. STIR has better entropy than ASLR in location of code. Makes brute force attacks much harder. STIR runs on MS Windows (PEM) and Linux (ELF). It eliminated 99.96% or more "gadgets" (i.e., moved the address). Overhead usually 5-10% on MS Windows, about 1.5-4% on Linux (but some code actually runs faster!). The unique thing about STIR is it requires no source access and the modified binary fully works! Current work is to rewrite code to enforce security policies. For example, don't create a *.{exe,msi,bat} file. Or don't connect to the network after reading from the disk. Clowntown Express: interesting bugs and running a bug bounty program Collin Greene Collin Greene, Facebook Collin talked about Facebook's bug bounty program. Background at FB: FB has good security frameworks, such as security teams, external audits, and cc'ing on diffs. But there's lots of "deep, dark, forgotten" parts of legacy FB code. Collin gave several examples of bountied bugs. Some bounty submissions were on software purchased from a third-party (but bounty claimers don't know and don't care). We use security questions, as does everyone else, but they are basically insecure (often easily discoverable). Collin didn't expect many bugs from the bounty program, but they ended getting 20+ good bugs in first 24 hours and good submissions continue to come in. Bug bounties bring people in with different perspectives, and are paid only for success. Bug bounty is a better use of a fixed amount of time and money versus just code review or static code analysis. The Bounty program started July 2011 and paid out $1.5 million to date. 14% of the submissions have been high priority problems that needed to be fixed immediately. The best bugs come from a small % of submitters (as with everything else)—the top paid submitters are paid 6 figures a year. Spammers like to backstab competitors. The youngest sumitter was 13. Some submitters have been hired. Bug bounties also allows to see bugs that were missed by tools or reviews, allowing improvement in the process. Bug bounties might not work for traditional software companies where the product has release cycle or is not on Internet. Active Fingerprinting of Encrypted VPNs Anna Shubina Anna Shubina, Dartmouth Institute for Security, Technology, and Society (I missed the start of her talk because another track went overtime. But I have the DVD of the talk, so I'll expand later) IPsec leaves fingerprints. Using netcat, one can easily visually distinguish various crypto chaining modes just from packet timing on a chart (example, DES-CBC versus AES-CBC) One can tell a lot about VPNs just from ping roundtrips (such as what router is used) Delayed packets are not informative about a network, especially if far away from the network More needed to explore about how TCP works in real life with respect to timing Making Attacks Go Backwards Fuzzynop FuzzyNop, Mandiant This talk is not about threat attribution (finding who), product solutions, politics, or sales pitches. But who are making these malware threats? It's not a single person or group—they have diverse skill levels. There's a lot of fat-fingered fumblers out there. Always look for low-hanging fruit first: "hiding" malware in the temp, recycle, or root directories creation of unnamed scheduled tasks obvious names of files and syscalls ("ClearEventLog") uncleared event logs. Clearing event log in itself, and time of clearing, is a red flag and good first clue to look for on a suspect system Reverse engineering is hard. Disassembler use takes practice and skill. A popular tool is IDA Pro, but it takes multiple interactive iterations to get a clean disassembly. Key loggers are used a lot in targeted attacks. They are typically custom code or built in a backdoor. A big tip-off is that non-printable characters need to be printed out (such as "[Ctrl]" "[RightShift]") or time stamp printf strings. Look for these in files. Presence is not proof they are used. Absence is not proof they are not used. Java exploits. Can parse jar file with idxparser.py and decomile Java file. Java typially used to target tech companies. Backdoors are the main persistence mechanism (provided externally) for malware. Also malware typically needs command and control. Application of Artificial Intelligence in Ad-Hoc Static Code Analysis John Ashaman John Ashaman, Security Innovation Initially John tried to analyze open source files with open source static analysis tools, but these showed thousands of false positives. Also tried using grep, but tis fails to find anything even mildly complex. So next John decided to write his own tool. His approach was to first generate a call graph then analyze the graph. However, the problem is that making a call graph is really hard. For example, one problem is "evil" coding techniques, such as passing function pointer. First the tool generated an Abstract Syntax Tree (AST) with the nodes created from method declarations and edges created from method use. Then the tool generated a control flow graph with the goal to find a path through the AST (a maze) from source to sink. The algorithm is to look at adjacent nodes to see if any are "scary" (a vulnerability), using heuristics for search order. The tool, called "Scat" (Static Code Analysis Tool), currently looks for C# vulnerabilities and some simple PHP. Later, he plans to add more PHP, then JSP and Java. For more information see his posts in Security Innovation blog and NRefactory on GitHub. Mask Your Checksums—The Gorry Details Eric (XlogicX) Davisson Eric (XlogicX) Davisson Sometimes in emailing or posting TCP/IP packets to analyze problems, you may want to mask the IP address. But to do this correctly, you need to mask the checksum too, or you'll leak information about the IP. Problem reports found in stackoverflow.com, sans.org, and pastebin.org are usually not masked, but a few companies do care. If only the IP is masked, the IP may be guessed from checksum (that is, it leaks data). Other parts of packet may leak more data about the IP. TCP and IP checksums both refer to the same data, so can get more bits of information out of using both checksums than just using one checksum. Also, one can usually determine the OS from the TTL field and ports in a packet header. If we get hundreds of possible results (16x each masked nibble that is unknown), one can do other things to narrow the results, such as look at packet contents for domain or geo information. With hundreds of results, can import as CSV format into a spreadsheet. Can corelate with geo data and see where each possibility is located. Eric then demoed a real email report with a masked IP packet attached. Was able to find the exact IP address, given the geo and university of the sender. Point is if you're going to mask a packet, do it right. Eric wouldn't usually bother, but do it correctly if at all, to not create a false impression of security. Adventures with weird machines thirty years after "Reflections on Trusting Trust" Sergey Bratus Sergey Bratus, Dartmouth College (and Julian Bangert and Rebecca Shapiro, not present) "Reflections on Trusting Trust" refers to Ken Thompson's classic 1984 paper. "You can't trust code that you did not totally create yourself." There's invisible links in the chain-of-trust, such as "well-installed microcode bugs" or in the compiler, and other planted bugs. Thompson showed how a compiler can introduce and propagate bugs in unmodified source. But suppose if there's no bugs and you trust the author, can you trust the code? Hell No! There's too many factors—it's Babylonian in nature. Why not? Well, Input is not well-defined/recognized (code's assumptions about "checked" input will be violated (bug/vunerabiliy). For example, HTML is recursive, but Regex checking is not recursive. Input well-formed but so complex there's no telling what it does For example, ELF file parsing is complex and has multiple ways of parsing. Input is seen differently by different pieces of program or toolchain Any Input is a program input executes on input handlers (drives state changes & transitions) only a well-defined execution model can be trusted (regex/DFA, PDA, CFG) Input handler either is a "recognizer" for the inputs as a well-defined language (see langsec.org) or it's a "virtual machine" for inputs to drive into pwn-age ELF ABI (UNIX/Linux executible file format) case study. Problems can arise from these steps (without planting bugs): compiler linker loader ld.so/rtld relocator DWARF (debugger info) exceptions The problem is you can't really automatically analyze code (it's the "halting problem" and undecidable). Only solution is to freeze code and sign it. But you can't freeze everything! Can't freeze ASLR or loading—must have tables and metadata. Any sufficiently complex input data is the same as VM byte code Example, ELF relocation entries + dynamic symbols == a Turing Complete Machine (TM). @bxsays created a Turing machine in Linux from relocation data (not code) in an ELF file. For more information, see Rebecca "bx" Shapiro's presentation from last year's Toorcon, "Programming Weird Machines with ELF Metadata" @bxsays did same thing with Mach-O bytecode Or a DWARF exception handling data .eh_frame + glibc == Turning Machine X86 MMU (IDT, GDT, TSS): used address translation to create a Turning Machine. Page handler reads and writes (on page fault) memory. Uses a page table, which can be used as Turning Machine byte code. Example on Github using this TM that will fly a glider across the screen Next Sergey talked about "Parser Differentials". That having one input format, but two parsers, will create confusion and opportunity for exploitation. For example, CSRs are parsed during creation by cert requestor and again by another parser at the CA. Another example is ELF—several parsers in OS tool chain, which are all different. Can have two different Program Headers (PHDRs) because ld.so parses multiple PHDRs. The second PHDR can completely transform the executable. This is described in paper in the first issue of International Journal of PoC. Conclusions trusting computers not only about bugs! Bugs are part of a problem, but no by far all of it complex data formats means bugs no "chain of trust" in Babylon! (that is, with parser differentials) we need to squeeze complexity out of data until data stops being "code equivalent" Further information See and langsec.org. USENIX WOOT 2013 (Workshop on Offensive Technologies) for "weird machines" papers and videos.

    Read the article

1