22
Aug

Static Code Analysis: Scan All Your Code For Bugs


Let's start with static code analysis, and what that means is we want to scan all of our code for bugs. So what are software vulnerabilities, and why do I care? I'm sure you know a little bit about these, but I want to explain in a little more detail some of the types of vulnerabilities that are out there threatening software today, and understand why it's so important that we deal with them. Then, of course, we'll talk about the actual static code analysis: what is it, how does it fit into the software life cycle, and what are some techniques that we use to do it? And after each one of these segments I'll talk just a little bit about ROI: why it's important to do this up front, and how that saves you money later.

So there are a number of lists of vulnerabilities out there. There's the SANS Top 25, there's the OWASP Top 10, and there are different vulnerabilities for different languages. Let's take a look at a few. Memory corruption is something that's plagued native code for a long time. Think about C, C++, fully compiled code: they've dealt with this thing called memory corruption for a couple of decades. Buffer overflows, buffer over-reads (we'll look at an example of that one, which I don't think was as well understood in the past). You could audit this code manually. You could look through every loop, every API, each situation, but can you really do that across millions of lines of code? Does that really scale?

And in talking about different types of languages, different architectures, different platforms, there's also script injection: cross-site scripting, cross-site request forgery. With cross-site scripting, really what that is is where we failed to sanitize input properly. So imagine you go to a web page and you're able to upload something, some name or whatever. You can also upload script, and that really shouldn't be allowed; that should get sanitized out. The next visitor to the page comes, the script renders in the context of their browser, and lo and
behold, their session cookie is sent to a bad actor, and all of a sudden their bank account is lost. So that sort of vulnerability is more relevant to web code. What's interesting is that in today's agile DevOps world we see a kind of blending of these. Imagine something like Office, which used to be heavyweight C++ desktop software; now with Office 365 they have a cloud offering, a web portion of this. So we're seeing some of these vulnerabilities blend across types of software. And again, that's one of these things where being able to automate searching for these kinds of vulnerabilities with a tool allows you to scale in a way that you just can't do manually.

Other types of vulnerabilities you might be on the lookout for: command injection is a big one that's been around for years, whether it's a SQL injection or an operating system injection. Let me imagine one off the top of my head. Suppose you have some software on an operating system, and it's going to list the files in a directory for you, so you supply the directory name. Well, what if you supply some special character, like a semicolon, a backtick, an ampersand, something that was unexpected to the programmer? In fact, the shell might interpret that character in a way that lets you essentially escape the prior command, which was just to list the files in the directory, and all of a sudden you can cat the password file, overwrite the /etc/passwd file, something like that. That's a command injection.

That's just one of many examples, and I'm not going to take the time to go through every one of these vulnerability types, but you can imagine problems with authentication; problems where we expose data in a way that we shouldn't, maybe we didn't encrypt it in transit or at rest; misconfigurations, where our system's not as secure as we thought it might be. So there are all these different vulnerability types, and let's just talk through a few to get a sense for how critical they are. So here's a
typical memory corruption. We've got a statically sized buffer of a kilobyte, 1,024 bytes, and then we copy data in there somehow: sprintf, memcpy, strcpy, whatever it may be, a for loop. The question is, is this a vulnerability? It may not be clear right off the bat. We don't know, for example, just from looking at the snippet of code, how big these variables name and domain are. If the size of these two strings, which are put together with a format string and then copied into the buffer, is larger than 1,024, we have a stack buffer overflow.

To give you an idea of how severe that is, take a look at this flow. We've got a function, maybe your main program. It calls this sprintf, and because there's no bounds check on the local variables, we see that it overwrites something important, like a function pointer or, in this case, the return address. So when the function tries to return, instead of going back to where it should, here we see that in fact it could go back to attacker shellcode. That's just to give you an example, to kind of whet your appetite for how severe these vulnerabilities can be, and how easy in some cases they are for attackers to exploit. Now of course there are operating system mitigations and other things nowadays that can help lessen the impact or the severity, but the fact remains that your code has this problem and it needs to be repaired. And it's easy to repair, too, by the way: just change it to an snprintf or some other safe version of that copy, and the problem's fixed. The problem is, how do you know where all those vulnerabilities are? Take another look at something like the SANS Top 25: there are all these different types of vulnerabilities across all of your code that you've got to go and look for, and that's going to be a very difficult manual process if you don't have a way of automating it.

Let me give you one more example before we dive into what static analysis really is and how it can help. This vulnerability is
something that's very interesting and very near and dear to the heart of Synopsys. It's called the Heartbleed vulnerability. It was a vulnerability that existed in OpenSSL: for a couple of years, a feature called heartbeat existed with a vulnerability that was unknown. The way it worked is a client could talk to a server that had the vulnerability and say, "Hey, can you send me the four-letter word bird?" and the server would say, "Yep, here's bird." That's the normal usage of the heartbeat. The Heartbleed usage is where a malicious actor would say, "Can you send me the 500-letter word bird, please?" and the server would say, "Yep, here's the 500-letter word bird," but since you only sent it four bytes, it gives you 496 bytes of whatever just happens to be in memory. Could be private keys, could be passwords, could be all kinds of bad things.

What's really interesting and unique about the Heartbleed vulnerability is that it was a buffer over-read, not a buffer overflow, and those were not thought to be as critical in years gone by. A decade or so ago, if you had told someone you were looking for a buffer over-read, they would have been, you know, maybe not so impressed with your bug-hunting skills. Lo and behold, this bug turns out to be very critical. That's an interesting thing we'll come back to later, when we talk about having the discussion of exploitability with developers. Sometimes they don't want to prioritize a bug fix right away because they don't understand or don't believe that it's critical, and they want you to prove that it's exploitable. Sometimes that discussion can be a real rat hole. We don't really want to get into that: if we know about a bug, we should fix it in as timely a manner as we possibly can.

So let's talk a little bit about static analysis, now that we understand the background about vulnerabilities, why they're so important, whether they're out there, and what types of vulnerabilities you might see. How do we get this idea of
scanning code automatically into the SDL? Well, it's very simple, and it's one of the most powerful things that we can do, because we can integrate this right into the implementation phase. That means it doesn't have to be something that a highly skilled, highly paid third-party pen tester gets to do at the very end of the project. No, it can be integrated right up front with the developer. They can in fact have it integrated with their development tools: every time they compile, every time they check in code, this kind of thing can be run right there, and that's the best place to run static analysis.

Before I dig into the techniques that the tool uses under the hood, let me list some of the pros. The biggest pro is exactly what I was just talking about: the fact that it's integrated with the developer. It's right in that process, where you can integrate with your bug-tracking tools, you can have severity ratings, and you can have explanations that the developers can understand, to help them see how important these vulnerabilities are. The other huge benefit is that it can run across all of your code, so it scales so much better than humans do. Humans can find things that machines can't; we can find those niche little bugs here and there. But I'm not so good at auditing a million lines of code; that's something that machines are better at doing. And in talking about machines, if we find out that the human is better at finding a certain niche bug, we can always build that intelligence into the tool. We can extend it.

Are there any drawbacks, then? Well, of course; every process, every tool, they all have drawbacks. False positives are definitely a drawback that many of these tools suffer from. That is, they need a little bit of tuning; they need a little bit of help from the human. Especially if you've not done this before, you have a large code base, and all of a sudden it's just, bam, let's run this across our
entire code base, you may get hundreds of alarms or alerts for areas in your code that could have vulnerabilities, and if the developers have to go and check and they find out not all of those are real, they may become a little frustrated with that process. So we have to be a little careful about how we roll this out. Start smart: start with maybe just new code, have the developers look at that, and tune down the false positives a little bit at first. Then, as you work and the tool is working better and better, you can tune up and maybe go back and retroactively scan your legacy code. So using the tools wisely is just as important as what these tools can do for you. Of course they miss subtle bugs, and we talked about that: there may be a situation where the human's better at finding certain design issues or whatever. That's something that we can either add into the tool or catch with a totally different technique, which we'll talk about. So again, raising the bar of your entire code base is an important thing that static code analysis gives us.

Let's look at the whole cycle. You establish your goals up front; you decide this is the kind of thing I'd like to look for first. For example, if you ask a telco, "What's the most critical bug we could find for you?", it's something that takes the system offline, denial of service, something like that. That's the most important thing that they don't want their product doing. On the other hand, if you ask a desktop software vendor, a browser, Office: if Word were to crash temporarily, it's not great, you might lose a little data and you might irritate some users, but it's not the end of the world. They're a little more concerned about Office documents carrying viruses and being exploitable and that sort of thing. So understand what your goals and your intentions are for the review. Then, of course, we want to run those tools, and it's best if we have a fully working environment. You could run this as a pen test at the end
against the code base, without integrating into the build environment, but that's not the best place to do it. We've already talked about the fact that the best place is right with the developer, right where the code's compiled. Then we're going to review the findings, we're going to make fixes, and we're going to continue doing this over and over again.

Let's drill into this just a little bit more. Here's your source code. We give it to the machine; basically it's got pre-installed rules, it's got all this stuff, and again we can extend those rules. Then we get raw results out of the analysis tool, and a human needs to be involved with that, whether it's tuning down false positives, or maybe extending rules, or making sure that what we found really are bugs, and then we file those bugs so that they get fixed. And again, motivating developers to make those fixes in a timely fashion is an important social part of this that we're not going to go into a lot. But generally speaking, we shouldn't find ourselves in the exploitability argument: "Hey, this bug's important, this one's not." We just need to make sure that we give a fair rating and say, "This bug is pretty severe, needs to get fixed pretty soon; this one is very severe, need to fix it right away." As long as we have rough estimates, I think developers can respond to that. There are more exact ways, with CVSS and other things, that we can use to give better rating numbers, but generally speaking, if we've got bugs and we know about them and they're security-critical, we need to make sure they get fixed.

All right, so let's move on to some of the exact techniques. They vary across different languages: if you talk about auditing Java code versus C++ versus C#, there are going to be different techniques and ways that we find bugs. The cool thing about static analysis tools, the thing I really like about them, is that most of that is hidden from the users. The developers, even the security people, don't always
necessarily need to know exactly which technique the tool used to find the bug, as long as it's finding good bugs and reporting them in a way that makes sense. Some of the things we could talk about would be pattern matching, statistical analysis, data flow analysis, and procedural analysis. Those are some; there are many more techniques that static code analysis tools use.

Let's start with pattern matching, because it's the easiest one. If we look at a piece of code, and this looks like C code, we see there's a syslog function called from the my log routine, and it looks like it's missing something. There's this %s, and what is that? That's called the format string. It's something that tells the function how to format the user-supplied data, the message in this case. If that's missing, there's a vulnerability. So it's very simple pattern matching: you either provided a format string or you didn't, and if you didn't, there's a vulnerability. Now, there's been a lot of academic research around static code analysis, and it has produced some very complex techniques as far as following data and finding vulnerabilities. All good stuff. But in many cases, for a lot of the low-hanging fruit, again raising the bar of your whole code base, it doesn't necessarily take super complex analysis. It just needs to be automated and repeatable, in a way that we can fix things to create safer code.

Let's look at some others. Statistical analysis, for example. Imagine you have a piece of code: you copy it, you paste it, and you tweak a few variables. That's a very common thing that developers do all the time. Now in this case we can see that there's this value, returns, that they had in the original, and in the pasted version they changed most of the occurrences of that to another variable name, locals, except they forgot at the very end to change returns. That's something that we could pick up with statistical analysis. Data flow analysis is
probably one of the best techniques for finding security-critical bugs statically, and let me show you an example of how that works. We have this idea of a source and a sink. Basically what that means is where data comes in, and where it gets used, gets copied, or ultimately goes out. Typically we follow untrusted data: not just static data that lives in a global table or something, but data that can come from a network packet or from a file, something that an attacker might have the rights to supply.

If you look at this snippet of code, we can see that we're allocating some memory with malloc, and we're allocating four times DWORD size, so 4 times 4 is 16. Then it looks like we're doing a copy loop with memcpy. The source of the data comes from argv, which in this case represents untrusted data from command-line input, and that gets copied to the destination, which is the pointer to the allocated memory. We're copying, it looks like, four loops of 4 bytes apiece, so it looks like we have enough memory. That looks good to the human auditor; they may say, "Hey, this code looks fine," especially if you're auditing hundreds of thousands of lines of code, where it's just tough to try to drill into these. But data flow analysis with a static analysis tool should be able to fairly easily identify the fact that we actually had four times two plus two, so it wasn't in fact 16 bytes allocated; it was only 10. So there you're going to have a memory corruption bug right in the area of this memcpy loop, and that's something that we could point out with data flow analysis.

Let's talk about another technique, procedural analysis, intra- and inter-procedural, where we use solvers to find new paths through the code and examine the use of objects or elements within those paths. For example, here we have an object x. We check to see if it's null, and if it's not, then we can use it. Later on, we use that same object, but we forget to check whether it's null, or we do the check later. That's a mismatch that we're able to
find with that type of technique. What's really cool with a lot of these static analysis tools is that you don't have to understand all of those different techniques, the under-the-hood stuff. You can typically go out and find a data sheet that says here are all the different types of things we check for and bugs we can find in these different languages. That gives you an idea of the heavy lifting that these tools can do for you.

One of the other things I want to point out: it's not all about pure bug counts. This tool finds 95% of bugs and this tool finds 93% of bugs; that's nice to know, but the one that can be integrated more fully, the one that's more usable, the one that helps developers understand what those bugs really are so the repairs can be properly made, that's the tool that's more useful. So think about grouping and sorting results: can the tool do that? Can it eliminate unwanted results; in other words, can you tune down those false positives? Does it explain the significance of a finding: here's the technique we used, this is the bug, and here's some data about it? That type of stuff is very important in helping developers understand the importance and make the fix in a timely fashion.

So let me summarize this static code analysis section. We want to find bugs sooner. Especially compared to crashes or real failures in the field, there's no comparison: the money you save, the reputation you save, it's a very large return on investment. Basically, you don't have to go and explain to your customers why this bad thing happened. The other thing to summarize about static code analysis: again, it's something that we can integrate further to the left. In SDL speak, if we look at that original SDL picture, that means the further to the left we can integrate it, the earlier we're able to make those fixes and, again, decrease the cost of repairs and increase your
reputation and the value of the investment that customers make when they buy your software. It could be a real market differentiator: if they find out that your software isn't as secure as your competitors', they might not buy your software anymore. So I think the ROI is pretty easy to measure for most of these.
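
The directory-listing scenario in the command injection discussion can be made concrete. The slide code is not part of the transcript, so what follows is only a minimal sketch under stated assumptions: the function name is_safe_dir_name and the exact allowlist are hypothetical, and the idea is simply to reject any directory name containing characters the shell could interpret before it ever reaches a command line.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical allowlist check (not from the talk): accept a directory
 * name only if every character is a letter, digit, '.', '_', '-' or '/'.
 * Anything else, such as ';', '`', '&' or a space, is rejected. */
bool is_safe_dir_name(const char *name)
{
    static const char allowed[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789._-/";

    if (name == NULL || name[0] == '\0')
        return false;
    if (name[0] == '-')          /* would be parsed as a command option */
        return false;

    /* strspn returns the length of the leading run of allowed characters;
     * the name is safe only if that run covers the whole string. */
    return strspn(name, allowed) == strlen(name);
}
```

A name such as /var/log passes, while /tmp; cat /etc/passwd is rejected because of the semicolon and spaces. An allowlist is only one possible design; avoiding the shell entirely, for example by reading the directory through an API instead of building a command string, removes the injection point altogether.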
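
The stack buffer example and its snprintf repair can be sketched as follows. The original snippet is not reproduced in the transcript, so the helper name format_address, the "%s@%s" format, and the ADDR_BUF_SIZE constant are assumptions; only the 1,024-byte buffer, the name and domain variables, and the switch to snprintf come from the talk.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ADDR_BUF_SIZE 1024   /* the one-kilobyte buffer from the talk */

/* Hypothetical helper: joins name and domain as "name@domain".
 * Returns 0 on success, or -1 if the result would not fit in out
 * (or formatting failed), so callers never see a truncated address. */
int format_address(char *out, size_t out_size,
                   const char *name, const char *domain)
{
    /* Unlike sprintf, snprintf writes at most out_size bytes including
     * the terminator, and returns the length the full string needed. */
    int needed = snprintf(out, out_size, "%s@%s", name, domain);

    if (needed < 0 || (size_t)needed >= out_size) {
        if (out_size > 0)
            out[0] = '\0';   /* leave a well-defined empty string */
        return -1;
    }
    return 0;
}
```

Called with a 1,024-byte stack buffer, this succeeds for ordinary input and returns -1 instead of writing past the buffer when name and domain together are too long. Checking the return value matters: snprintf alone prevents the overflow, but silently truncated output can be a bug of its own.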
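
The heartbeat over-read comes down to trusting a length supplied by the peer. The sketch below is not OpenSSL code; echo_payload and its parameters are hypothetical names used to illustrate the missing check, assuming the handler knows how many bytes actually arrived.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical echo handler in the spirit of a heartbeat reply.
 *   payload, received_len: the bytes that actually arrived
 *   claimed_len:           the length the peer says it sent
 * Returns the number of bytes copied into reply, or -1 if rejected. */
long echo_payload(const unsigned char *payload, size_t received_len,
                  size_t claimed_len,
                  unsigned char *reply, size_t reply_cap)
{
    /* The check whose absence causes an over-read: never copy more
     * bytes than were really received. */
    if (claimed_len > received_len)
        return -1;
    if (claimed_len > reply_cap)
        return -1;

    memcpy(reply, payload, claimed_len);
    return (long)claimed_len;
}
```

With the four-byte payload "bird" and a claimed length of 4, the handler echoes four bytes. With the same payload and a claimed length of 500, it refuses, rather than returning 496 bytes of adjacent memory.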
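
The data flow example says the allocation looked like 4 times 4 but was really four times two plus two. One common way that happens in C is a macro defined without parentheses. The transcript does not show the slide, so this is an assumed reconstruction: the macro names and helper functions below are hypothetical and exist only to make the arithmetic checkable.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed reconstruction: the macro body lacks parentheses, so it is
 * pasted into the surrounding expression as the raw tokens "2 + 2". */
#define DWORD_SIZE_BAD  2 + 2
#define DWORD_SIZE_GOOD (2 + 2)

/* What a reader expects to be 4 * 4 expands to 4 * 2 + 2. */
size_t bytes_allocated_bad(void)  { return 4 * DWORD_SIZE_BAD; }

/* With parentheses the expansion is 4 * (2 + 2). */
size_t bytes_allocated_good(void) { return 4 * DWORD_SIZE_GOOD; }

/* The copy loop from the talk: four iterations of 4 bytes each. */
size_t bytes_copied(void)         { return 4 * 4; }
```

Here bytes_allocated_bad() evaluates to 10 while the loop copies 16 bytes, which is the mismatch the talk describes. A human skimming the malloc call sees "4 times DWORD size" and moves on; a tool that works on the expanded expression sees the actual value.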
