By Mark Kasunic, Senior Member of the Technical Staff, Software Engineering Process Management Program
Organizations run on data. They use it to manage programs, select products to fund or develop, make decisions, and guide improvement. Data comes in many forms, both structured (tables of numbers and text) and unstructured (emails, images, sound, etc.). Data are generally considered high quality if they are fit for their intended uses in operations, decision making, and planning. This definition implies that data quality is both a subjective perception of the individuals who work with the data and an objective measure based on the data set in question. This post describes the work we’re doing with the Office of Acquisition, Technology and Logistics (AT&L)—a division of the Department of Defense (DoD) that oversees acquisition programs and is charged with, among other things, ensuring that the data reported to Congress is reliable.
The problem with poor data quality is that it leads to poor decisions. This problem has been well documented by many researchers, notably by Larry English in his book Information Quality Applied. According to a report released by Gartner in 2009, the average organization loses $8.2 million annually because of poor data quality. The annual cost of poor data to U.S. industry has been estimated to be $600 billion. Research indicates the Pentagon has lost more than $13 billion due to poor data quality.
Data quality is a multi-dimensional concept, and the international standard data quality model (ISO/IEC 25012) identifies 15 data quality characteristics: accuracy, completeness, consistency, credibility, currentness, accessibility, compliance, confidentiality, efficiency, precision, traceability, understandability, availability, portability, and recoverability. In our data quality research, we have been focusing on the accuracy attribute of data quality. Within the ISO model, accuracy is defined as the degree to which data has attributes that correctly represent the true value for the intended attribute of a concept.
Ensuring data quality is a multi-faceted problem. Errors in data can be introduced in multiple ways. Sometimes it’s as simple as mistyping an entry, but more complex organizational factors also lead to data problems. Common data problems include misalignment of policies and procedures with how the data is collected and entered into databases, misinterpretation of data entry instructions, incomplete data entry, faulty processing of data, and errors introduced while migrating data from one database to another.
A number of software applications have been introduced in recent years to address data quality issues. Gartner estimates that the number of software tools available for data quality grew by 26 percent since 2008. The bulk of these applications, however, focus on customer-relationship management (CRM), materials, and financial data, and the types of errors they are intended to find and correct include duplicate records, missing or incomplete data, character mismatches, and inconsistent data. As part of our research, we are going beyond these basic types of data checks, using statistical, quantitative methods to identify data anomalies that are not addressed by current off-the-shelf data quality software tools.
Examples of the data anomalies that our research is focused on exposing include cost estimates and performance values that are unusual when compared to the time series values that constitute the remainder of the data series. These unusual data values are considered outliers and tagged as anomalies.
A data anomaly is not necessarily the same as a data defect. A data anomaly might be a data defect, but it might also be accurate data caused by unusual, but actual, behavior of an attribute in a specific context. Root cause analysis is typically required to resolve the cause(s) of data anomalies. We are working with our DoD collaborators on the resolution process to determine if the anomalies detected are actual data defects.
Our research is analyzing performance data submitted by DoD contractors in monthly reports about aspects of high-profile acquisition programs, including cost, schedule, and technical accomplishments on a project or task. Some methods that we are evaluating include
Dixon’s Test
Rosner’s Test
Grubbs’ Test
Regression analysis
Autoregressive integrated moving average (ARIMA)
Various statistical control charting applications (individuals, moving range, moving average, exponentially weighted moving average, etc.)
Various non-parametric approaches (kernel function based, histogram based)
Slippage Detection
These approaches to anomaly detection are being compared and contrasted to determine which specific methods work best for each earned value management (EVM) variable we are studying.
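To make one of these techniques concrete, the following is a minimal sketch of a two-sided Grubbs’ test applied to a short series of values. The data are synthetic and the code is illustrative only, not the tooling used in our research.

```python
# A minimal, illustrative two-sided Grubbs' test for a single outlier.
# Synthetic data; requires NumPy and SciPy.
import numpy as np
from scipy import stats

def grubbs_test(values, alpha=0.05):
    """Return (is_outlier, suspect_value) for the most extreme point."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    mean, sd = x.mean(), x.std(ddof=1)
    # Test statistic: largest absolute deviation from the mean, in standard-deviation units
    deviations = np.abs(x - mean)
    g = deviations.max() / sd
    # Critical value derived from the t distribution (two-sided test)
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = ((n - 1) / np.sqrt(n)) * np.sqrt(t**2 / (n - 2 + t**2))
    return g > g_crit, x[deviations.argmax()]

# Example: monthly performance-index values with one suspicious spike (synthetic)
monthly_values = [1.02, 0.98, 1.05, 1.01, 0.97, 1.03, 1.74, 1.00, 0.99]
flagged, value = grubbs_test(monthly_values)
print(f"outlier detected: {flagged}, suspect value: {value}")
```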
Our data quality research complements our recent work on the Measurement and Analysis Infrastructure Diagnostic (MAID), which is an evaluation tool that helps organizations understand the strengths and weaknesses of their measurement systems. MAID is broader in scope than our current research, recognizing that data is part of a life cycle that spans sound definition, specification, collection, storage, analysis, packaging (for information purposes), and reporting (for decision-making). The integrity of data can be compromised at any of these stages unless policies, procedures, and other safeguards are in place.
Our research thus far has found a number of methods that have been effective for identifying anomalies in the EVM data. Our work will culminate in a report that we plan to publish by the end of 2011. With support from AT&L, we’re hoping these methods will identify problems in the data they receive and report, ultimately leading to better decisions made by government officials and lawmakers.
Additional Resources
For more information about the SEI’s work in measurement and analysis, please visit www.sei.cmu.edu/measurement/
To read the SEI technical report, Issues and Opportunities for Improving the Quality and Use of Data in the Department of Defense, please visit www.sei.cmu.edu/library/abstracts/reports/11sr004.cfm
To read the SEI technical report, Can You Trust Your Data? Establishing the Need for a Measurement and Analysis Infrastructure Diagnostic, please visit www.sei.cmu.edu/library/abstracts/reports/08tn028.cfm
To read the SEI technical report, Measurement and Analysis Infrastructure Diagnostic, Version 1.0: Method Definition Document, please visit www.sei.cmu.edu/library/abstracts/reports/10tr035.cfm
To read the SEI technical report, Measurement and Analysis Infrastructure Diagnostic (MAID) Evaluation Criteria, Version 1.0, please visit www.sei.cmu.edu/library/abstracts/reports/09tr022.cfm
By Andrew P. Moore, Insider Threat Researcher, CERT
The 2011 CyberSecurity Watch survey revealed that 27 percent of cybersecurity attacks against organizations were caused by disgruntled, greedy, or subversive insiders (employees or contractors with access to that organization’s network, systems, or data). Of the 607 survey respondents, 43 percent view insider attacks as more costly than external attacks, citing not only financial loss but also damage to reputation, critical system disruption, and loss of confidential or proprietary information. For the Department of Defense (DoD) and industry, combating insider threat attacks is hard due to insiders’ authorized physical and logical access to organizational systems and their intimate knowledge of the organizations themselves. Unfortunately, current countermeasures to insider threat are largely reactive, resulting in information systems storing sensitive information with inadequate protection against the range of procedural and technical vulnerabilities commonly exploited by insiders. This posting describes the work of researchers at the CERT® Insider Threat Center to help protect next-generation DoD enterprise systems against insider threats by capturing, validating, and applying enterprise architectural patterns.
Enterprise architectural patterns are organizational patterns that involve the full scope of enterprise architecture concerns, including people, processes, technology, and facilities. This broad scope is necessary because insiders have authorized access to systems—not only online access but physical access as well. Our understanding of insider threat stems from a decade of experience cataloging more than 700 cases of malicious insider crime against information systems and assets, including over 120 cases of espionage involving classified national security information.
Our experience reveals that malicious insiders exploit vulnerabilities in business processes of victim organizations as often as they do detailed technical vulnerabilities. Likewise, our data analysis has identified well over 100 categories of weaknesses in enterprise architectures that allowed the insider attacks to occur. We have used this analysis to develop an insider threat vulnerability assessment method, based on qualitative models for insider IT sabotage and insider theft of intellectual property (IP) that characterize patterns of problematic behaviors seen in insider threat cases. We have also applied these models to identify insider threat best practices and technical insider threat controls.
For example, an organization must deal with the risk that departing insiders might take valuable IP with them. One set of practices and controls that helps reduce the risk of insider theft of IP is based on case data showing that most insiders who stole IP did so within 30 days prior to their forced or voluntary termination. The pattern describing this set of practices and controls helps balance the costs of monitoring employee behavior for suspicious actions with the risk of losing the organization’s intellectual property.
Organizations aware of this pattern can ensure that the necessary agreements are in place (IP ownership and consent to monitoring), critical IP is identified, key departing insiders are monitored, and the necessary communication among departments takes place. At the point at which an insider resigns or is fired, technical monitoring and scrutiny of that employee’s activities within a 30-day window of their termination date are increased. Actions taken upon and before employee termination are vital to ensuring IP is not compromised and the organization preserves its legal options.
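As a rough illustration of this practice, the sketch below flags file-transfer events by departing insiders that fall within the 30-day window before termination. The field names, threshold, and data are hypothetical; an operational implementation would draw on HR and log-aggregation systems.

```python
# Illustrative sketch of the 30-day termination-window rule described above.
# Field names, threshold, and data are hypothetical.
from datetime import date, timedelta

WINDOW = timedelta(days=30)
LARGE = 100_000_000  # bytes; hypothetical review threshold

def events_to_review(departing_employees, file_transfer_events):
    """Flag large transfers by departing insiders within 30 days of their termination date."""
    flagged = []
    for event in file_transfer_events:
        term_date = departing_employees.get(event["user"])
        if term_date is None or event["bytes"] < LARGE:
            continue  # not a departing employee, or below the review threshold
        if term_date - WINDOW <= event["date"] <= term_date:
            flagged.append(event)
    return flagged

departing = {"jdoe": date(2011, 9, 30)}  # user -> termination date
events = [
    {"user": "jdoe", "date": date(2011, 9, 12), "bytes": 2_500_000_000, "dest": "usb"},
    {"user": "asmith", "date": date(2011, 9, 12), "bytes": 1_000, "dest": "email"},
]
for e in events_to_review(departing, events):
    print("review:", e)
```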
Capturing our understanding of insider threat mitigations as architectural patterns allows us to translate effective solutions into forms useful to engineers who design DoD systems. As part of our research, we are analyzing the subset of insider IT sabotage cases from the CERT insider threat database. We are updating and refining our existing qualitative insider IT sabotage model to include a quantitative simulation capability intended to exhibit the predominant patterns of insider IT sabotage behavior.
We are using a system dynamics approach to model and analyze the holistic behavior of complex problems as they evolve over time. System dynamics modeling and simulation makes it easier for us to understand and communicate the nature of problematic insider threat behavior as an enterprise architectural concern. After validating that simulating the problem model accurately represents the historical behavior of the problem—and does so for the right reasons—the next step is to examine the enterprise-level architectural insider threat controls proposed to help mitigate it. Our research will focus on two aspects:
Are those controls effective against insider threats? For example, do the controls mitigate the problematic behavior exhibited in the simulation model?
Do those controls introduce negative unintended consequences? For example, even if the controls are effective against the threat, do they unintentionally undermine organizational trust and reduce team performance?
A key challenge in our research is the difficulty associated with testing these controls in an operational environment. One manifestation of this problem is in the form of unknown false positive rates associated with insider threat controls. From the perspective of technical observations and resource usage, most malicious insiders behave as their non-malicious counterparts do. We therefore expect that poorly-designed controls will overwhelm operators with false positives. Controls are also hard to test operationally because insider attacks occur relatively infrequently, but nevertheless result in huge damages for victim organizations.
To meet these challenges, we are using system dynamics modeling and simulation to identify and test enterprise architectural patterns to protect against insider threat to current DoD systems. We are interviewing members of the DoD who have expressed interest in information security controls to mitigate the insider threat. These steps are enabling us to characterize the baseline enterprise architecture, which represents their operational architecture as a starting point for our analysis.
Identified architectural patterns will be applied to modify the baseline architecture to better protect against insider threat. The basis for establishing the efficacy of the architectural patterns is system dynamics simulation-based testing. The experiments conducted in the simulation environment provide a body of evidence that supports strong hypotheses going into pilot testing within organizations.
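To give a flavor of this kind of simulation-based testing, the following is a toy stock-and-flow sketch comparing a hypothetical control (raising targeted monitoring after an organizational sanction) against a baseline without it. All variables, rates, and thresholds are invented for illustration; the CERT system dynamics models are substantially richer.

```python
# A toy stock-and-flow simulation in the spirit of system dynamics modeling.
# All quantities are hypothetical; only the feedback structure is illustrative.
def simulate(months=24, control_enabled=True):
    disgruntlement = 0.1   # stock: insider disgruntlement (0..1)
    monitoring = 0.1       # stock: level of targeted technical monitoring (0..1)
    risks = []
    for month in range(months):
        sanction = 0.4 if month == 6 else 0.0     # one-time precipitating event
        # Control under test: raise targeted monitoring when a sanction occurs
        if control_enabled and sanction:
            monitoring = min(1.0, monitoring + 0.6)
        # Flows: disgruntlement rises with sanctions and unmet expectations,
        # cools over time, and is damped when monitoring enables early intervention
        cooling = 0.15 * disgruntlement
        intervention = 0.2 * monitoring * disgruntlement
        disgruntlement = min(1.0, max(0.0, disgruntlement + 0.03 + sanction
                                      - cooling - intervention))
        monitoring = max(0.1, monitoring - 0.05)  # monitoring attention decays
        risks.append(round(disgruntlement * (1.0 - monitoring), 3))
    return max(risks)

print("peak sabotage risk with control:   ", simulate(control_enabled=True))
print("peak sabotage risk without control:", simulate(control_enabled=False))
```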
Enterprise architectural patterns developed through our research will enable coherent reasoning about how to design—and to a lesser extent implement—DoD enterprise systems to protect against insider threat. Instead of being faced with vague security requirements and inadequate security technologies, DoD system designers will have a coherent set of architectural patterns they can apply to develop effective strategies against insider threat in a more timely and confident manner. Confidence in these patterns will be enhanced through our use of established theories in related areas and the scientific approach of using system dynamics simulation models to test key hypotheses prior to pilot testing. We expect our research results will improve DoD enterprise, system, and software architectures to reduce the number and impact of insider attacks on DoD information assets.
We will be periodically blogging about the progress of this work. Please feel free to leave your comments below and we will reply.
Additional Resources:
For more information about the work of the CERT Insider Threat Center, please visit www.cert.org/insider_threat/
To read a report about preliminary technical controls derived from insider threat data, Deriving Candidate Technical Controls and Indicators of Insider Attack from Socio-Technical Models and Data, please visit www.cert.org/archive/pdf/11tn003.pdf
To read a report about our insider threat modeling, A Preliminary Model of Insider Theft of Intellectual Property, please visit www.cert.org/archive/pdf/11tn013.pdf
To read the CERT Insider Threat blog, please visit www.cert.org/blogs/insider_threat/
Part 2: SEI R&D Activities Related to Sustaining Software for the DoD
By Douglas C. Schmidt, Deputy Director, Research, and Chief Technology Officer
Software sustainment is growing in importance as the inventory of DoD systems continues to age and greater emphasis is placed on efficiency and productivity in defense spending. In part 1 of this series, I summarized key software sustainment challenges facing the DoD. In this blog posting, I describe some of the R&D activities conducted by the SEI to address these challenges.
Primary Sustainment Activities
The term software sustainment is often used synonymously with software maintenance. Sustaining software for the DoD, however, requires attention to certain issues (such as operations and training) that are less essential in commercial software maintenance. There are four primary categories of software sustainment activities:
Corrective sustainment diagnoses and corrects software errors after release.
Perfective sustainment upgrades existing software to support new capabilities and functionality.
Adaptive sustainment modifies software to interface with changing environments.
Preventive sustainment modifies software to improve future maintainability or reliability.
SEI Sustainment R&D
The software engineering research community has devised various approaches to improve software sustainment. For example, tools for detecting software modularity violations help identify eroding design structure (referred to whimsically as "bad code smells") so the code can be refactored to enhance its sustainability. Likewise, intelligent automated regression testing frameworks help ensure that changes to legacy software work as required and that unchanged parts have not become less dependable.
SEI sustainment strategies. Over the past several decades, the SEI has created methods and guidelines for sustaining, migrating, and evolving legacy systems. For example, the SEI has devised strategies for modernizing legacy systems and reusing legacy components in service-oriented architecture (SOA)-based systems. These strategies employ risk-managed, incremental approaches that encompass changes in software technologies, engineering processes, and business practices. In addition, the SEI has created techniques for measuring the effectiveness of software sustainment practices. These techniques can be used to help decision-makers choose between continued sustainment and replacement, or decide which redundant legacy systems to keep and which to retire.
Software product lines. Legacy DoD systems comprise a wide range of software variations, such as network, hardware, and software configurations; different algorithms; and different security profiles. This variation is a key driver of total ownership costs because it impacts the time and effort required to assure, optimize, and manage system deployments and configurations throughout the lifecycle. To manage this variation effectively, the SEI helped pioneer software product lines (SPLs), which have been applied in DoD systems to manage software variation while reusing large amounts of code that implement common features needed within a particular domain. Software sustainment costs (particularly SPL testing) for an SPL-based family of systems can be reduced because reusable components in the SPL are maintained and validated in one place, instead of separately within each application.
Team Software Process. The Team Software Process (TSP) is another approach pioneered by the SEI that managers and engineers can use to sustain legacy software projects. TSP is a team-centric, time-boxed approach to developing software. By using TSP, organizations can better plan, measure, and improve software development productivity so they have more confidence in sustainment quality and cost estimates. TSP has been applied successfully to manage software sustainment in large-scale weapons systems for the U.S. Air Force, as well as in other DoD and industry organizations.
Software architecture. The SEI has also focused extensively on software architecture, which comprises the structure of the software elements in a system, the externally visible properties of those elements, and the relationships among them. SEI research has shown that a solid understanding of software architecture—and the associated methods, infrastructure, and tools—is essential to modify and improve software-reliant systems correctly, dependably, rapidly, and cost effectively throughout the lifecycle. Likewise, successful sustainment of software-reliant DoD systems requires techniques and tools for evaluating and improving software engineer and manager competence with respect to software architecture, including the following:
Understanding, analyzing, and engineering tradeoffs among system properties (such as performance, dependability, and security) that are critical to achieving desired levels of quality in software-reliant systems as they evolve. These properties are quality attributes that determine system viability throughout the sustainment phase.
Using architecture-centric practices to elicit quality attribute requirements and to design and analyze changes that are needed throughout sustainment of systems at all scales. Architecture-centric practices can be used to plan system releases and address sustainment challenges pertaining to integration and operational problems due to inconsistencies between system and software architectures.
Applying architecture principles for systems-of-systems and ultra-large-scale systems to develop architecture design and analysis principles that help document and account for socio-technical interactions, decentralized control, and continuous evolution and sustainment environments where failures/changes are the norm. For example, some soldiers or support staff on the battlefield are capable of creating or modifying existing systems in response to needs that were not anticipated by the designers of the original systems.
SEI assessments, workshops, and red teaming. The SEI regularly works with DoD programs to conduct independent technology assessments, reviews, and "red teams" that apply many of the methods and approaches described above to review the planning for—and conducting of—sustainment of DoD systems. For example, architecture practices such as the Architecture Tradeoff Analysis Method (ATAM) can help DoD programs elicit stakeholder input to identify likely long-term sources of change throughout the sustainment phase.
The SEI’s experience helping DoD programs transition from the production phase of acquisition to the sustainment phase of acquisition indicates that the DoD often focuses on how its contracts and contractors will change rather than on how its program offices will need to change. The SEI helps acquisition programs plan for these transitions to sustainment and has collected lessons learned from these activities into software acquisition planning guidelines (including Guideline #4: Software Sustainment). An interesting trend is that DoD programs are increasingly interdependent and interoperable, leading to sustainment interdependencies that require new coordination. To address this need, the SEI developed interoperable acquisition workshops to bring program offices together and draft plans that address sustainment.
Information assurance and software security. Increasing requirements for interdependence and interoperability also yield new challenges for information assurance and software security in legacy systems. In particular, many legacy systems were developed as isolated enclaves. With the advent of net-centric systems-of-systems, however, these legacy enclaves are being interconnected in ways that subject them to vulnerabilities not anticipated by their original designers.
For example, legacy systems programmed in languages like C may contain buffer overflow vulnerabilities that are not exposed until the systems are connected to a network. Moreover, maintainers may not resolve these types of vulnerabilities correctly. They might, for instance, simply add input validation to eliminate a particular path to a buffer overflow vulnerability rather than remove the out-of-bounds write.
The CERT Secure Coding Team works with developers and maintainers to eliminate these and other types of vulnerabilities by establishing secure coding standards and processes for conformance testing against these standards. Likewise, the CERT Vulnerability Analysis Team can use an analysis of vulnerabilities based on secure coding rule violations to help handle the response. Legacy software systems can also undergo conformance testing against a secure coding standard in the CERT Source Code Analysis Laboratory (SCALe) to detect and eliminate vulnerabilities before the software is deployed. SCALe has also been used by DoD program offices to assess the quality of legacy code to inform modernization versus replacement decisions.
Related SEI Blog Posts
SEI researchers have written several blog postings that are relevant to the sustainment of software-reliant DoD systems. For example, Rick Kazman’s posting on Measuring the Impact of Explicit Architecture Documentation focused on understanding the value of documenting software architectures for complex, software-reliant systems. Thorough software architecture documentation helps engineers who sustain DoD software understand how they can refactor, maintain, and update the software without introducing new defects or degrading existing capabilities.
Ipek Ozkaya’s posting on Enabling Agility by Strategically Managing Architectural Technical Debt examined how metrics extracted from the code and module structures of software can help repay technical debt, which is a conceptual framework for understanding how and when to defer design choices during the planning or execution of a software project. Repaying technical debt via refactoring and re-architecting is an effective strategy to alleviate architectural dependencies that impact system-wide architectural rework and minimize software decay during sustainment.
Steve Rosemergy’s posting on A Framework for Evaluating Common Operating Environments described a framework for exploring the interdependencies among common language, business goals, and software architecture when evaluating the sustainability of proposed software solutions.
We Want to Hear Your Thoughts
This post has just scratched the surface of the solutions that meet the challenges of sustaining software-reliant DoD systems. While the SEI has expertise in methods and tools related to software sustainment, the DoD faces deeper and broader challenges than any one organization (or blog post) can address. We welcome your feedback in the comments section below on ways to improve the technologies and ecosystems needed to sustain DoD software effectively.
Additional Resources:
More information about sustaining software-reliant DoD systems is available below.
To read about software sustainment practices for the DoD, please visit www.stsc.hill.af.mil/resources/tech_docs/gsam4.html, especially chapter 16.
To read about the SEI’s work in software architecture, please visit www.sei.cmu.edu/architecture
To read about the SEI’s work with the Team Software Process (TSP), please visit www.sei.cmu.edu/tsp
To read about the SEI’s work in Software Product Lines, please visit www.sei.cmu.edu/productlines
To read about the SEI’s work in system of systems and SOA, please visit www.sei.cmu.edu/sos
To read about the SEI’s work on Ultra-Large-Scale Systems, please visit www.sei.cmu.edu/uls
To read about the SEI CERT’s work in secure coding, please visit www.cert.org/secure-coding/
By Bill Novak, Senior Member of the Technical Staff, SEI Acquisition Support Program, Air Force Team
This is the fourth in an ongoing series examining themes across acquisition programs.
Background: Over the past decade, the U.S. Air Force has asked the SEI’s Acquisition Support Program (ASP) to conduct a number of Independent Technical Assessments (ITAs) on acquisition programs related to the development of IT systems; communications, command and control; avionics; and electronic warfare systems. This blog posting is the latest installment in a series that explores common themes across acquisition programs that we identified as a result of our ITA work. Previous themes explored in this series include Misaligned Incentives, The Need to Sell the Program, and The Evolution of "Science Projects." This post explores the fourth theme: common infrastructure and joint programs, which describes a key issue that arises when multiple organizations attempt to cooperate in the development of a single system, infrastructure, or capability that will be used and shared by all parties.
The Fourth Theme: Common Infrastructure and Joint Programs
This theme focuses on joint programs, which are popular for the potential they offer to reduce costs and improve interoperability. Joint programs are also recognized, however, as being hard to manage successfully for many reasons, including the number of stakeholders, organizational size and complexity, differing organizational goals, interoperability challenges, geographical separation, coordination overhead, and communication issues.
There are other types of programs that may not technically be joint programs, but which have similar characteristics. For example, a common infrastructure system, such as an enterprise-wide IT system, is similar to a joint program. Both often try to replace a set of isolated, yet related, existing capabilities with a single new system that offers an integrated capability that is the union of the existing capabilities—in the process both modernizing the capability and making it more efficient to develop and maintain.
To explore the issues of common infrastructure and joint programs more closely, consider a scenario that aggregates the experiences of some joint programs the SEI has worked with:
A joint program office has several stakeholder programs that are planning to use the joint infrastructure software being developed, but each program demands that at least one major feature be added to the software just for them. The joint program manager agrees to the additional requirements, for fear of losing stakeholders (who could always build their own custom software). The additional design and coding changes that are needed significantly increase the total program cost, schedule, complexity, and risk. As the schedule now begins to slip, one program decides to leave the joint program and develop its own custom software instead. With one stakeholder gone, the amortized costs for the other programs increase further—and so another program leaves. As cost escalates, participation in the joint program begins to unravel and may ultimately collapse.
Many problems we’ve seen in acquisition programs belong to a category known as "social dilemmas," in which planned cooperation can turn into opposition. Garrett Hardin’s 1968 article "The Tragedy of the Commons" describes one of the most famous types of social dilemmas (the scenario above is such an example). The "Tragedy of the Commons" can be summed up simply: an individual desires an immediate benefit that will cost everyone else—and if all succumb to the same temptation, everyone is worse off. In the case of the joint program, the stakeholders each want custom features—but if they all demand them, it drives up cost, schedule, and risk, and everyone is worse off.
Social dilemmas are inherently hard to fix, which is why they persist not only in acquisition, but also in aspects of public policy, economics, sociology, and many other areas. Nonetheless, researchers have identified a range of solutions and mitigations that can be applied. For example, one approach for resolving many instances of the "Tragedy of the Commons" dilemma is privatization, which removes the social aspect of a social dilemma by converting shared ownership (with diffused responsibility) into private ownership (with sole responsibility), so that each owner now has a strong incentive to properly care for what they own. Privatization, however, may defeat the intent of achieving the original objectives (in this case cost savings and interoperability) through cooperation. In the joint program scenario, for example, it would mean that each of the stakeholder programs would build their own custom system, which can be prohibitively costly and time consuming.
An alternative solution might be "altruistic punishment," where cooperating participants can penalize uncooperative participants in some way, to encourage them to cooperate—even if the penalty costs the cooperators, and may produce no immediate direct gain for them. The cost of imposing the penalty prevents its overuse, making it self-correcting. Research by Fehr and Gachter has found that cooperation flourishes when altruistic punishment is present, and can break down if it is not.
Altruistic punishment might incentivize stakeholder programs to stay with the joint program, despite the difficulties. If it were unsuitable in a given situation, such as a joint program, other solutions to the "Tragedy of the Commons" dilemma still exist, including assurance contracts, rewards and penalties, building trust, and exclusion mechanisms. Elinor Ostrom’s Nobel prize in Economics in 2009 acknowledged her extensive work on how people create successful institutions to manage common resources. The choice of the best solution will depend on the specific circumstances of the program.
The SEI is exploring ways to model acquisition program behavior, such as the joint program scenario discussed above, to help analyze, predict, and ultimately manage the effects of various specific solution approaches on program outcome. As this work progresses, a key aspect will be how to best leverage this work in a form that's most helpful to the acquisition community. We know that acquisition leaders may be inexperienced with certain types of decision-making and may also be unfamiliar with some unique complexities of software-reliant acquisition programs—especially joint programs. Moreover, we know that conventional training may not be fully effective in preparing decision-makers for dealing with dynamically complex domains.
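As a rough illustration of the dynamic being modeled, the toy calculation below shows how per-program cost shares can push stakeholders out of a joint program once custom features inflate the total cost. The cost figures are hypothetical; only the feedback structure matters.

```python
# A toy model of the joint-program unraveling dynamic in the scenario above.
# All cost figures are hypothetical.
def simulate(num_programs, core_cost, feature_cost, go_it_alone_cost, features_each):
    """Each program leaves when its share of the joint cost exceeds building its own system."""
    participants = num_programs
    while participants > 0:
        total = core_cost + feature_cost * features_each * participants
        share = total / participants
        if share <= go_it_alone_cost:
            return participants          # stable: staying is cheaper than leaving
        participants -= 1                # a stakeholder departs (and its features are dropped)
    return 0

# Hypothetical numbers: the shared core costs 150, each custom feature adds 35,
# and any single program could build its own system for 60.
print("no custom features:", simulate(5, 150, 35, 60, features_each=0))  # remains 5
print("one feature each:  ", simulate(5, 150, 35, 60, features_each=1))  # unravels to 0
```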
What acquisition leaders need is experience in complex decision-making, such as they might develop over decades of experience with actual acquisition programs. To accelerate this learning process, we plan to create interactive experiential learning tools, which are essentially "flight simulators" for acquisition professionals that address these types of situations. These learning tools are key since actively learning through experience produces better understanding and superior retention of the knowledge. With such an approach, we believe it will be possible to improve the decision-making abilities of acquisition program staff, thereby achieving more successful program outcomes.
Additional Resources:
For more information about the SEI's Acquisition Support Program, please visit www.sei.cmu.edu/acquisition.
By Sagar Chaki, Senior Member of the Technical Staff, Research, Technology, and System Solutions
Malware, which is short for "malicious software," consists of programming aimed at disrupting or denying operation, gathering private information without consent, gaining unauthorized access to system resources, and other inappropriate behavior. Malware infestation is of increasing concern to government and commercial organizations. For example, according to the Global Threat Report from Cisco Security Intelligence Operations, there were 287,298 "unique malware encounters" in June 2011, double the number of incidents that occurred in March. To help mitigate the threat of malware, researchers at the SEI are investigating the origin of executable software binaries that often take the form of malware. This posting augments a previous posting describing our research on using classification (a form of machine learning) to detect "provenance similarities" in binaries, which means that they have been compiled from similar source code (e.g., differing by only minor revisions) and with similar compilers (e.g., different versions of Microsoft Visual C++ or different levels of optimization).
Evidence shows that the majority of malware instances derive from a relatively small number of common origins. For example, a 2006 Microsoft Security Intelligence report revealed that the 25 most common families of malware account for more than 75 percent of the detected malware instances. Compounding this problem is the fact that current malware analysis tools are either manual (requiring extensive time and effort on the part of malware analysts) or automated but less accurate (producing high false-positive or false-negative rates) or inefficient. In contrast, our approach involves
creating a training set using a sample of binaries
using the training set to learn (or train) a classifier
using the classifier to predict similarity of other binaries
I, along with my colleagues—Arie Gurfinkel, who works with me in the SEI’s Research, Technology, and System Solutions Program, and Cory Cohen, a malware analyst with CERT—felt that classification was appropriate for a binary similarity checker because this form of machine learning is particularly well suited to instances where closed-form solutions are hard to develop, and a solver can be "trained" using a training set composed of positive and negative examples.
While malware classification is a major aim of provenance-similarity research, there are two main hurdles to applying classification directly to malware binary similarity checking:
Classification must be applied to parts of the malware where similarity is expected to manifest most directly. For this research, we decided to apply classification to functions. Intuitively, a function is a fragment of a binary obtained by compiling a source-level procedure or method. Functions are the smallest externally identifiable units of behavior generated by compilers. Similarity at the function level is an indicator of overall similarity between two binaries. For example, malware binaries that originated from the same family will rarely be identical everywhere; instead, they will share important functions.
It is hard to develop training sets from malware due to the lack of information on source code and generative compilers. Our research therefore focuses on evaluating open-source software. We believe that a classifier that effectively detects provenance-similarity in open-source functions will also be effective on malware functions because the variation we are targeting (due to changes in source code and compilers) is largely independent of the software itself. For example, the variation introduced by a different compiler version (e.g., introducing stack canaries to detect buffer overflows at runtime) is the same, regardless of whether the source code being compiled is malware or open-source.
More specifically, we selected approximately a dozen C/C++ open-source projects from SourceForge.net and compiled them to binaries using Microsoft Visual C++ 2005, 2008, and 2010. We then extracted functions from the binaries using IDA Pro, which is a state-of-the-art disassembler, and constructed a training set and a testing set from the functions using a tool that we developed atop the Rose compiler infrastructure. Next, we learned a classifier from the training set using the Weka framework. When it comes to classification, the following two main decisions must be considered:
What classifier are you going to use?
What kind of attribute are you going to use?
We measured the effectiveness of a classifier in terms of two quantities: (1) its F-measure, which is a real number between 0 and 1 that indicates the overall classifier accuracy, and (2) the time required to train the classifier. There is a tradeoff between the two quantities: an F-measure can be increased by using a larger training set, but the training time also increases. We empirically found that the RandomForest classifier was the most effective Weka classifier for our purposes since it has the best F-measure for the same training time.
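The following is a minimal sketch of this train-and-evaluate loop, using scikit-learn's RandomForestClassifier as a stand-in for the Weka RandomForest classifier used in our experiments. The attribute vectors and similarity labels are randomly generated placeholders rather than real function data, and the F-measure here is F1, i.e., 2 × precision × recall / (precision + recall).

```python
# Minimal sketch of training a RandomForest classifier and computing its F-measure.
# scikit-learn stands in for Weka; the data below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Each row: attributes computed from a pair of functions; label 1 = provenance-similar
X = rng.random((2000, 40))
y = (X[:, :5].mean(axis=1) > 0.5).astype(int)   # synthetic labeling rule

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier(
    n_estimators=100,     # number of trees: more trees can raise the F-measure but lengthen training
    max_features="sqrt",  # number of attributes considered at each split
    random_state=0,
)
clf.fit(X_train, y_train)

print("F-measure:", round(f1_score(y_test, clf.predict(X_test)), 3))
```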
We repeated the experiment several times with different randomly constructed training and testing sets. To determine the robustness of our results, we repeated our experiments using a different set of open-source software and different versions of Microsoft Visual C++. The results were consistent in all cases, with the F-measures being around 0.95 for RandomForest. This finding is encouraging since it indicates that a provenance-similarity detector based on RandomForest will produce the correct result in more than 95 percent of the cases. We believe that this accuracy is sufficient for use in practical malware analysis situations.
Next, we experimented with various parameters of RandomForest to observe how these parameters affect the tradeoff between its F-measure and its training time. In particular, we focused on two important parameters: the number of trees and the number of attributes. For each parameter, we experimented with different values and measured how the F-measure vs. training time tradeoff changed.
To further improve and evaluate our approach, we developed a suite of the following types of attributes:
Semantic attributes, which capture the effect of a binary’s execution on specific components of the hardware state, such as registers and memory locations.
Syntactic attributes, which are derived from n-grams and n-perms and represent groups of instruction opcodes that occur contiguously in the binary.
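As an illustration of the syntactic attributes, the sketch below counts contiguous opcode n-grams for a single function; the opcode sequence shown is hypothetical rather than taken from a real disassembly.

```python
# Illustrative extraction of syntactic n-gram attributes from a function's opcode sequence.
from collections import Counter

def opcode_ngrams(opcodes, n=3):
    """Count contiguous opcode n-grams; each distinct n-gram can serve as a classifier attribute."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))

# Hypothetical opcode sequence for one function
function_opcodes = ["push", "mov", "sub", "mov", "call", "test", "jz", "mov", "call", "ret"]
for gram, count in opcode_ngrams(function_opcodes).most_common(3):
    print(" ".join(gram), "->", count)
```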
We re-evaluated the effectiveness of the classifier using these two types of attributes and concluded that semantic attributes yield better F-measures, but are more expensive to compute than syntactic attributes. Attribute extraction is inherently parallelizable, however, since it is done independently for each function. A rough estimate is that a modern CPU can extract semantic attributes from about 10,000 functions in the CERT catalog every day. Based on this estimate, extracting attributes from malware samples as they are discovered each day is feasible with a modestly sized CPU farm.
We had several false starts along the way. For example, we originally used text files for all of our input and output, which was slow and unwieldy. We therefore decided to store inputs and outputs in a database, which simplified our tools and accelerated our experiments. Another lesson learned was to handle statistical issues and randomness carefully. Since the set of all possible training and testing samples is large, we had to pick random subsets for our experiments. In some cases, we also had to label the samples in a random—yet deterministic—manner so that each sample had a randomly assigned label that stayed the same across all experiments. Constructing a labeling scheme that was both random and deterministic required extra care.
While determining the similarities between binary functions remains a challenge, the preliminary results from our research were presented in a well-received paper at the 2011 Knowledge Discovery and Data Mining (KDD) conference. Our malware research has also studied fuzzy hashing and sparse representation. Our future research will explore other ways of detecting similarities between functions, including the use of static analysis.
Additional Resources:
For additional details, or to download benchmarks and tools that we have developed and are using as part of our project, please visit www.contrib.andrew.cmu.edu/~schaki/binsim/.
To listen to the CERT podcast, Building a Malware Analysis Capability, please visit www.cert.org/podcast/show/20110712gennari.html
To read other SEI blog posts relating to our malware research, please visit http://blog.sei.cmu.edu/archives.cfm/category/malware
By Donald Firesmith, Senior Member of the Technical Staff, Acquisition Support Program
This blog post is the third and final installment in a series exploring the engineering of safety- and security-related requirements.
Background: In our research and acquisition work on commercial and Department of Defense (DoD) programs, we see many systems with critical safety and security ramifications. With such systems, safety and security engineering are used to manage the risks of accidents and attacks. Safety and security requirements should therefore be engineered to ensure that residual safety and security risks will be acceptable to system stakeholders. The first post in this series explored problems with quality requirements in general and safety and security requirements in particular. The second post took a deeper dive into key obstacles that acquisition and development organizations encounter concerning safety- and security-related requirements. This post introduces a collaborative method for engineering these requirements that overcomes the obstacles identified in earlier posts.
Anyone involved in building safety- and security-critical systems needs to consider the following:
Are you building a safety-critical system or one that must be secure from attack?
Do your safety and security engineers begin their work only after the architecture is engineered, rather than building it in from the start via safety- and security-related requirements?
Do your safety and security engineers develop their work products (documents and models) independently of each other and requirements engineers?
Do your requirements specifications largely ignore safety, security, or both?
Are many of your safety and security requirements so general that they are meaningless, such as "The system shall be safe and secure from attack"?
Are most of your safety- and security-related requirements merely architecture and design constraints that prevent safety- and security-engineers from collaborating with architects to create innovative solutions?
Is use-case modeling or structured analysis your primary or only requirements-analysis method, even when engineering safety- and security-related requirements?
If you answer yes to any of these questions, then your safety, security, and requirements engineers can benefit from a better way of engineering their requirements. To achieve this goal, an appropriate safety- and security-requirements analysis method is needed.
We propose using the Engineering Safety- and Security-related Requirements (ESSR) method, which consists of the following analysis-based tasks.
Stakeholder analysis determines the stakeholders who have a vested interest in the safety and security of the system and the appropriate sources for eliciting safety and security goals and requirements. Safety- and security-engineers collaborate to identify the safety- and security-related stakeholders in the system and the assets that the system must defend from accidental and malicious harm. These stakeholders are modeled by producing stakeholder profiles and creating an initial partial list of the stakeholder’s safety- and security-goals.
Asset analysis determines the assets that must be protected from unauthorized harm and the harm that these assets must be protected from. Safety- and security-engineers collaborate to identify the assets that the system must protect from harm. They model each defended asset by categorizing it, determining its value, identifying the types and severities of harm that it may suffer, and determining its stakeholders.
Abuse analysis examines the ways that the system and the assets for which the system is responsible can be abused. Specifically, this task identifies the different types of abuses, including safety mishaps (accidents and safety incidents) and security misuses (attacks and security incidents) that can occur. Abuse analysis also identifies which assets the abuses can harm, in what manner, and to what degree. Safety- and security-engineers model these abuses using appropriate techniques (e.g., abuse case modeling, attack trees) and create abuse profiles.
Vulnerability analysis determines the existence of the system-internal weaknesses or defects that can enable abuses (mishaps and misuses) to occur. Safety- and security engineers identify the credible potential system-internal vulnerabilities (e.g., defects and weaknesses) that could enable the abuses that may harm the defended assets. They also model these vulnerabilities using appropriate techniques such as STAMP-Based Process Analysis (STPA), Event Tree Analysis (ETA), Fault Tree Analysis (FTA), or Failure Modes and Effects Analysis (FMEA).
Abuser analysis determines the system-external people and things that can accidentally or maliciously abuse the system and the assets that it must defend from unauthorized harm. Safety- and security engineers identify the credible potential abusers that could exploit the vulnerabilities and thereby cause the abuses that may harm the defended assets. They model these abusers using appropriate techniques (e.g., STPA, abuse case modeling, task analysis, or user profiling).
Danger analysis determines the dangers (i.e., safety hazards and security threats), which are cohesive sets of conditions involving the existence of abusers, vulnerabilities, and assets that could increase the probability of abuses occurring. When restricted to safety, danger analysis is often called hazard analysis; when restricted to security, it is often called threat analysis. Safety- and security engineers model these safety hazards and security threats using appropriate techniques (e.g., operator task analysis, ETA, FTA, and FMEA).
Risk analysis determines the maximum acceptable residual safety and security risks as well as the specific types of assets, harm, vulnerabilities, abusers, and dangers that are associated with these risks. Safety- and security engineers model these risks using appropriate techniques (e.g., calculating risk level as the product of probability and harm severity, using degrees of software control instead of probabilities, and risk matrices); a brief illustrative sketch of this calculation appears after the method description below.
Safety- and security-significance analysis identifies the goals and requirements that have safety and security ramifications so the corresponding parts of the system can be implemented using a process having the appropriate level of rigor and completeness, e.g., to justify the use of a more powerful (and therefore more expensive) development process. Safety- and security engineers categorize requirements into safety/security assurance levels (SALs), such as safety-critical and security-critical, based on the degree to which the requirements have safety and security ramifications. They collaborate with requirements engineers to update the requirements repository by annotating requirements with their SALs. Based on how these categorized requirements are allocated to architectural components, they assign the components safety/security evidence assurance levels (SEALs) that determine the degree of completeness and rigor to be used when architecting, designing, implementing, integrating, and testing these components. In other words, components with high SEALs should be as small as practical to minimize the increased effort, cost, and schedule needed to develop them. Finally, they update the certification repository with the results of safety- and security-significance analysis.
Defense determination determines the appropriate defenses (i.e., controls including safeguards and security countermeasures) that are needed to defend the system and its associated defended assets from unauthorized harm. Safety- and security engineers perform a gap analysis to identify potential new defenses. They then evaluate these potential defenses using appropriate techniques (e.g., engineering analyses, product and vendor trade studies).
Where appropriate (except for the safety- and security-significance analysis task), safety- and security engineers create safety and security goals for each type of analysis and then collaborate with the requirements engineers to transform these goals into requirements to prevent, detect, and react to the associated abuses. They then update the certification repository with the results of the analysis. Also, where appropriate, they collaborate with requirements engineers to transform informal constraints into official requirements. Finally, where appropriate, this information is stored in the certification repository to eventually support the system’s safety and security accreditation and certification.
The above tasks result in the engineering of multiple types of associated safety and security requirements (e.g., prevention, detection, and reaction requirements as well as safety and security constraints). All such possible requirements, however, are rarely appropriate for most systems. The harm severity and likelihood of the associated mishaps and misuses may not justify the cost of producing and using the resulting safety- and security-defenses. Some requirements make others unnecessary, e.g., a requirement preventing the existence of a vulnerability may eliminate the need for a requirement to prevent an abuse enabled by that vulnerability. On the other hand, high-level requirements associated with the early analysis steps (e.g., prevent harm to a defended asset) may be used to derive lower-level requirements associated with later analyses steps (prevent vulnerability that enables abuse to harm the defended asset).
The tasks of ESSR described above are best performed in an evolutionary (i.e., incremental, iterative, and concurrent) manner. Due to the evolutionary nature of ESSR, the temporal ordering of the preceding sequence of analyses is merely a logical simplification to improve understandability; a waterfall approach to safety and security is neither intended nor recommended. Safety engineers, security engineers, and requirements engineers should also perform these tailorable tasks in a collaborative manner. At the end of this process, comprehensive safety and security analyses will have been performed and documented, safety and security goals will have been turned into their corresponding requirements, and the certification repository will contain the analysis- and requirements-related safety and security evidence needed for accreditation and certification.
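The sketch below illustrates, in simplified form, how the risk analysis and safety- and security-significance analysis tasks might combine likelihood and harm severity into a risk level and map it to an assurance level. The scales, thresholds, and level names are hypothetical and not part of the ESSR method itself.

```python
# Illustrative only: risk level as likelihood x harm severity, mapped to a hypothetical SAL scale.
HARM_SEVERITY = {"negligible": 1, "marginal": 2, "critical": 3, "catastrophic": 4}
LIKELIHOOD = {"improbable": 1, "remote": 2, "occasional": 3, "frequent": 4}

def risk_level(likelihood, severity):
    """Risk score on a 1-16 scale (likelihood rank times severity rank)."""
    return LIKELIHOOD[likelihood] * HARM_SEVERITY[severity]

def assurance_level(score):
    """Map a risk score to a hypothetical safety/security assurance level (SAL)."""
    if score >= 12:
        return "SAL 4 (safety/security-critical)"
    if score >= 6:
        return "SAL 3 (safety/security-significant)"
    if score >= 3:
        return "SAL 2"
    return "SAL 1"

score = risk_level("occasional", "catastrophic")
print(score, "->", assurance_level(score))   # 12 -> SAL 4 (safety/security-critical)
```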
The preceding ESSR method for collaboratively engineering safety- and security-related requirements is described in considerably more detail in tutorials, a class, and a book to be published early in 2012.
Additional Resources:
For more information, please visit www.sei.cmu.edu/library/abstracts/presentations/icse-2010-tutorial-firesmith.cfm
By Felix Bachmann, Senior Member of the Technical Staff, Research, Technology, and System Solutions
Bursatec, the technology arm of Grupo Bolsa Mexicana de Valores (BMV, the Mexican Stock Exchange), recently embarked on a project to replace three existing trading engines with one system developed in house. Given the competitiveness of global financial markets and recent interest in Latin American economies, Bursatec needed a reliable and fast new system that could work ceaselessly throughout the day and handle sharp fluctuations in trading volume. To meet these demands, the SEI suggested combining elements of its Architecture-Centric Engineering (ACE) method, which requires effective use of software architecture to guide system development, with its Team Software Process (TSP), which teaches software developers the skills they need to make and track plans and produce high-quality products. This posting—the first in a two-part series—describes the challenges Bursatec faced and outlines how working with the SEI and combining ACE with TSP helped them address those challenges.
Challenges
The team of Bursatec software architects faced a significant challenge in designing their new trading system: only one team member had significant experience in designing a financial software system. We felt the ACE methods would help the team better understand what software architecture means, particularly when thinking about abstractions and solving quality attribute problems. Another complicating factor was that Bursatec wanted to combine stock market trading with derivative market trading on the same platform to reduce operating costs and provide a single, high-throughput, low-latency, high-confidence interface to external financial markets.
Getting Started
One of our first steps was to conduct a Quality Attribute Workshop in which the Bursatec stakeholders defined the five most important quality attribute requirements (also known as quality attribute scenarios) their new trading system had to fulfill. To guide the system design, the Bursatec architecture team used the Attribute-Driven Design (ADD) method. ADD is a decomposition method based on transforming quality attribute scenarios into an appropriate design.
Not surprisingly, given the importance of speed for the new system, the stakeholders identified runtime performance as one of the most important quality attribute scenarios. The performance quality attribute scenario, coupled with high availability requirements, led the team to realize that conventional approaches, such as a three-tier architecture, were not the best solution for their new system. Consequently, the architecture team spent the next two weeks exploring various solutions, as well as the potential negative outcomes of each proposed solution.
At the end of the two weeks, using rigorous Architecture Tradeoff Analysis Method (ATAM) techniques, the team of Bursatec architects had to present its findings—as well as evidence (including measures) that its chosen approach was correct—to an SEI software architecture coaching team that challenged each scenario. Every two weeks thereafter, the Bursatec software architects had to present solutions, with appropriate evidence, for the scenarios they had created. For example, with respect to the performance requirement, the team demonstrated how a stock order would traverse the system, estimating and measuring the timing required for every step. With each review, SEI coaches identified risks associated with a particular approach. For example, the team identified one risk with respect to performance: synchronizing with backup systems would throw off the timing. In all, there were three iterations of the architecture, each lasting six weeks.
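To make this kind of evidence concrete, the following minimal sketch (in Python, with hypothetical step names and microsecond figures rather than Bursatec's actual data) sums per-step latencies for an order and compares the total against an assumed response-time budget:

# Hypothetical latency budget for a stock order traversing the system.
# Step names and microsecond figures are illustrative only.
steps_us = {
    "gateway_receive": 40,
    "validate_order": 25,
    "match_engine": 120,
    "persist_to_log": 60,
    "publish_confirmation": 35,
}
budget_us = 500  # assumed response-time goal from the quality attribute scenario

total_us = sum(steps_us.values())
print(f"estimated end-to-end latency: {total_us} us (budget {budget_us} us)")
for step, cost in sorted(steps_us.items(), key=lambda kv: -kv[1]):
    print(f"  {step}: {cost} us ({100 * cost / total_us:.0f}% of total)")
if total_us > budget_us:
    print("budget exceeded -- revisit the riskiest steps")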
At the start of the second iteration, the SEI software architecture coaches brought in the team of Bursatec developers to begin working on prototypes, specifically focusing on risks (such as the timing of querying complex data structures) that could not be addressed solely via software architecture. This important step allowed developers to deepen their understanding of the architecture and familiarize themselves with the problem, which was itself a lengthy process. The developers had six weeks to implement the prototypes; at the beginning of the third iteration the developers returned and presented their results to the architecture team. This process enabled the architects to finalize their architecture design using the results from the prototypes.
An interesting benefit of this style of architectural coaching was that the Bursatec architects used Enterprise Architect (a Unified Modeling Language-based tool) from the outset to document, evaluate, and justify their solutions. Although architecture documentation is often an afterthought, it became second nature to the Bursatec architects. The architects focused only on the documentation that was either needed to provide sufficient evidence that the system would support the quality attribute requirements or required by the developers to effectively implement prototypes and the subsequent system.
Improving Delivery with TSP
The Team Software Process (TSP) is a team-centric approach to developing software that enables organizations to better plan and measure their work and to improve software development productivity, giving them greater confidence in quality and cost estimates. Our coaches emphasized the incorporation of TSP principles throughout the architecture design process with Bursatec. The use of TSP enabled the Bursatec architects to prepare, estimate, and track their work. In this case, the Bursatec architects were also able to time-box their iterations, an approach that the SEI finds effective. These activities initially proved challenging because TSP is oriented more toward programming, so the measures employed by developers typically apply to lines of code, classes, requirements pages, or other tangible, implementation-oriented measures. To create a measurable work unit, the Bursatec architects used the quality attribute scenarios as a size measure.
The SEI architecture team recognized that each quality attribute scenario would be refined into about five more detailed quality attribute scenarios that address special cases. The SEI team also recognized that the Bursatec architects would have to create at least three to five diagrams and descriptions to fulfill each scenario. The Bursatec team then estimated how long it would take to create each diagram with a description. This measure proved a good tool for determining how long it would take to complete the architecture, something that has been hard to estimate in our prior work with organizations. This approach to measuring and estimating work allowed the Bursatec architects to provide accurate estimates of deadlines to their management team.
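As a rough illustration of this estimation logic (the counts and hours below are invented, not the Bursatec team's actual figures), the calculation reduces to multiplying scenarios, refinements, and diagrams:

# Hypothetical architecture-size estimate in the style described above.
top_level_scenarios = 5        # quality attribute scenarios from the workshop
refinements_per_scenario = 5   # more detailed scenarios covering special cases
diagrams_per_refinement = 4    # assumed midpoint of the 3-to-5 range
hours_per_diagram = 6          # invented estimate for one diagram plus description

detailed_scenarios = top_level_scenarios * refinements_per_scenario
diagrams = detailed_scenarios * diagrams_per_refinement
effort_hours = diagrams * hours_per_diagram

print(f"{detailed_scenarios} detailed scenarios -> {diagrams} diagrams "
      f"-> about {effort_hours} hours of architecture work")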
Integrating the ACE architecture within the TSP management process gave the Bursatec architects an effective framework in which to work. While it did restrict some of their freedom, it also proved helpful. For example, the architects’ work was structured into iterations, each with different goals. The first iteration focused solely on discovering problematic areas of the system based on achievement of the necessary quality attribute scenarios. In subsequent iterations the architects gradually added details to the system design to include support of all quality attribute scenarios. This iterative method enabled them to create a software architecture organically, for the whole system, that was well understood, justified, and accepted by the team.
Building and Evaluating the System
Once the architecture was complete, the SEI architecture coaching team conducted an active design review in which the Bursatec architects communicated the entire architecture to the developers in a structured way. Next, conformance reviews were conducted during which the developers needed to provide evidence to the architects that the systems they were building conformed to the architecture. These reviews reinforced that the whole system would meet the needs of the stakeholders.
To date, the development of the new trading system for Bursatec has progressed on schedule and within budget. Moreover, early tests confirmed that the trading system performance far exceeded expectations. The combination of TSP and ACE proved an ideal approach for the development of the trading system. TSP brought discipline and measurement, while ACE provided a set of robust architectural techniques that focus on business goals and quality requirements. Both approaches together support the whole development lifecycle, emphasizing business and quality goals, engineering excellence, defined processes, process discipline and teamwork.
This post is the first in a two-part series describing our recent engagement with BMV. The next post focuses on the TSP framework that provided planning, scheduling, estimation, and tracking in the project.
Additional Resources:
For more information about the SEI’s work in Architecture Centric Engineering (ACE), please visit www.sei.cmu.edu/about/organization/rtss/ace.cfm
For more information about the SEI’s work in the Team Software Process (TSP), please visit www.sei.cmu.edu/tsp/
To read the SEI technical report, Combining Architecture-Centric Engineering with the Team Software Process, please visit www.sei.cmu.edu/library/abstracts/reports/10tr031.cfm
By James McHale, Senior Member of the Technical Staff, Software Engineering Process Management
This post is the second installment in a two-part series describing our recent engagement with Bursatec to create a reliable and fast new trading system for Grupo Bolsa Mexicana de Valores (BMV, the Mexican Stock Exchange). This project combined elements of the SEI’s Architecture Centric Engineering (ACE) method, which requires effective use of software architecture to guide system development, with its Team Software Process (TSP), which is a team-centric approach to developing software that enables organizations to better plan and measure their work and improve software development productivity to gain greater confidence in quality and cost estimates. The first post examined how ACE was applied within the context of TSP. This posting focuses on the development of the system architecture for Bursatec within the TSP framework.
Challenges
From a TSP perspective, the project faced several challenges. First, the few developers who had worked on the existing system had either moved into management or possessed technical skills that were out-of-date with modern development technologies. Second, the remaining developers, while competent, did not have experience in building the type of system that Bursatec needed. Another challenge was that several executives within the organization were in favor of outsourcing the work.
Our Approach
In the Bursatec project, we followed the standard TSP implementation approach, which emphasizes securing senior management commitment first. This commitment is typically established via a TSP Executive Strategy Seminar, which covers the key practices and principles of TSP from a senior management perspective. Although Bursatec is a large organization, with several layers of checks and balances befitting a national stock market, the organization itself was very open, which allowed for streamlined communication between senior managers and the engineering team.
In this open environment, the director at Bursatec, as well as his boss—who was president of the Mexican Stock Exchange—participated in the executive training. The executive training included an overview of rational management, the idea that management decisions should be made based on objective facts and data, and why this type of management is required to maintain successful TSP teams. We then trained the team leader of the project, as well as several other peers and senior developers at Bursatec in the basics of day-to-day management of TSP teams.
We next trained the entire Bursatec development team—including the architects and team leader—in the fundamentals of the Personal Software Process (PSP), which teaches individual software engineers how to plan and manage high-quality software development work. The team would go on to apply the PSP concepts in a project, team-based environment. In an unusual development, the Bursatec director attended this class and authored several programs using PSP methods, performing as well as any of the developers. Having such a senior manager there—not just in the class but using the methods—sent the strong message that this was how "we" would be working going forward.
After completing the PSP training, we conducted a Quality Attribute Workshop, an architecture activity in which Bursatec stakeholders defined the five most important quality attribute requirements (also known as quality attribute scenarios) that their new trading system had to fulfill. Not surprisingly, given the importance of speed for the new system, the stakeholders identified runtime performance as the most important of these quality attribute scenarios.
For the Bursatec developers, one benefit of defining quality attributes is that the practice placed significant emphasis on ensuring that the attributes be measurable. For example, the performance attribute was measured in two ways: the time for individual transactions (how fast each one was processed) and the throughput (how many transactions per second on an ongoing basis). In this context there is perfect harmony between what the ACE approach asks architects to do and what the TSP approach demands of developers. TSP teams receive fairly general direction for eliciting and capturing such quality attributes, the understanding of which often drives a project’s structure in addition to the structure of the developed product. With the Bursatec project, the ACE methods provided clear, specific direction on the early lifecycle issues that TSP normally leaves to local practice. Later in the project, TSP drove a disciplined implementation of the architecture that might otherwise have eluded developers.
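To illustrate how the two performance measures differ, here is a minimal sketch (with invented timestamps, not the Bursatec test harness) that derives average per-transaction latency and sustained throughput from recorded start and end times:

# Hypothetical per-transaction latency and overall throughput calculation.
# Each tuple is (start_seconds, end_seconds) for one processed transaction.
transactions = [(0.000, 0.004), (0.001, 0.003), (0.002, 0.007), (0.006, 0.009)]

latencies = [end - start for start, end in transactions]
avg_latency_ms = 1000 * sum(latencies) / len(latencies)
window = max(end for _, end in transactions) - min(start for start, _ in transactions)
throughput_tps = len(transactions) / window

print(f"average latency: {avg_latency_ms:.1f} ms")
print(f"throughput: {throughput_tps:.0f} transactions/second")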
The TSP Launch
Immediately after the conclusion of the Quality Attribute Workshop, we conducted the TSP launch, a series of nine meetings held over the course of four days in which the team reaches a common understanding of the work and the approach it will take and produces a detailed plan to guide its work. The launch, which produced the necessary planning artifacts (such as goals, roles, estimates, a task plan, milestones, a quality plan, and a risk mitigation plan), brought together a team of 14 members, including the team leader. Our goal was to plan the architecture activities in the context of supporting Bursatec and its existing time and budget constraints.
During the launch, about half of the team focused on the architecture, including several people who were brought in as domain experts. These individuals were experts at interpreting the functional requirements and ensuring that the developers met them. For example, one individual had expertise in the Mexican Stock Exchange while another domain expert had extensive experience in the options and futures markets, specifically how those instruments are traded in Mexican markets.
The other half of the team, seven developers, focused on two important needs for the system: high-speed communication and a testing framework. To successfully develop the system in the timeframe Bursatec needed, it was critical that the system be tested automatically rather than manually. Testing new functionality on the current system (including regression testing to ensure that no other aspects of the system are compromised) takes as much as a month and is performed manually. This testing motivates a quality attribute scenario for rapid testability of most new functions within a day, which leads naturally to an architecture that supports automated testing.
The Bursatec developers then implemented the system’s underlying infrastructure based on an early version of the system architecture, while the architects elaborated their work based in part on the early developer work that supported a decision to purchase a particular commercial package for high-speed communication. This version of the architecture was subject to an Architecture Tradeoff Analysis Method (ATAM) review that ensured the quality attribute scenarios captured in the QAW were still the right ones, and that the proposed architecture addressed those scenarios.
After the initial architecture iterations and the ATAM, the architects and other developers worked as a single, integrated team, removing the potential issues that sometimes arise when software architects throw their artifacts "over the wall" to developers. The architects dealt with issues and revised the architecture as necessary while shouldering a normal development workload. The team named role managers—a TSP concept—to focus on issues surrounding performance and garbage collection, two implementation issues critical to the success of the new trading system.
Measurable Results
While TSP can be used to manage all aspects of the software development phase, from requirements elicitation to implementation and testing, this is the first time that the approach has been applied to ACE technologies. The combination of these approaches offered Bursatec architects and developers a disciplined method for developing the software for their new trading engine. Through 6 major development cycles including 14 or so iterations over 21 months, the overall team developed over 200,000 lines of code, spending about 12 percent of their effort after the Quality Attribute Workshop on architecture and approximately 14.5 percent of effort in unit testing, performance testing, and integration testing.
In contrast, the SEI would normally expect almost twice as much testing effort at this point in development, with potentially much more in system testing to push the overall total close to or beyond the 50-percent mark—an unfortunately realistic expectation in our industry. As of October 2011, system testing at Bursatec proceeds on schedule with a very low defect count (unusual in our experience), and the system is on target for deployment beginning in early 2012. Due to the early investment in architecture and a detailed, data-driven approach to managing both their schedule and their quality, less testing was required throughout system development.
Another benefit of combining TSP with ACE is that the team of Bursatec developers was prepared for inevitable changes in the architecture requirements, indeed in changes of any sort over the 21 months of development. When the team received new requirements, it could evaluate them quickly for technical impact and implementation cost in terms of time and effort. With the quality attributes formally captured, the architecture in place, and detailed development plans at every step, a project with enormous risk potential in both technical and business terms ran on-time, within budget, and generally without the drama that large development efforts often exhibit.
Additional Resources:
To read the SEI technical report, Team Software Process (TSP) Body of Knowledge (BOK), please visit www.sei.cmu.edu/library/abstracts/reports/10tr020.cfm
For more information about the SEI’s work in Architecture Centric Engineering (ACE), please visit www.sei.cmu.edu/about/organization/rtss/ace.cfm
For more information about the SEI’s work in the Team Software Process (TSP), please visit www.sei.cmu.edu/tsp/
To read the SEI technical report, Combining Architecture-Centric Engineering with the Team Software Process, please visit www.sei.cmu.edu/library/abstracts/reports/10tr031.cfm
By Julia Allen, Principal Researcher, CERT Program
The SEI has devoted extensive time and effort to defining meaningful metrics and measures for software quality, software security, information security, and continuity of operations. The ability of organizations to measure and track the impact of changes—as well as changes in trends over time—is an important tool for effectively managing operational resilience, which is the measure of an organization’s ability to perform its mission in the presence of operational stress and disruption. For any organization—whether Department of Defense (DoD), federal civilian agencies, or industry—the ability to protect and sustain essential assets and services is critical and can help ensure a return to normalcy when the disruption or stress is eliminated. This blog posting describes our research to help organizational leaders manage critical services in the presence of disruption by presenting objectives and strategic measures for operational resilience, as well as tools to help them select and define those measures.
In April 2011, the DoD identified the engineering of resilient systems as a top strategic priority in helping to protect against the malicious compromise of weapons systems and to develop agile manufacturing for trusted and assured defense systems. SEI CERT has been exploring the topic of managing operational resilience at the organizational level for the past seven years through development and use of the CERT Resilience Management Model (CERT-RMM), a capability model designed to establish the convergence of operational risk and resilience management activities and apply a capability level scale that expresses increasing levels of process performance. CERT-RMM measures the ability of an organization to protect and sustain high-value services (which are organizational activities carried out in the performance of a duty or production of a product) and high-value assets (which are items of value to the organization, such as people, information, technology, and facilities that high-value services rely on). Resilient systems, as identified by the DoD, are one category of technology asset.
Our research on resilience measurement and analysis focuses on addressing the following questions, which are often asked by organizational leaders:
How resilient is my organization?
Have our processes made us more resilient?
What should be measured to determine if performance objectives for operational resilience are being achieved?
To establish a basis for measuring operational resilience, we relied on the CERT-RMM as the process-based framework against which to measure. CERT-RMM comprises 26 process areas (such as Incident Management and Control (IMC) and Asset Definition and Management (ADM)) that provide a framework of goals and practices at four increasing levels of capability (Incomplete, Performed, Managed, and Defined).
Our initial work provided organizational leaders with tools to determine and express their desired level of operational resilience. Specifically, we defined high-level objectives for an operational resilience management program, for example, "in the face of realized risk, the program ensures the continuity of essential operations of high-value services and their associated assets." We then demonstrated how to derive meaningful measures from those objectives using a condensed Goal Question (Indicator) Metric method, for example, determining the probability of delivering service through a disruptive event. We also created a template for defining resilience measures and presented example measures using it.
Too often, organizations collect "type count" measurements (such as numbers of incidents, systems with patches installed, or people trained) with little meaningful context on how these measures can help inform decisions and affect behavior. Based on the Goal Question (Indicator) Metric method outlined above, we identified strategic measures that help organizational leaders determine which process-level measures best address their needs. What follows is a description of five organizational objectives for managing operational resilience and 10 strategic measures for an operational resilience management (ORM) program. The ORM program defines an organization’s strategic resilience objectives (such as ensuring continuity of critical services in the presence of a disruptive event) and resilience activities (such as the development and testing of service continuity plans). We use an example of acquiring managed security services from an external provider to show how each measure could be used. Managed security services may include network boundary protection (such as firewalls and intrusion detection systems), security monitoring, incident management (such as forensic analysis and response), vulnerability assessment, penetration testing, and content monitoring and filtering.
Organizational objective 1: The ORM program derives its authority from—and directly traces it to—organizational drivers (which are strategic business objectives and critical success factors), as indicated by the following measures:
Measure 1: Percentage of resilience activities that do not directly (or indirectly) support one or more organizational drivers.
Example use: External security services replace comparable in-house services with a lower cost (less effort) and more effective (less impact from incidents) solution. After external security services are operational, 75 percent of in-house efforts no longer support organizational drivers. This measure can be used to ensure an effective transition of designated in-house services to externally-provided services and to retrain/reassign staff currently performing such services.
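As a minimal sketch of how Measure 1 might be computed (the activity and driver names below are invented for illustration), one only needs a mapping from each resilience activity to the drivers it supports:

# Hypothetical mapping of resilience activities to the drivers they support.
activity_to_drivers = {
    "in-house firewall management": [],   # superseded by the external provider
    "in-house incident monitoring": [],   # superseded by the external provider
    "provider SLA oversight": ["continuity of critical services"],
    "service continuity planning": ["continuity of critical services",
                                    "regulatory compliance"],
}

unsupported = [a for a, drivers in activity_to_drivers.items() if not drivers]
measure_1 = 100 * len(unsupported) / len(activity_to_drivers)

print(f"Measure 1: {measure_1:.0f}% of resilience activities support no driver")
print("candidates for retirement or staff reassignment:", unsupported)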
Measure 2: For each resilience activity, the number of organizational drivers that require it to be satisfied (the goal is equal to or greater than 1).
Example use: An example of a resilience activity is formalizing a relationship with a security services provider using a contract or service level agreement (SLA) that includes all resilience specifications. There is at least one organizational driver that calls for having security services in place to achieve the driver. This driver likely maps to a personal objective of the chief information officer or chief security officer. If there is no such traceability, one or more drivers may require updating.
Organizational objective 2: The ORM program satisfies resilience requirements that are assigned to high-value services and their associated assets, as indicated by the following measures:
Measure 3: Percentage of high-value services that do not satisfy their assigned resilience requirements.
Example use: Resilience requirements for security services are specified in the SLA. Provider performance is periodically reviewed to ensure that all services are meeting the SLA requirements (for example, high priority alerts from incident detection systems are resolved within xx minutes). Optimally, this percentage should be zero. If it is greater than an SLA-stated threshold (for example, 20 percent for service A), corrective action is taken and confirmed.
Measure 4: Percentage of high-value assets that do not satisfy their assigned resilience requirements.
Example use: This example is similar to the one above. The incident database is a high-value asset that is required to provide incident response services. The SLA specifies resilience requirements for this database, including daily automated backups and quarterly and event-driven (backup server upgrade and high-impact security incident) testing to ensure the provider’s ability to successfully restore from backups. Optimally, this percentage should be zero. If it is greater than an SLA-stated threshold (for example, 20 percent for asset B), corrective action is taken and confirmed.
Organizational objective 3: The ORM program—via the internal control system—ensures that controls for protecting and maintaining high-value services and their associated assets operate as intended, as indicated by the following measures:
Measure 5: Percentage of high-value services with controls that are ineffective or inadequate.
Example use: The SLA identifies the controls (policies, procedures, standards, guidelines, tools, etc.) that are required by a service. These controls can be tailored versions of the controls that the organization uses or can be negotiated based on the provider’s standard suite of controls. Provider implementation of these controls is periodically reviewed (audited, assessed, scans performed, etc.). Optimally, this percentage should be zero. If it is greater than an SLA-stated threshold (for example, 20 percent for service A), corrective action is taken and confirmed.
Measure 6: Percentage of high-value assets with controls that are ineffective or inadequate.
Example use: This measure is as described above, with asset controls stated in the provider SLA.
Organizational objective 4: The ORM program manages operational risks to high-value assets that could adversely affect the operation and delivery of high-value services, as indicated by the following measures:
Measure 7: Confidence factor that risks from all sources that require identification have been identified.
Example use: Major sources of risk are initially identified in the provider SLA and as part of an ongoing review based on changes in the operational environment within which services are provided. The elements that contribute to "confidence factor" (such as risk thresholds by service) are also identified. Confidence factor is represented as a Kiviat diagram showing plan versus actual for all sources. Analysis of provider gaps is reviewed on a periodic basis and corrective action is taken and confirmed to reduce unacceptable gaps.
Measure 8: Percentage of risks with impact above threshold.
Example use: Assessment of provider risk is performed on a periodic basis as specified in the SLA. Optimally, this percentage should be zero. If it is greater than an SLA-stated threshold (for example, 20 percent for risk type A), corrective action is taken and confirmed.
Organizational objective 5: The ORM program ensures the continuity of essential operations of high-value services and their associated assets in the face of realized risk, as indicated by the following measures:
Measure 9: Probability of delivered service through a disruptive event.
Example use: The SLA states service-specific availability and service levels to meet, both steady state and in degraded mode. Provider performance is periodically reviewed, including during and after a disruptive event (power outage, cyber attack, etc.). Probability of delivered service is determined and evaluated as a trend over time. Corrective action is taken and confirmed as required.
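A minimal sketch of how Measure 9 might be estimated from a log of disruptive events (the events below are invented for illustration):

# Hypothetical log of disruptive events: (event, service_delivered_as_required).
events = [
    ("power outage Q1", True),
    ("cyber attack Q2", True),
    ("power outage Q3", False),
    ("network failure Q4", True),
]

delivered = sum(1 for _, ok in events if ok)
probability = delivered / len(events)
print(f"probability of delivered service through a disruptive event: {probability:.2f}")
# Tracking this value per SLA review period yields the trend that drives
# corrective action when it falls below the agreed threshold.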
Measure 10: For disrupted, high-value services with a service continuity plan, percentage of services that did not deliver service as intended throughout the disruptive event.
Example use: The SLA includes requirements for service-specific continuity (SC) plans. For provider services with SC plans that do not maintain required service availability and service levels, corrective actions are taken and confirmed, including updates to SC plans. In addition, the customer uses this as an opportunity to review and update its own SC plans that depend on provider services, where service was not delivered as intended.
All these strategic measures derive from lower-level measures at the CERT-RMM process area level, including average incident cost by root cause type and number of breaches of confidentiality and privacy of customer information assets resulting from violations of provider access control policies.
To help organizational leaders determine what measures work best for their organization, we are collaborating with members of the CERT-RMM Users Group, which includes the United States Postal Inspection Service, Discover Financial Services, Lockheed Martin, and Carnegie Mellon University. Through a series of two-day workshops, members define an improvement objective, assess their current level of operational resilience against that objective, identify areas of improvement, and implement improvement plans using the CERT-RMM processes and candidate measures as the guide. Please contact us if you are interested in joining a CERT-RMM Users Group.
Additional Resources:
To read the SEI technical note, Measuring Operational Resilience Using the CERT Resilience Management Model, please visit www.sei.cmu.edu/reports/10tn030.pdf
To read the SEI technical note, Measures for Managing Operational Resilience, please visit www.sei.cmu.edu/library/abstracts/reports/11tr019.cfm
For more information about the CERT Resilience Management Model (CERT-RMM), please visit www.cert.org/resilience/rmm.html
To read an article about how the CERT Resilience Management Model helps companies predict performance under stress, please visit page 8 of the SEI 25th Anniversary Year in Review, www.sei.cmu.edu/library/assets/annualreports/2010_Year_in_Review.pdf
To read an article about CERT work in Resilience Measurement, please visit page 4 of the SEI 25th Anniversary Year in Review, www.sei.cmu.edu/library/assets/annualreports/2010_Year_in_Review.pdf
By David French, CERT Senior Researcher
Malware, which is short for "malicious software," is a growing problem for government and commercial organizations because it disrupts or denies important operations, gathers private information without consent, gains unauthorized access to system resources, and exhibits other inappropriate behaviors. A previous blog post described the use of "fuzzy hashing" to determine whether two files suspected of being malware are similar, which helps analysts potentially save time by identifying opportunities to leverage previous analysis of malware when confronted with a new attack. This posting continues our coverage of fuzzy hashing by discussing types of malware against which similarity measures of any kind (including fuzzy hashing) may be applied.
Fuzzy hashes provide a continuous stream of hash values for a rolling window over the malware binary, thereby allowing analysts to assign a percentage score that indicates the degree of similarity between two malware programs. When considering how fuzzy hashing works against malware, it is useful first to consider why malware programs would ever be similar to each other. For the purposes of this discussion we focus on prevalent Microsoft Portable Executable (PE) formatted files, although this description can be generalized to any executable code stored in any format. We further consider similarity as a measure of file structure—rather than program behavior—since fuzzy hashing generally applies to the bytes comprising a file, rather than an observation of the semantics of a program in some other space.
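For readers who want to experiment with this kind of comparison, a minimal sketch using the python-ssdeep binding (assuming that package is installed; the file names are placeholders) looks like this:

# Minimal fuzzy-hash comparison sketch using the python-ssdeep binding.
import ssdeep

hash_a = ssdeep.hash_from_file("sample_a.ex_")   # placeholder file names
hash_b = ssdeep.hash_from_file("sample_b.ex_")

score = ssdeep.compare(hash_a, hash_b)           # 0 (no match) .. 100 (identical)
print(hash_a)
print(hash_b)
print(f"similarity score: {score}")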
Malware is software combining three elements: (1) code, whether compiled source code written in a high-level language or hand-crafted assembly, (2) data, which is some set of numerical, textual, or other types of discrete values intended to drive the logic of the code in specific ways, and (3) process, which is loosely a set of operations (for example, compiling and linking) applied to the code and data that ultimately produce an executable sequence of bytes in a particular format, subject to specific operating constraints. Given a distinct set of code, data, and consistent processes applied thereto, it is reasonable to conclude that—barring changes to any of these—we will produce an identical executable file every time we apply the process to the code and data (where identity is measured using a cryptographic hash, such as MD5). We now consider how the permutation of any of these components will affect the resulting executable file.
First, let us consider the effect of modifying the data used to drive a particular executable. With respect to malicious software, such data may include remote access information (such as IP addresses, hostnames, usernames and passwords, commands, etc.), installation and configuration information (such as registry keys, temporary filenames, mutexes, etc.), or any other values which cause the malware to execute in specific ways. Generally speaking, changing the values of these data may cause different behavior in the malware at runtime but should have little impact on the structure of the malware.
Malware authors may modify their source code to use different data values for each new program instance or may construct their program to access these data values outside the context of the compiled program (for example, by embedding the data within or at the end of the PE file). In the case of malicious code, data may also include bytes whose presence does not alter the behavior of the code in any way and whose purpose is to confuse analysis. Regardless, the expected changes to the resulting executable file are directly proportional to the amount of data changed. Since we changed only the values of the data—not the way in which they are referenced (in particular, we have not changed the code)—we can expect that the structure of the output file is modified only to support any different storage requirements for the new data.
Similarly, let us consider the effect of modifying the code found in a particular executable. The code defines the essential logic of the malware and describes the behavior of the program under specified conditions. To modify program behavior, the code must generally be modified. The expected changes to the resulting executable file are proportional to the amount of code changed, much as we expect when changing data. However, code—especially compiled code—differs from data in that the representation of the code in its final form is often drastically different from its original form. Compiling and linking source code represents a semantic transformation, with the resulting product intended for consumption by a processor, not a human reader.
To accomplish semantic transformation most effectively, the compiler and linker may perform all manner of permutations, such as rewriting blocks of code to execute more efficiently, reordering code in memory to take up less space, and even removing code that is not referenced within the original source. If we assume that the process to create the executable remains constant (for example, that optimization settings are not changed between compilations), we must still allow that minor changes in the original source code may have unpredictably large changes in the resulting executable. As a consequence, code changes are more likely to produce executables with larger structural differences between revisions than executables where only data changes.
Thus, we have described two general cases in which structurally different files (measured by cryptographic hashing, such as MD5) may be produced from a common source. We refer to malware families whose primary or sole permutation is in their data as generative malware, and use the analogy of a malware factory cranking out different MD5s by modifying data bytes in some way. We refer to malware families whose primary permutation is in their code as evolutionary malware, in that the behavior of the program evolves over time. When considering the effects of similarity measurements such as fuzzy hashing, we may expect that fuzzy hashing will perform differently against these different general types of malware.
As an example of using fuzzy hashing against generative malware, consider the malware family BackDoor-DUG.a (also referenced here), also known as Trojan.Scraze by ClamAV and W32/ScreenBlaze.A2 by F-Prot (ClamAV and F-Prot are antivirus vendors; it’s important to note that the same family is known by several different names). The two files referenced from the McAfee site are Delphi programs, comprising 4,185 functions at distinct addresses as observed by disassembling each program using IDA-Pro v6.1. If we treat each function as a sequence of bytes and compute the cryptographic hashes of each function’s bytes using a technique called function hashing, we observe that these programs have approximately 3,321 unique functions each, per their position independent code (PIC) function hashes. Of these 3,321 functions distinct to each program, we observe that 3,292 are shared (meaning their bytes are exactly the same) between the programs, and that each program has 29 functions not shared with the other program.
Inspecting each of the 29 functions in each of the two files (for a total of 58 functions) in IDA-Pro, we discover that for all 29 pairs of functions found at the same address across the two files, the functions differ only by large blocks of seemingly non-executed data, which the code bytes jump around. Otherwise, the code bytes for each of the 29 function pairs at corresponding addresses are identical. In this way, we can observe that the two programs are materially identical except for seemingly non-executed bytes, which we generically call data. Performing an ssdeep comparison of these two files produces the following fuzzy hashes and their associated comparison score:
12288:gp/iN/mlVdtvrYeyZJf7kPK+iqBZn+D73iKHeGspOdqcXigCcCmua1xIam:gpQ/6trYlvYPK+lqD73TeGspOQKUmxpm,"70212f8f88865f4f9bb919383aabc029.ex_"
12288:gp/iN/mlVdtvrYeyZJf7kPK+iqBZn+D73iKHeGsptx6KrPSTKQGLG4a4:gpQ/6trYlvYPK+lqD73TeGspqnKx64,"6f83ac65223e2ac7837bfe3068da411c.ex_"
70212f8f88865f4f9bb919383aabc029.ex_ matches 6f83ac65223e2ac7837bfe3068da411c.ex_ (85)
Matching these files using ssdeep corroborates our findings using analysis of these files by function data, in that they are highly similar. These two files thus provide a good example of generative malware.
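The function-level comparison described above can be approximated with a short sketch, assuming the function byte sequences have already been extracted (for example, exported from a disassembler) and normalized into position-independent form:

# Sketch of function hashing: compare two programs by the set of hashes of
# their (already extracted and position-independence-normalized) function bytes.
import hashlib

def function_hashes(functions):
    # One hash per function body; the set ignores duplicates within a program.
    return {hashlib.md5(body).hexdigest() for body in functions}

# Placeholder byte strings standing in for extracted function bodies.
program_a = [b"\x55\x8b\xec\x5d\xc3", b"\x53\x56\x57\x5f\x5e\x5b\xc3"]
program_b = [b"\x55\x8b\xec\x5d\xc3", b"\x53\x56\x58\x5f\x5e\x5b\xc3"]

hashes_a, hashes_b = function_hashes(program_a), function_hashes(program_b)
shared = hashes_a & hashes_b
print(f"shared functions: {len(shared)}")
print(f"unique to A: {len(hashes_a - shared)}, unique to B: {len(hashes_b - shared)}")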
When considering how code changes can affect fuzzy hashing, we briefly consider non-malicious software for which we have full source code. The Nullsoft Scriptable Installation System (NSIS) is an open-source installation system used to create Windows-based installation programs. Although NSIS is not malicious software, it can be used to install many different types of programs on Windows computers, including malicious and non-malicious programs alike.
The project page for NSIS provides several revisions; we examined the two most recent versions, 2.45 (MD5 sum af193ccc547ca83a63eedf6a2d9d644d) and 2.46 (MD5 sum 0e5d08a1daa8d7a49c12ca7d14178830), for which Windows binaries are available. The two files comprise 6,038 and 6,040 functions at distinct addresses, respectively, with 2,564 unique functions each (as measured by their PIC function hashes). These two programs have 2,544 identical functions, with 20 differing functions each. The differing functions have changes that range from identical functions using different constants to entirely new functions with no overlapping behavior. Regardless, the vast majority of the behavior of these two programs is identical. Performing an ssdeep comparison of these two files produces the following fuzzy hashes and their associated comparison score:
12288:p24n/P3WRlauwYyPd7K67jBOs/skXMujtiEs6vHG9Uu94yGjbgWsvvs0V:k4n3GRMuwYyV26XDRiE6qu+yJWsXsa,"nsis-2.45/NSIS.exe"
12288:lWe4uCFAtIma4w3PE6EPYL/t+32gNjw6ps6cg1eHgfKkx71DS0V:Ie4ugwIma4O86YnE6pxKgCg71Sa,"nsis-2.46/NSIS.exe"
nsis-2.45/NSIS.exe matches nsis-2.46/NSIS.exe (0)
As seen from the score of zero from ssdeep, fuzzy hashing does not detect any relationship between these two files even though function analysis revealed that the majority of the behavior of these two files is the same. This result is borne out by reading the release notes for V2.46 from the NSIS website, which document relatively minor changes. Although the evolution of these two programs is relatively minor in terms of the absolute number of changes to functionality, their structure is different enough that fuzzy hashing with ssdeep was completely unable to detect the similarity. This highlights the challenging problem of similarity measurement in malicious code and underscores the need to understand the underlying reasons that similarity would ever be apparent to any particular technique.
Future blog entries will consider alternate fuzzy hashing approaches and tools, and discuss some of the challenges of performing fuzzy hashing at scale.
This post is the second in a series exploring David's research in fuzzy hashing. To read the first post in the series, Fuzzy Hashing Techniques in Applied Malware Analysis, please click here.
Additional Resources:
More information about CERT research in malicious code and development is available in the 2010 CERT Research Report, which may be viewed online at www.cert.org/research/2010research-report.pdf
By Dionisio de Niz, Senior Member of the Technical Staff, Research, Technology, and System Solutions
Cyber-physical systems (CPS) are characterized by close interactions between software components and physical processes. These interactions can have life-threatening consequences when they include safety-critical functions that are not performed according to their time-sensitive requirements. For example, an airbag must fully inflate within 20 milliseconds (its deadline) of an accident to prevent the driver from hitting the steering wheel, with potentially fatal consequences. Unfortunately, the competition between safety-critical requirements and other demands to reduce cost, power consumption, and device size also creates problems, such as automotive recalls, new aircraft delivery delays, and plane accidents. Our research leverages the fact that failing to meet deadlines doesn’t always have the same level of criticality for all functions. For instance, if a music player fails to meet its deadlines the sound quality may be compromised, but lives are not threatened. Systems whose functions have different criticalities are known as mixed-criticality systems. This blog posting updates our earlier post to describe the latest results of our research on supporting mixed-criticality operations by giving more central processing unit (CPU) time to functions with higher value while ensuring critical timing guarantees.
During our research, we observed that different functions provide different amounts of utility or satisfaction to the user. For instance, a GPS navigation function may provide higher utility than a music player. Moreover, if we give more resources to these functions (for example, more CPU time) the utility obtained from them increases.
In general, however, the amount of utility obtained from additional resources does not grow forever, nor does it grow at a constant rate. The additional increment in utility for each additional unit of resource instead decreases to a point where the next increment in utility is insignificant. In such cases, it is often more important to dedicate additional computational resources to another function that is currently delivering lower utility and will deliver a larger increment in utility for the same amount of CPU time.
For example, assuming that we get a faster route to our destination if more CPU time is dedicated to the GPS functionality, it seems obvious that the first route we get from the GPS will give us the biggest increment in utility. If we lack enough CPU time (due to the execution of other critical functions) to run both the GPS and the music player, we will choose the GPS. We may even prefer to give more CPU time (if we discover that more time is available) to the GPS to help avoid traffic jams before we decide to run the music player. Letting the GPS run even longer to select a less traffic-clogged route, however, may give us less utility than running the music player.
At this point, we may prefer to start running the music player if we have more CPU time available. We thus change our allocation preference because the additional utility obtained by giving the GPS more CPU time is less than the utility obtained by giving the music player this time. This progressive decrease in the utility obtained as we give more resources to a function is known as diminishing returns, which can be used to allocate resources to ensure we obtain the maximum total utility possible considering all functions in the system.
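A minimal sketch of allocation under diminishing returns (the utility curves and the number of spare time units are invented for illustration) repeatedly gives the next unit of spare CPU time to whichever function would gain the most utility from it:

# Greedy allocation of spare CPU time under diminishing returns.
# utility[name][i] is the (hypothetical) total utility delivered when the
# function receives i units of spare CPU time.
utility = {
    "gps":   [0, 50, 70, 80, 85, 88],
    "music": [0, 30, 45, 55, 62, 66],
}

def marginal_gain(name, allocated):
    curve, i = utility[name], allocated[name]
    return curve[i + 1] - curve[i] if i + 1 < len(curve) else 0

allocated = {name: 0 for name in utility}
for _ in range(5):  # five spare units of CPU time to hand out
    best = max(utility, key=lambda name: marginal_gain(name, allocated))
    allocated[best] += 1

print(allocated)  # the GPS gets the first units; the music player gets later ones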
Our research uses both the diminishing returns characteristics of low-criticality functions and criticality levels to implement a double-booking computation time reservation scheme. Traditional real-time scheduling techniques consider the worst-case execution time (WCET) of the functions to ensure they always complete before their deadlines, reserving CPU time that is used only on the rare occasions when the WCET occurs. We take advantage of this fact and allocate the same CPU time to functions of lower criticality. When both functions request the CPU time reserved for both at the same time, we favor the higher-criticality function and let the lower-criticality function miss its deadline.
Our double-booking scheme is analogous to the strategies airlines use to assign the same seat to more than one person. In this case, the seat is given to the person with preferred status (e.g., "gold members"). Our project uses utility—in addition to criticality—to ensure the CPU time that is double booked is given to functions providing the largest utility in case of a conflict (both functions requesting the double-booked CPU time). Our double-booking scheme provides the following two benefits:
It protects critical functions ensuring that their deadlines are always met and
It uses the unused time from the critical functions to run the non-critical functions that produce the highest utility.
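A minimal sketch of the conflict-resolution rule just described (criticality flags and utility values are invented for illustration): when two functions request the same double-booked reservation, a critical function always wins; otherwise the function offering the higher utility wins.

# Resolve a conflict over a double-booked CPU reservation.
# Each requester is (name, is_critical, utility); the values are hypothetical.
def resolve_conflict(requesters):
    critical = [r for r in requesters if r[1]]
    if critical:
        # A critical function always keeps its timing guarantee.
        return max(critical, key=lambda r: r[2])[0]
    # Among non-critical functions, keep the one that preserves the most utility.
    return max(requesters, key=lambda r: r[2])[0]

print(resolve_conflict([("flight_control", True, 100), ("video_stream", False, 40)]))
print(resolve_conflict([("object_detection", False, 60), ("video_stream", False, 40)]))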
Our research is aimed at providing real-time system developers with an analysis algorithm that accurately predicts system behavior when it is running (runtime). Developers use these algorithms during the design phase (design time) to test whether critical tasks will meet their deadlines (providing assurance) and to determine how much overbooking is possible.
To evaluate the effectiveness of our scheme, we developed a utility degradation resilience (UDR) metric that quantifies the capacity of a CPS to preserve the utility derived from double-booking. This metric evaluates all possible conflicts that can happen due to double booking and how much total utility is preserved after the conflict is resolved by deciding what function gets the double-booked CPU time and what functions are left without CPU time. The utility derived from the preserved functions is then summed to compute the total utility that a specific conflict resolution scheme can preserve.
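Under these assumptions, a sketch of the UDR computation enumerates possible conflicts, sums the utility a scheme preserves in each, and normalizes by what an ideal scheme with perfect knowledge would preserve:

# Hypothetical UDR computation: utility preserved by a resolution scheme,
# normalized by the utility an ideal (perfect-knowledge) scheme would preserve.
# Each conflict is (utility_preserved_by_scheme, utility_preserved_by_ideal).
conflicts = [(140, 160), (90, 120), (200, 200)]

preserved = sum(s for s, _ in conflicts)
ideal = sum(i for _, i in conflicts)
print(f"UDR = {preserved / ideal:.2f}")  # about 0.90 for these invented numbers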
In theory, a perfect conflict resolution scheme should preserve the maximum possible utility. In reality, however, decisions must be made ahead of time assuming that some critical functions will run for their worst-case execution time (even though they may not) to ensure that they finish before their deadlines. Unfortunately, if they execute for less time, it may already be too late to execute other functions.
Using the UDR metric, we compared our scheme against the Rate-Monotonic Scheduler (RMS) and a scheme called Criticality-As Priority Assignment (CAPA) that uses criticality as the priority. Our experiments showed that we can recover up to 88 percent of the ideal utility we could obtain if we could fully reclaim the unused time left by the critical functions, that is, if we had perfect knowledge of exactly how much time each function needed to finish executing. In addition, we observed that our double-booking scheme can achieve up to three times the UDR that RMS provides.
We implemented a design-time algorithm to evaluate the UDR of a system and generate the scheduling parameters for our runtime scheduler, which performs the conflict resolutions of our overbooking scheme (deciding which function gets the overbooked CPU time). This scheduler was implemented in the Linux operating system as a proof of concept to evaluate the practicality of our mechanisms. To evaluate our scheme in a real-world setting, we used our scheduler in a surveillance UAV application on the Parrot A.R. Drone quadricopter with a safety-critical function (flight control) and two non-critical functions (video streaming and vision-based object detection).
Our results confirmed that we can recover more CPU cycles for non-critical tasks with our scheduler than with the fixed-priority scheduler (using rate-monotonic priorities) without causing problems for the critical tasks. For example, we avoided instability in the flight controller that can lead to the quadricopter turning upside down. In addition, the overbooking between the non-critical tasks performed by our algorithm allowed us to adapt automatically to peaks in the number of objects to detect (and hence in the execution time of the object detection function) by reducing the frames per second processed by the video streaming function during these peaks.
In future work we are extending our investigation to multi-core scheduling where we plan to apply our scheme to hardware resources (such as caches) shared across cores.
This research is conducted in collaboration with Jeffrey Hansen of CMU; John Lehoczky of CMU’s Statistics Department; and Ragunathan (Raj) Rajkumar and Anthony Rowe of the Electrical and Computer Engineering Department at CMU.
Additional Resources: www.contrib.andrew.cmu.edu/~dionisio/
By Douglas C. Schmidt, Chief Technology Officer, SEI
As noted in the National Research Council’s report Critical Code: Software Producibility for Defense, mission-critical Department of Defense (DoD) systems increasingly rely on software for their key capabilities. Ironically, it is increasingly hard to motivate investment in long-term software research for the DoD. This lack of investment stems, in part, from the difficulty that acquisitions programs have making a compelling case for the return on these investments in software research. This post explores how the SEI is using the Systems and Software Producibility Collaboration and Experimentation Environment (SPRUCE) to help address this problem.
Decades of public and private research investments—coupled with the inexorable growth of globalization and connectivity—have commoditized many information technology (IT) products and services. For example, commercial off-the-shelf (COTS) hardware and software is now produced faster, cheaper, and generally at a predictable pace. During the past two decades, users and developers of IT systems have benefitted from the commoditization of hardware and networking elements. More recently, the maturation and widespread adoption of object-oriented programming languages, operating environments, and middleware is helping commoditize many software components and end-system layers.
Due to this IT commoditization trend, acquisition professionals, senior leaders, politicians, and funding agencies often assume new software innovations will continue to appear at a predictable pace and that the DoD can benefit from these innovations without significant investment in software research. While mainstream IT systems may be able to get by without this investment, mission-critical DoD systems—particularly those at the tactical edge—cannot. Without sustained investment in software research, therefore, the DoD is in danger of "eating the seed corn" and reaching a complexity cap that will make it harder to succeed in an era of budget cuts and other austerity measures.
Challenges to Effective Software Research Impact
One challenge to motivating investment in software research is presenting a convincing pathway for how sponsored research finds its way into practice. The underlying problem for the DoD is the ad hoc and often serendipitous nature by which members of the software community (including academic researchers, defense contractor software architects and developers, DoD acquisition program and research sponsors, as well as commercial tool vendors) collaborate to identify, develop, test, and transition promising software technologies. This lack of systematic collaboration has created a dysfunctional—yet all-too-common—situation whereby DoD programs cannot find software technologies that meet their needs, regardless of their inherent promise. As a result, acquisition programs across the DoD repeatedly encounter problems developing, validating, and sustaining software. To exacerbate the problem, the "landing path" for software technologies is typically not DoD program engineers, but organizations (such as commercial vendors or standards bodies) responsible for maintaining the technology. These organizations are often not structured or motivated to leverage the results of advanced research projects effectively.
For example, DoD software researchers have historically received funding for research programs of approximately three years in duration. These programs involve creating a project plan, building teams, working on technologies, generating and evaluating prototypes, and writing papers to publicize the work. Throughout this period, there is typically great enthusiasm for the project from the technical community. Once the program ends, however, the community often disbands and the project descends into the "valley of disappointment," a phenomenon in which researchers struggle to transition their prototypes to the DoD acquisition community, while the practitioners are equally frustrated with not being able to apply research results to practical problems.
Getting stuck in the "valley of disappointment" is a common problem in technology research and development projects, as evidenced by Geoffrey Moore’s book Crossing the Chasm. Moore presents this problem from a venture capital perspective: a group of researchers develops a technology and identifies some early adopters, but struggles to transition from the early-adoption to the majority-adoption phase. One reason for this valley is that researchers are often required to work on abstracted problems because that is all they can access; another is that acquisition professionals don’t have the luxury of transitioning "science projects."
Crossing the "Valley of Disappointment" with SPRUCE
To address the challenges described above, the Assistant Secretary of Defense Research & Engineering Enterprise (ASDR&E), through the Air Force Research Lab (AFRL), funded researchers at Lockheed Martin Advanced Technology Laboratories, in partnership with Booz Allen Hamilton, Vanderbilt University (where I worked on SPRUCE before joining the SEI), Drexel University, Virginia Tech University, Lockheed Martin Aeronautics, and Raytheon, to create the Systems and Software Producibility Collaboration and Experimentation Environment (SPRUCE). SPRUCE is a collaborative set of web-based services that matches DoD challenge problems with the methods, algorithms, tools, and techniques developed by researchers. One way to think about SPRUCE is as an "eHarmony" portal for researchers that unites domain experts from the DoD acquisition community who face concrete technical challenges with software researchers who can solve them.
For example, acquisition professionals might be searching for an approach that will allow them to run legacy code on a multi-core platform or an algorithm that minimizes the number of processors and the amount of network bandwidth needed in an avionics system. SPRUCE refers to these people as the "problem providers," who post challenge problems into the SPRUCE portal. Conversely, researchers are "solution providers" who use SPRUCE to post candidate solutions to available challenge problems.
SPRUCE allows problem providers to explain their needs in a structured way—along with representative data sets and reproducible experiments—so that solution providers from the software research community can decide if they have methods or technologies that would make an impact on the posted problem. If researchers operate in an open environment (which is typical at universities), they can post their solution on the SPRUCE portal. If researchers operate in a closed environment (which is typical at companies), they can contact the problem providers directly and discuss options for collaboration.
SPRUCE addresses many problems facing DoD software researchers:
It allows researchers access to real-world problems and realistic data sets. Even if the problem providers have anonymized their problem (for example, by removing proprietary information), it still represents an actual challenge faced by the DoD. Once researchers demonstrate that their solutions work on abstracted problems that are relevant to a particular domain and derived from real-world scenarios, it is easier to convince the original problem providers that the results are ready to be applied in practice.
SPRUCE facilitates healthy competition among research groups. For example, SEI researchers may believe they have the most effective techniques and tools for detecting similarity in malware, but SPRUCE allows them to compare their results against techniques and tools devised by other researchers for a common data set. In addition to the obvious competitive benefits, this approach also allows a better collaborative evolution of the solution by incorporating the best parts of each approach into a refined whole.
SPRUCE helps researchers locate sources of funding because it provides an immediate way for them to showcase their results in a forum that has an audience (the challenge problem providers) interested in solutions to real problems. Over time, researchers will populate the SPRUCE repository with their solutions, providing a way for them to find audiences for new funding and additional collaborations on real problems.
In its four years of funding, SPRUCE has focused primarily on capturing DoD challenge problems and helping DoD software researchers collaborate more effectively with other members of the DoD software community. SPRUCE has also influenced the National Science Foundation (NSF), which recently created the Cyber Physical Systems Virtual Organization (CPS-VO) community as a web portal for problem providers in cyber-physical systems. NSF-funded researchers use CPS-VO to post challenges and to work collaboratively to solve challenge problems with their colleagues around the world.
SPRUCE represents part of the trend towards more collaborative research and development among scientists and engineers. For example, UAVForge.net is attempting to use crowdsourcing to go from concept to fly-off of air vehicle designs in under six months. Likewise, a recent Wall Street Journal article titled The New Einsteins Will Be Scientists Who Share proclaims that "publicly funded science should be open science." Portals like SPRUCE help move researchers and practitioners from isolated pockets of collaboration to mainstream adoption.
Using SPRUCE to Guide SEI Research
At the SEI, we are using SPRUCE to showcase our solutions and, more importantly, to capture real-world challenge problems from our stakeholders. For example, the SEI hosted a workshop in August 2011 that brought together researchers and problem providers from Lockheed Martin, Boeing, AFRL, Carnegie Mellon University, and Virginia Tech to elicit guidance for our work in real-time scheduling and concurrency analysis for cyber-physical systems. The workshop participants provided the SEI with challenge problems from avionics domain experts to ensure the research we are doing addresses real DoD problems. As a result of this workshop, the problem providers populated the SPRUCE database with problems that SEI technologists will use to guide our future work. We are in the process of conducting challenge problem workshops for other software research projects at the SEI to ensure we continue to work on relevant problems that have high impact on DoD operational needs.
SPRUCE also allows us to continually improve our metrics and measures of success. As a federally funded research and development center, the SEI is often requested to substantiate data and success criteria. Problem providers—who by their nature have a close connection to real-world problems—help define the success criteria. This approach allows an external party, like a DoD contractor, to define the success criteria for SEI researchers who then work to achieve those criteria. At the same time, it showcases the solutions that SEI technologists have developed for various technologies, such as multi-core platforms.
In a commoditized IT environment, human resources are an increasingly strategic asset. In the future, therefore, premium value and competitive advantage will accrue to individuals, universities, companies, and agencies that continue to invest in software research and who master the principles, patterns, and protocols necessary to collaboratively integrate commoditized hardware and software to develop complex systems that cannot yet be bought off-the-shelf. Success in this endeavor requires close collaboration between academia, industry, and government. The SPRUCE portal described above helps to facilitate this collaboration by bringing key stakeholders to the table and ensuring that government investments in software research have greater impact on DoD acquisition programs.
Additional Resources:
For more information about the SPRUCE portal, please visit www.sprucecommunity.org/default.aspx.
To read about the need to motivate greater DoD investment in software research, please see the National Research Council’s Critical Code: Software Producibility for Defense report available at www.nap.edu/openbook.php?record_id=12979&page=R1.
By Grace Lewis, Senior Member of the Technical Staff, Research, Technology & System Solutions
Cloudlets, which are lightweight servers running one or more virtual machines (VMs), allow soldiers in the field to offload resource-consumptive and battery-draining computations from their handheld devices to nearby cloudlets. This architecture decreases latency by using a single-hop network and potentially lowers battery consumption by using WiFi instead of broadband wireless. This posting extends our original post by describing how we are using cloudlets to help soldiers perform various mission capabilities more effectively, including facial, speech, and imaging recognition, as well as decision making and mission planning.
An initial goal of our research was to create a prototype application that located cloudlets within close proximity of handheld devices using them. We initially focused on offloading computations to cloudlets to extend device battery life. In addition to this benefit, we also found cloudlets significantly reduce the amount of time needed to deploy applications to handheld devices because clients are not tied to a specific server that can take a long time to provision in tactical environments.
Our work together with Mahadev "Satya" Satyanarayanan (the creator of the cloudlet concept and a faculty member at Carnegie Mellon's School of Computer Science) originally focused on face recognition applications as an example of a computation-intensive mission capability. Thus far we have created an Android-based facial recognition application that
locates a cloudlet via a discovery protocol,
sends the application overlay to the cloudlet, where dynamic VM synthesis is performed,
captures the images and sends them to the facial recognition server code that now resides in the cloudlet.
In the context of cloudlets, the application overlay corresponds to the computation-intensive code invoked by the client, which in this case is the face recognition server; this server is written in C++ and processes images from the handheld device client for training or recognition purposes. On execution, the overlay is sent to the cloudlet and applied to one of the VMs running in the cloudlet, a process called dynamic VM synthesis. The application overlay is pre-generated by calculating the difference between a base VM and the base VM with the computation-intensive code installed.
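The diff-and-apply idea behind the overlay can be illustrated with a small sketch. The block size and function names below are hypothetical, and a real implementation operates on complete (compressed and encrypted) VM images rather than raw byte strings, but the essential mechanics are the same.

```python
# Illustrative sketch of overlay creation and dynamic VM synthesis.
# Names and the block size are hypothetical; real overlays are computed over
# complete VM images and are compressed and encrypted before transmission.

BLOCK_SIZE = 4096

def compute_overlay(base_image: bytes, provisioned_image: bytes) -> dict:
    """Record only the blocks that differ between the base VM and the base VM
    with the computation-intensive server code installed."""
    overlay = {}
    for offset in range(0, len(provisioned_image), BLOCK_SIZE):
        base_block = base_image[offset:offset + BLOCK_SIZE]
        new_block = provisioned_image[offset:offset + BLOCK_SIZE]
        if base_block != new_block:
            overlay[offset] = new_block
    return overlay

def synthesize_vm(base_image: bytes, overlay: dict) -> bytes:
    """Apply the overlay to a base VM image to recover the provisioned VM."""
    image = bytearray(base_image)
    for offset, block in overlay.items():
        image[offset:offset + len(block)] = block
    return bytes(image)
```

Because only the differing blocks travel over the network, the overlay is typically much smaller than the provisioned VM itself.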
The first version of the cloudlet we created is a simple HTTP server. When this server receives the application overlay from the client, it decrypts and decompresses the overlay and performs VM synthesis to configure the cloudlet dynamically. It subsequently returns coordinates for the faces it recognizes, along with a measure of confidence, to the client device.
Constructing the Cloudlet Prototype
The original cloudlet prototype built by Satya’s team used a simple Virtual Network Computing (VNC) client to see what was executing inside the VM. Our cloudlet prototype extended Satya’s work to use a thick mobile client that provides a better user experience for users at the edge and allows incorporation of sensor information that would not be possible with the original VNC cloudlet approach. We constructed this prototype in the RTSS Concept lab.
Our design was tricky because the face recognition client needs to know the IP address and the port on which the face recognition server is listening so that it can connect to it. The client uses an HTTP request to start the cloudlet setup and expects an HTTP response from the cloudlet server that includes the face recognition server's IP address and port. Because the VM executes in bridged mode, however, its IP address is assigned by the DHCP server, and the host server has no visibility into that assignment, so there was no simple way to obtain the IP address and port.
To solve this problem, we included a Windows service in the VM that runs on startup. The Windows service invokes a Python script that performs the following three tasks (a sketch of such a script appears after this list):
start the face recognition server executable in a separate thread inside a Python script,
read the face recognition server configuration file that contains the IP address and port that the face recognition server is listening on, and
write this information to a file that is accessible by the cloudlet.
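A minimal sketch of such a startup script is shown below. The executable path, configuration file, keys, and output file name are hypothetical stand-ins rather than the ones used in our prototype.

```python
# Hypothetical sketch of the VM startup script invoked by the Windows service.
# All paths, file names, and configuration keys are illustrative only.
import configparser
import subprocess
import threading

def start_face_recognition_server():
    # Task 1: launch the face recognition server executable in its own thread.
    subprocess.call([r"C:\cloudlet\face_server.exe"])

threading.Thread(target=start_face_recognition_server, daemon=True).start()

# Task 2: read the server configuration to learn the IP address and port
# on which the face recognition server is listening.
config = configparser.ConfigParser()
config.read(r"C:\cloudlet\face_server.ini")
ip_address = config["server"]["ip_address"]
port = config["server"]["port"]

# Task 3: write that endpoint to a file the cloudlet host can read, so the
# HTTP response returned to the client can include the address and port.
with open(r"C:\cloudlet\server_endpoint.txt", "w") as endpoint_file:
    endpoint_file.write(f"{ip_address}:{port}\n")
```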
Although the Windows service creates additional complexity on the cloudlet server, it reduces the complexity of cloudlet setup in the field. During field operation, servers residing within Tactical Operations Centers (TOCs) and Humvees are provisioned with a set of pre-packaged cloudlets to support a range of applications and versions, avoiding the need to provision servers for each supported application platform and version. The handheld devices of soldiers participating in the mission are then loaded with the application overlays that are necessary for a particular mission. A soldier running a computation-expensive application can discover a compatible cloudlet within minutes and offload the expensive computation to the cloudlet running on a server.
What We’ve Learned
Our research has identified the following two types of applications that can be deployed in a cloudlet setting:
Data-source-reliant applications that rely on a particular data source to work. For example, if soldiers need to launch the facial recognition application, they need a database of faces to match images against. Another example would be if a soldier wanted to compare fingerprints and needed a database of fingerprints to match against. In this setting, the cloudlet must be configured to connect to a particular data source.
Non-data-source-reliant applications that are computationally intensive but don’t require a large data source to work. For example, imagine soldiers encountering a sign with characters they don’t understand. They can take a picture of the sign and submit it to a cloudlet to determine the language in which the sign is written. In this case the computationally-intensive code residing on the cloudlet relies on complex character recognition algorithms instead of a large database.
As expected, our experiments demonstrated that a larger overlay increases transmission time (which in turn consumes more battery) as well as VM synthesis time. Including the data source inside the overlay would therefore create a large overlay, which suggests that the cloudlet concept is a better fit for non-data-source-reliant applications. We overcame this problem by specifying the location of the data source in a configuration file. The location could be the local server or a server accessible over a network or the internet. Although this approach requires additional configuration, it is done only once (when the cloudlet is packaged by IT experts), rather than each time a server is configured in the field (potentially by non-IT experts).
Future Work
When testing the cloudlet prototype in the RTSS Concept Lab, we discovered that reduced deployment time makes it easier to deploy an application in a tactical environment. We are working to capture those measurements and are developing the following applications to support our findings:
fingerprint recognition — fingerprints are captured using a fingerprint scanner connected to a handheld device and sent to the cloudlet for processing,
character recognition — pictures of a written sign are taken with a camera on the handheld device and sent to the cloudlet for character identification and translation,
speech recognition — voice of a person speaking a foreign language is captured using the voice recorder on the handheld device and sent to the cloudlet for translation; the same application can be used to translate a response back to the identified foreign language, and
model checking — an app is generated on the handheld on the fly using end-user programming capabilities and sent to a model checker in a cloudlet to ensure it does not violate any security (or other) policies and constraints.
We will use these new applications to gather measurements related to the bandwidth consumed by overlay transfer and VM synthesis, so that we can focus on optimizing cloudlet setup time.
Our future research and collaboration will position cloudlets to both reduce battery consumption and simplify application deployment in the field. For example, our goal is to use dynamic VM synthesis to slash the time needed to deploy applications, thereby shielding operators from unnecessary technical details, while also communicating and responding to mission-critical information at an accelerated operational tempo.
Additional Resources:
This is the second post in a series exploring the SEI’s research in Cloud Computing in partnership with Satya. To read the initial post, Cloud Computing for the Battlefield, please visit http://blog.sei.cmu.edu/post.cfm/cloud-computing-for-the-battlefield.
By Edwin Morris, Advanced Mobile Systems Initiative Lead, Research, Technology & System Solutions
Whether soldiers are on the battlefield or providing humanitarian relief, they need to capture and process a wide range of text, image, and map-based information. To support soldiers in this effort, the Department of Defense (DoD) is beginning to equip soldiers with smartphones to allow them to manage the vast array and amount of information they encounter while in the field. Whether the information gets correctly conveyed up the chain of command depends, in part, on the soldier’s ability to capture accurate data while in the field. This blog posting, a follow-up to our initial post, describes our work on creating a software application for smartphones that allows soldier end-users to program their smartphones to provide an interface tailored to the information they need for a specific mission.
The software we developed is constructed primarily in Java and operates on an Android platform. We used an object database (DB 4.0) as the underlying data store because it provides flexible and powerful application programming interfaces (APIs) that simplified our implementation. For performance reasons, our application is a native Android app; it does not run in a browser on an Android smartphone.
Our app—called eMONTAGE (Edge Mission Oriented Tactical App Generator)—allows a soldier to build customized interfaces that support the two basic paradigms that are common to smartphones: maps and lists. For example, a soldier could build an interface that allows them to construct a list of friendly community members including names, affiliations with specific groups, information about whether the person speaks English, and the names of the person’s children. If the soldier also specifies a GPS location in the customized interface s/he constructs, the location of the friendly community members could be plotted on a map. Likewise, the same soldier could build other customized interfaces that capture specific aspects of a threatening incident, or the names and capabilities of non-governmental organizations responding to a humanitarian crisis.
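To make the idea concrete, the sketch below shows the kind of record type a soldier might define through such a customized interface. It is illustrative only: the field names are hypothetical, and the actual app is written in Java on Android and stores records in an object database rather than in Python dataclasses.

```python
# Illustrative only: the shape of a record a soldier might define for a
# "friendly community member" list. Field names are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class FriendlyContact:
    name: str
    group_affiliation: str
    speaks_english: bool
    children_names: List[str] = field(default_factory=list)
    # If a GPS location is captured, the record can also be plotted on a map.
    latitude: Optional[float] = None
    longitude: Optional[float] = None

contact = FriendlyContact(
    name="Example Resident",
    group_affiliation="Village council",
    speaks_english=True,
    children_names=["Child A", "Child B"],
    latitude=34.5, longitude=69.2,
)
```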
Challenges We Encountered
The software we built is intended for soldiers who are well-versed in their craft, but are not programmers. While we are still conducting user testing, after we developed a prototype, we asked several soldiers to provide feedback. Not surprisingly, we found that soldiers who are Android users and relatively young (i.e., digital natives) quickly learned the software programming application and could use it to build a new application on-site. Conversely, non-digital natives had a harder time. Since our goal is to make our software accessible to every soldier, we are simplifying, revising, and improving the user interface.
As with any device used by our military, security is a key concern. Through our work with DARPA’s Transformative Apps program in the Information Innovation office, we can take advantage of the security strategies they conceive and implement. We are also working to address challenges associated with limited bandwidth and battery consumption in this work and other work within the Research, Technology, and Systems Solutions program at the SEI.
Another area of our work involves enabling our software to connect to back-end data sources that the DoD uses. For example, a soldier on patrol may need to connect to TiGR and other information systems to access current information about people, places, and activities in an area. Our software will enable these soldiers to build customized interfaces to such data sources by selecting fields for display on the phone and by extending the information provided by these sources with additional, mission-specific information. This capability will provide mashups that support soldiers by capturing multiple sources of information for display and manipulation. Once our full capability is available in spring 2012, it will become much easier to build phone interfaces to new data sources and extend these interfaces with additional information.
Looking to the Future
Currently, eMONTAGE can handle the basic information types that are available on an Android phone, including images, audio, and data. Technologies like fingerprint readers and chemical sensors are being miniaturized and will likely be incorporated into future handheld devices. With each new technology, we’ll need to add that basic type to our capability. Fortunately, this is a relatively straightforward programming operation, but it does require engineering expertise. As a new type becomes available, professional engineers will add it to eMONTAGE, thereby making the type available to soldiers who may have little or no programming expertise.
Our current focus is on ensuring that the software is reliable and does not fail, but we are also looking to extend it to provide features that we believe are essential, such as better support for collections of objects. For example, soldiers may need to classify a single individual into different groups: a family member, translator, or member of an organization. Each of these groups is a collection. Soldiers will have the ability to list and search through collections (e.g., list all members of an NGO who work for Doctors Without Borders) and plot the members of a collection on a map (e.g., display all members of Doctors Without Borders who are within 10 miles of my current position.)
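The map query in the last example reduces to a distance filter over a collection. A minimal sketch of that filter is shown below, using the standard haversine great-circle distance; the record fields and sample data are hypothetical.

```python
# Illustrative sketch of "list the members of a collection within N miles of
# my current position." Record fields and sample data are hypothetical.
import math

EARTH_RADIUS_MILES = 3958.8

def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def members_within(collection, my_lat, my_lon, radius_miles=10.0):
    """Return the records in the collection within radius_miles of the user."""
    return [m for m in collection
            if distance_miles(my_lat, my_lon, m["lat"], m["lon"]) <= radius_miles]

ngo_members = [{"name": "Member A", "lat": 40.44, "lon": -79.94},
               {"name": "Member B", "lat": 41.00, "lon": -80.50}]
nearby = members_within(ngo_members, 40.45, -79.95)   # only "Member A" qualifies
```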
While we can provide access to military iconology, eMONTAGE is not DoD-specific by design. This application can be used by other government organizations—or even non-government organizations— that want a user-customizable way to capture information about any variety of people, places, and things, and share this information effectively in the enterprise.
Part of our ongoing research involves testing our applications with soldiers through the Naval Postgraduate School’s Center for Network Innovation and Experimentation (CENETIX). In our initial tests with the soldiers, they told us which capabilities they need and what did not work. These collaborations tie our work firmly into both the research and military communities and keep us focused on providing a useful and cutting-edge capability. In addition to continuing our collaboration with CENETIX, we are working with Dr. Brad Myers of the Carnegie Mellon University Human Computer Interaction Institute. Dr. Myers is helping us define an appropriate interface for soldiers to use the handheld software in the challenging situations they face.
Additional Resources:
This posting is the second in a series exploring our research in developing software for soldiers who use handheld devices in tactical networks. To read our first post in the series, please visit http://blog.sei.cmu.edu/post.cfm/a-new-approach-for-handheld-devices-in-the-military.
By Arie Gurfinkel, Senior Member of the Technical Staff, Research, Technology, & System Solutions
The DoD relies heavily on mission- and safety-critical real-time embedded software systems (RTESs), which play a crucial role in controlling systems ranging from airplanes and cars to infusion pumps and microwaves. Since RTESs are often safety-critical, they must undergo an extensive (and often expensive) certification process before deployment. This costly certification process must be repeated after any significant change to the RTES, such as migrating a single-core RTES to a multi-core platform, significant code refactoring, or performance optimizations, to name a few. Our initial approach to reducing re-certification effort—described in a previous blog post—focused on the parts of a system whose behavior was affected by changes using a technique called regression verification, which involves deciding the behavioral equivalence of two, closely related programs. This blog posting describes our latest research in this area, specifically our approach to building regression verification tools and techniques for static analysis of RTESs.
Although there are many types of RTESs, we concentrate on a class of periodic programs, which are concurrent programs that consist of tasks that execute periodically. The tasks are assigned priorities based on their frequency (higher frequency = higher priority). The RTES executes the tasks using a priority-based preemptive scheduler. Each execution of a task is called a job. Thus, from the perspective of the scheduler, a system’s execution is a constant periodic stream of jobs of different priorities. In the rest of this post, we use RTES to mean periodic programs.
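A small sketch helps make the task model concrete. The task names and periods below are hypothetical; the point is only that priorities follow rates and that each periodic release of a task produces a job for the scheduler.

```python
# Illustrative periodic task set (names and periods in ms are hypothetical).
# Rate-monotonic priority assignment: the shorter the period (the higher the
# frequency), the higher the priority.
tasks = {"sensor": 10, "control": 20, "logger": 100}

by_priority = sorted(tasks, key=lambda name: tasks[name])
print(by_priority)   # ['sensor', 'control', 'logger']; 'sensor' has the highest priority
```

Every release of "sensor" in this sketch is a job, so over any time window the scheduler sees a stream of jobs at these three priority levels.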
In the beginning of the project, we assumed that automated verification techniques (such as static analysis and model checking) for single-core RTESs could be adapted for regression verification since these techniques have been used for sequential single-core programs. After conducting an initial survey, however, we found that existing automated verification techniques that apply directly to program source (rather than to a manual abstract model) are not applicable to periodic programs. Our original approach to extend static analysis to regression verification in the setting of multi-core RTESs was therefore changed in two ways. First, in phase 1 of our project we developed a new static analysis technique for reasoning about bounded executions of periodic programs. Second, in phase 2 we extended regression verification to multi-threaded programs, of which periodic programs are a restricted subset. The remainder of this blog posting describes these two phases.
Phase 1: Time-Bounded Verification of Periodic Programs
In the first part of our work, we developed an approach for time-bounded verification of safety properties (user-specified assertions) of periodic programs written in the C programming language. Time-bounded verification is the problem of deciding whether a given program does not violate any user-specified assertions in a given time interval. Time-bounded verification makes sense for RTESs because of their intimate dependence on real-time behavior. The inputs to our approach are (1) a periodic program C; (2) a safety property expressed via an assertion A embedded in C; (3) an initial condition Init of C; and (4) a time bound W. The output is either a counter-example trace showing how C violates the assertion A, or a message saying that the program is safe, in the sense that no execution within the time bound violates any user-specified assertion.
Our solution to time-bounded verification is based on sequentialization, which involves reducing verification of a concurrent program P to verification of a (non-deterministic) sequential program P’. A key feature of our approach is that P’ is linear in the size of P, which means the translation step is not computationally intensive and adds little overhead to the verification effort. The scalability of our approach is therefore mostly driven by the scalability of the underlying analysis engine, and our approach automatically benefits from constant improvements in the verification area.
Our work builds upon previous sequentialization work for context-bounded analysis (CBA) and bounded model checking (BMC). Our approach differs from prior work, however, since it bounds the actual execution time of the program, which is more natural to the designer of an RTES than a bound on the number of context switches (as done in CBA) or a bound on the number of instructions executed (as in BMC). We bound the execution time by translating the input time bound W in our model to a bound on the number of jobs. This translation is a natural consequence of the fact that the tasks are periodic and are therefore activated a finite number of times within W.
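The translation from a time bound to a job bound is simple arithmetic: a task with period T releases at most ceil(W / T) jobs in a window of length W that starts at a release. A small sketch, with hypothetical periods, follows.

```python
# Sketch of translating a time bound W into a bound on the number of jobs.
# Periods are hypothetical and given in milliseconds.
import math

periods_ms = [10, 20, 100]
W = 300   # time bound in ms

job_bound = sum(math.ceil(W / T) for T in periods_ms)
print(job_bound)   # 30 + 15 + 3 = 48 jobs to consider within the bound
```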
We implemented our approach in a tool called REK. REK supports C programs with tasks, priorities, priority ceiling locks, and shared variables. It takes a concurrent periodic program that cannot be analyzed with standard tools for sequential verification and converts it into a program that can be analyzed with such tools. Although in principle REK is compatible with any analyzer for bounded (loop- and recursion-free) C programs, in practice we rely on the CBMC tool by Daniel Kroening, which is one of the first and most mature bounded model checkers for C. CBMC can automatically analyze substantial C programs by encoding assertion violations as Boolean satisfiability queries, and it is a mature and robust tool that has been extensively applied to many industrial problems.
How REK Works
The analysis problem that REK is designed to solve is to check that a given periodic program is safe under all legal scheduling of tasks. REK solves a time-bounded version of this problem, e.g., whether the program is safe in the first 100ms, 200ms, 300ms, etc., starting from some user-specified initial condition. A time-bounded verification makes sense in the context of periodic programs since their execution can be naturally partitioned by time-intervals. Of course, in practice, unbounded verification would be preferred, so we are working on extending REK in this direction.
We briefly summarize the sequentialization step done by REK. First, we divide a time-bounded execution into execution rounds (or rounds, for short). The execution starts in round 0; a new round starts (and the old one ends) whenever a job of some task finishes. An execution with X jobs therefore requires X execution rounds. The sequentialization step simulates execution of each round independently and then combines the rounds (using non-deterministic choice) into a single legal execution. Further details of the construction are available in our FMCAD 2011 paper referenced below.
In addition to the basic sequentialization described above, REK is extended with the following features to achieve scalability to realistic programs:
Partial order reduction is a set of techniques used in model checking to reduce the number of interleavings that must be explored in a concurrent system. For example, if there are two independent actions a and b, then only one of the two executions ‘a followed by b’ or ‘b followed by a’ must be explored, since they both lead to the same destination state. Although there are many approaches for partial order reduction in explicit state model checking (as opposed to the symbolic model checking used in this work), extending them to symbolic verification is an area of active research. In REK, we developed a new partial order reduction technique that restricts explored executions only to those in which a read statement is preempted by a write statement to the same variable, or a write is preempted by a read or a write (a small sketch of this condition appears after this list). This reduction eliminates many unnecessary interleavings and cuts the search space significantly. Our experiments show that the reduction is quite effective in practice.
A limitation of our approach is that it does not keep track of the actual execution time of each instruction, each job, and each task. As such, it is an over-approximation: it explores more executions than are actually possible and can produce a "false positive" by reporting a counter-example trace that is not possible on a given hardware architecture due to timing restrictions. To reduce the number of false positives, we further constrain our sequentialization with information that can be inferred from schedulability analysis. If a periodic program is schedulable, it satisfies the rate monotonic analysis (RMA) equations. Those equations can be used to compute an upper bound on the number of times any given low-priority job can be preempted by any given high-priority job. We call this the preemption bound (a sketch of this calculation also appears after the list), which REK uses to further reduce the number of interleavings by keeping track of how many times one task preempts another and ensuring that this value never exceeds the preemption bound for the jobs of that task.
To deal with practical periodic programs, REK provides support for two types of commonly used lock primitives. In particular, it supports preemption locks (preemptions are disabled when the lock is held) and priority ceiling locks (preemption by any task with lower priority than the lock is disabled when the lock is held). We are extending REK to support the third common type of locks, priority-inheritance locks (regular blocking locks, but the priority of a low-priority task that holds a lock l is increased if a high-priority task is waiting for l).
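The read/write condition used by our partial order reduction (the first feature above) can be stated as a small predicate over pairs of accesses. The sketch below illustrates the condition itself, not REK’s implementation of it.

```python
# Illustrative statement of the reduction condition: a preemption point between
# two accesses must be explored only when both touch the same variable and at
# least one of them is a write. Read/read pairs, and accesses to different
# variables, are independent and can be skipped.
def must_explore_preemption(access1, access2):
    same_variable = access1["var"] == access2["var"]
    involves_write = access1["is_write"] or access2["is_write"]
    return same_variable and involves_write

print(must_explore_preemption({"var": "x", "is_write": False},
                              {"var": "x", "is_write": True}))   # True (read preempted by write)
print(must_explore_preemption({"var": "x", "is_write": False},
                              {"var": "y", "is_write": True}))   # False (different variables)
```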
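The preemption bound (the second feature above) can be derived from standard rate monotonic response-time analysis. The sketch below uses the classic recurrence R = C_i + sum over higher-priority tasks of ceil(R / T_j) * C_j; the task parameters are hypothetical, and this is an illustration of the idea rather than REK’s exact calculation.

```python
# Illustrative derivation of a preemption bound from rate monotonic analysis.
# Each task is (worst-case execution time C, period T) in ms, listed from
# highest to lowest priority; parameters are hypothetical.
import math

tasks = [(2, 10), (4, 20), (10, 100)]

def response_time(index):
    """Classic RMA recurrence: R = C_i + sum over higher-priority tasks of
    ceil(R / T_j) * C_j, iterated to a fixed point."""
    C_i, _ = tasks[index]
    R = C_i
    while True:
        interference = sum(math.ceil(R / T_j) * C_j for C_j, T_j in tasks[:index])
        R_next = C_i + interference
        if R_next == R:
            return R
        R = R_next

def preemption_bound(low, high):
    """Upper bound on how often one job of the lower-priority task can be
    preempted by jobs of the higher-priority task."""
    _, T_high = tasks[high]
    return math.ceil(response_time(low) / T_high)

print(response_time(2))        # worst-case response time of the lowest-priority task
print(preemption_bound(2, 0))  # bound on preemptions by the highest-priority task
```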
As part of our research, we created a model problem using the NXTway-GS, which is a two-wheeled, self-balancing robot that responds to Bluetooth commands. The robot uses a gyroscope to balance itself upright by applying power to left and right wheels. It also uses a sonar sensor so that when it comes to an obstacle, like a wall or ditch, it can back up. We have used REK to verify and fix several communication consistency properties between the tasks of the robot. More information on the use of REK for the NXTway-GS is available at http://www.andrew.cmu.edu/~arieg/Rek.
Phase 2: Regression Verification for Multi-threaded Programs
In the second phase of our work, we examined regression verification for multi-threaded programs. We believe that once we have regression verification for multi-threaded programs, we can adapt it to periodic programs as well.
Every instance of regression verification is based on some underlying notion of equivalence. The equivalence notion for single-threaded software is called partial equivalence: two functions are partially equivalent if they produce the same output for the same input. A multi-threaded program, conversely, is not partially equivalent to itself by the above definition since the same input can lead to different outputs due to scheduling choices. Our first challenge therefore involved creating a notion of equivalence for multi-threaded software.
Our second challenge was to come up with the right notion of decomposition to establish equivalence of programs from equivalence of their functions. Equivalence of sequential programs is established using Input/Output equivalence: two sequential programs are equivalent if it is possible to show that their corresponding functions have the same Input/Output behavior (produce the same output given the same input). In the case of multi-threaded programs, however, functions from different threads of a single program affect one another, making simple decomposition at the level of functions much harder because it must take interference from other threads into account.
To check whether two multi-threaded programs are partially equivalent (P = P’) we use a proof rule consisting of a set of premises and a conclusion. Each premise establishes the partial equivalence of a pair of functions f and f’ from P and P’, respectively. A premise is established by verifying a single-threaded program.
As part of this work, we developed two separate proof rules:
The first rule attempts to show equivalence of two programs by showing that their corresponding functions are Input/Output equivalent (produce the same output for a given input) under arbitrary interference, where "interference" means that the value of shared variables can change between execution of instructions of a thread. This rule is "strong" (not widely applicable to equivalent programs) because in practice the functions need only be equivalent in the context of the given program, not under arbitrary interference.
The second rule improves on the first rule by attempting to show that two programs are equivalent by restricting interference to what is consistent with the other functions in the program. For example, if there is no other function in a program that can affect a global variable ‘x’, then no interference that modifies ‘x’ is considered. This rule is "weaker" (more widely applicable) than the first one, but is computationally harder to automate.
In Conclusion
The ability to statically reason about correctness of periodic programs and the ability to perform regression verification adds the following key capabilities to an RTES developer’s toolbox:
ability to check prior to deployment that the program does not violate its assertions,
ability to check that top-level application programming interfaces (APIs) are not affected by low-level refactoring and/or performance optimizations,
ability to check that new APIs are backward compatible with old APIs, and
ability to perform impact analysis to determine which functions may be affected by a given source code change and which unit tests must be repeated.
We believe these capabilities can lower the cost of developing RTESs, while increasing their reliability and trustworthiness.
Additional Resources
For more information about our tool REK and our experiments, please visit http://www.andrew.cmu.edu/user/arieg/Rek
For more information about the bounded model checker CBMC, please visit http://www.cprover.org/cbmc
B. Goldin and O. Strichman. "Regression Verification," in Proceedings of DAC 2009, pp. 466-471.
S. Chaki, A. Gurfinkel, and O. Strichman. "Time-Bounded Analysis of Real-Time Systems," in Proceedings of FMCAD 2011, pp. 72-80.
S. Chaki, A. Gurfinkel, and O. Strichman. "Regression Verification for Multi-Threaded Programs," to appear in Proceedings of VMCAI 2012.
By Dennis R. Goldenson, Senior Member of the Technical Staff, Software Engineering Measurement and Analysis
As with any new initiative or tool requiring significant investment, the business value of statistically based predictive models must be demonstrated before they will see widespread adoption. The SEI Software Engineering Measurement and Analysis (SEMA) initiative has been leading research to better understand how existing analytical and statistical methods can be used successfully and how to determine the value of these methods once they have been applied to the engineering of large-scale software-reliant systems. As part of this effort, the SEI hosted a series of workshops that brought together leaders in the application of measurement and analytical methods in many areas of software and systems engineering. The workshops helped identify the technical barriers organizations face when they use advanced measurement and analytical techniques, such as computer modeling and simulation. This post focuses on the technical characteristics and quantified results of models used by organizations at the workshops.
Participants were invited and asked to present at the workshops only if they had empirical evidence about the results of their modeling efforts. A key component of this work is assembling leaders within the organizations who know how to conduct measurement and analysis and can demonstrate how it is successfully integrated into the software product development and service delivery processes. Understandably, attendees don’t share proprietary information, but rather talk about the methods that they used, and, most importantly, they learn from each other.
At a recent workshop, the various models discussed were statistical, probabilistic, and simulation-based. For example, organizational participants
demonstrated the use of Bayesian belief networks and process flow simulation models to define end-to-end software system lifecycle processes requiring coordination among disparate stakeholder groups to meet product quality objectives and efficiency of resource usage,
described the use of Rayleigh curve fitting to predict defect discovery (depicted as defect densities by phase) across the software system lifecycle and to predict latent or escaping defects (a minimal sketch of this approach appears after this list), and
described the use of multivariable linear regression and Monte Carlo simulation to predict software system cost and schedule performance based on requirements volatility and the degree of overlap of the requirements and design phases (e.g., a surrogate for the risk of proceeding with development prematurely).
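As an illustration of the second item in the list, the sketch below computes Rayleigh-model defect-discovery predictions by phase and the implied latent (escaping) defects. The parameters and phase names are hypothetical and are not drawn from any workshop presentation.

```python
# Illustrative Rayleigh-curve defect prediction (hypothetical parameters).
# K is the total number of defects expected over the lifecycle; t_m is the
# phase index at which defect discovery peaks.
import math

K = 500.0     # total expected defects (hypothetical)
t_m = 3.0     # peak-discovery phase (hypothetical)

def cumulative_defects(t):
    """Cumulative defects discovered by time t under a Rayleigh model."""
    return K * (1.0 - math.exp(-t * t / (2.0 * t_m * t_m)))

phases = ["reqs", "design", "code", "unit test", "integration", "system test"]
previous = 0.0
for i, phase in enumerate(phases, start=1):
    found = cumulative_defects(i) - previous
    previous = cumulative_defects(i)
    print(f"{phase:12s} predicted defects: {found:6.1f}")

print(f"latent (escaping) defects: {K - previous:6.1f}")
```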
Quantifying the Results
The presentations covered many different approaches applied across a large variety of organizations. Some had access to large data repositories, while others used small datasets. Still others addressed issues of coping with missing and imperfect data, as well as the use of expert judgment to calibrate the models. The interim and final performance outcomes predicted by the models also differed considerably, and included defect prevention, customer satisfaction, other quality attributes, aspects of requirements management, return on investment, cost, schedule, efficiency of resource usage, and staff skills as a function of training practices.
One case study, presented by David Raffo, professor of business, engineering, and computer science at Portland State University, described an organization releasing defective products with high schedule variance. The organization’s defect-removal activities were based on unit test, where they faced considerable reliability problems. They knew they needed to reduce schedule variance and improve quality, but they had a dozen ideas to consider for how to actually accomplish that. They wanted to base their decision on a quantitative evaluation of the likelihood of success of each particular effort. A state-based discrete event model of large-scale commercial development processes was built to address that and other problems. The simulation was parameterized using actual project data. Some outcomes predicted by the model included the following:
cost in staff-months of effort or full-time-equivalent staff used for development, inspections, testing, and rework,
numbers of defects by type across the life cycle,
delivered defects to the customer, and
calendar months of project cycle time.
Raffo’s simulation model was used as part of a full business case analysis. The model ultimately determined likely return on investment (ROI) and related financial performance under different proposed process change scenarios.
Another example presented by Neal Mackertich and Michael Campo of Raytheon Integrated Defense Systems demonstrated the use of a Monte Carlo simulation model they developed. The model was created to support Raytheon’s goal of developing increasingly complex systems with smaller performance margins. One of their most daunting challenges was schedule pressure. Schedules are often managed deterministically by the task manager, limiting the ability of the organization to assess the risk and opportunity involved, perform sensitivity analysis, and implement strategies for risk mitigation and opportunity capture. The model developed at Raytheon allowed them to
statistically predict their likelihood of meeting schedule milestones,
identify task drivers based on their contribution to overall cycle time and percentage of time spent on the critical path, and
develop strategies for mitigating the identified risk.
The primary output of the model was the prediction interval estimate of schedule performance (generated from Monte Carlo simulation) using individual task duration probability estimation and an understanding of the individual task sequence relationships. Engineering process funding was invested in the development and deployment of the model and critical chain project management, resulting in a 15 to 40 percent reduction in cycle time duration against the baseline.
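The core of such a schedule-risk model can be illustrated with a small Monte Carlo sketch. The task estimates and milestone below are hypothetical, and this simple serial chain omits the task sequencing, critical-path, and sensitivity analyses that the Raytheon model supported.

```python
# Illustrative Monte Carlo estimate of the probability of meeting a schedule
# milestone for a simple serial chain of tasks. Durations (in days) are
# hypothetical triangular estimates: (optimistic, most likely, pessimistic).
import random

task_estimates = [(10, 15, 25), (20, 30, 45), (5, 8, 14)]
milestone_days = 60
trials = 100_000

hits = 0
for _ in range(trials):
    total = sum(random.triangular(low, high, mode)
                for low, mode, high in task_estimates)
    if total <= milestone_days:
        hits += 1

print(f"Estimated probability of meeting the milestone: {hits / trials:.2%}")
```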
Encouraging Adoption
While these types of models are used frequently in other fields, they are not as often applied in software engineering, where the focus has often been on the challenges of the system being developed. As the field matures, more analysis should be done to determine quantitatively how products can be built most efficiently and affordably, and how we can best organize ourselves to accomplish that.
The initial cost of model development can range from a month or two of staff effort to a year, depending on the scope of the modeling effort. Tools can range from $5,000 to $50,000, depending on the level of capability provided. As a result of these kinds of investments, models can and have saved organizations millions of dollars through the resulting improvements. Our challenge is to help change the practice of software engineering, where the tendency is to "just go out and do it," so that it includes this type of product and process analysis. To do so, we know we have to conclusively demonstrate that the information gained is worth the expense and bring these results to a wider audience.
Additional Resource:
To read the SEI technical report, Approaches to Process Performance Modeling: A Summary from the SEI Series of Workshops on CMMI High Maturity Measurement and Analysis, please visit www.sei.cmu.edu/library/abstracts/reports/09tr021.cfm
By Douglas C. Schmidt, Chief Technology Officer
A key mission of the SEI is to advance the practice of software engineering and cyber security through research and technology transition to ensure the development and operation of software-reliant Department of Defense (DoD) systems with predictable and improved quality, schedule, and cost. To achieve this mission, the SEI conducts research and development (R&D) activities involving the DoD, federal agencies, industry, and academia. One of my initial blog postings summarized the new and upcoming R&D activities we had planned for 2011. Now that the year is nearly over, this blog posting presents some of the many R&D accomplishments we completed in 2011.
Our R&D benefits the DoD and other sponsors by identifying and solving key technical challenges facing developers and managers of current and future software-reliant systems. Our R&D work focuses on the following four major areas of software engineering and cyber security:
Innovating software for competitive advantage. This area focuses on producing innovations that revolutionize development of assured software-reliant systems to maintain the U.S. competitive edge in software technologies vital to national security.
Securing the cyber infrastructure. This area focuses on enabling informed trust and confidence in using information and communication technology to ensure a securely connected world to protect and sustain vital U.S. cyber assets and services in the face of full-spectrum attacks from sophisticated adversaries.
Advancing disciplined methods for engineering software. This area focuses on improving the availability, affordability, and sustainability of software-reliant systems through data-driven models, measurement, and management methods to reduce the cost, acquisition time, and risk of our major defense acquisition programs.
Accelerating assured software delivery and sustainment for the mission. This area focuses on ensuring predictable mission performance in the acquisition, operation, and sustainment of software-reliant systems to expedite delivery of technical capabilities to win the current fight.
Following is a sampling of the SEI’s R&D accomplishments in each of these areas during 2011 with links to additional information about these projects.
Innovating Software for Competitive Advantage
Although the SEI advocates software architecture documentation as a software engineering best practice, the specific value of software architecture documentation has not been established empirically. The blog posting Measuring the Impact of Explicit Architecture Documentation describes a research project we conducted to measure and understand the value of software architecture documentation on complex software-reliant systems, focusing on creating architectural documentation for a major subsystem of Apache Hadoop, the Hadoop Distributed File System (HDFS).
The SEI has developed algorithms and tools to optimize the performance of cyber-physical systems without compromising their safety. The blog posting Ensuring Safety in Cyber-Physical Systems describes a safe double-booking algorithm that reduces the over-allocation of processing resources needed to ensure the timing behavior of safety-critical tasks in cyber-physical systems. A subsequent posting describes an algorithm for supporting mixed-criticality operations by giving more central processing unit (CPU) time to functions with higher value while ensuring critical timing guarantees.
Together with researchers at CMU, the SEI has worked to develop cloudlets, which are localized, lightweight servers running one or more virtual machines on which soldiers can offload expensive computations from their handheld mobile devices, thereby providing greater processing capacity and helping conserve battery power. The blog posting Cloud Computing for the Battlefield describes a cloudlet prototype the SEI developed to recognize faces on an Android smartphone. A subsequent posting describes how the SEI is using cloudlets to help soldiers perform other mission capabilities more effectively, including speech and imaging recognition, as well as decision making and mission planning.
SEI-developed methods and tools allow soldier end-users to program their smartphones to provide an interface tailored to the information they need for a specific mission. The blog posting A New Approach for Handheld Devices in the Military motivates the need for soldiers to access information on a handheld device and describes software we are developing to enable soldiers to tailor the information for a given mission or situation. A subsequent blog posting describes the challenges the SEI encountered when equipping soldiers with end-user programming tools.
Other SEI-developed methods and tools help reduce the time and effort needed to re-certify mission- and safety-critical real-time embedded software systems (RTESs) after significant changes have been made, such as migrating a single-core RTES to a multi-core platform, significant code refactoring, or performance optimizations. The blog posting on Regression Verification of Real-time Embedded Software focuses on research in applying regression verification (which involves deciding the behavioral equivalence of two closely related programs) to help the migration of RTESs from single-core to multi-core platforms. A subsequent posting describes regression verification tools and techniques that the SEI is building to conduct static analysis of RTESs.
Securing the Cyber Infrastructure
A large percentage of cybersecurity attacks against DoD and other government organizations are caused by disgruntled, greedy, or subversive insiders, employees, or contractors with access to that organization’s network systems or data. The blog posting Protecting Against Insider Threats with Enterprise Architecture Patterns describes work that researchers at the CERT® Insider Threat Center have been conducting to help protect next-generation DoD enterprise systems against insider threats by capturing, validating, and applying enterprise architectural patterns. These patterns can be used to ensure that the necessary agreements are in place (IP ownership and consent to monitoring), critical IP is identified, key departing insiders are monitored, and the necessary communication among departments takes place to mitigate the impact of insider threats.
The SEI has been conducting research to help organizational leaders manage critical services in the presence of disruption by presenting objectives and strategic measures for operational resilience, as well as tools to help them select and define those measures. The blog posting Measures for Managing Operational Resilience describes how the SEI has been exploring the topic of managing operational resilience at the organizational level for the past seven years through development and use of the CERT Resilience Management Model (CERT-RMM). The CERT-RMM is a capability model designed to establish the convergence of operational risk and resilience management activities and apply a capability level scale that expresses increasing levels of process performance.
New malicious code analysis techniques and tools being developed at the SEI will better counter and exploit adversarial use of information and communication technologies. The blog posting Fuzzy Hashing Techniques in Applied Malware Analysis describes a technique the SEI has developed to help analysts determine whether two pieces of suspected malware are similar. A subsequent posting discusses types of malware against which similarity measures of any kind (including fuzzy hashing) may be applied. Other blog postings on Learning a Portfolio-Based Checker for Provenance-Similarity of Binaries and Using Machine Learning to Detect Malware Similarity describe our research on using classification (a form of machine learning) to detect "provenance similarities" in binaries, which means that they have been compiled from similar source code (e.g., differing by only minor revisions) and with similar compilers (e.g., different versions of Microsoft Visual C++ or different levels of optimization). Yet another blog posting A New Approach to Modeling Malware using Sparse Representation describes our use of suffix trees, zero-suppressed binary decision diagrams, and sparse representation modeling to create a rapid search capability that allows analysts to quickly analyze a new piece of malware.
Advancing Disciplined Methods for Engineering Software
Recent SEI research aims to improve the accuracy of early estimates (whether for a DoD acquisition program or commercial product development) and ease the burden of additional re-estimations during a program’s lifecycle. The blog posting Improving the Accuracy of Early Cost Estimates for Software-Reliant Systems describes challenges we have observed trying to accurately estimate software effort and cost in DoD acquisition programs, as well as other product development organizations. A subsequent post explores a method and tools the SEI is developing to help cost estimation experts get the right information into a familiar and usable form for producing high quality cost estimates early in the lifecycle.
A notable new approach at the SEI combines elements of the SEI’s Architecture Centric Engineering (ACE) method, which requires effective use of software architecture to guide system development, with its Team Software Process (TSP), which is a team-centric approach to developing software that enables organizations to better plan and measure their work and improve software development productivity to gain greater confidence in quality and cost estimates. The blog postings Combining Architecture-Centric Engineering Within TSP and Using TSP to Architect a New Trading System describe how ACE was applied within the context of TSP to develop system architecture to create a reliable and fast new trading system for Grupo Bolsa Mexicana de Valores (BMV, the Mexican Stock Exchange).
Over the last several years, the SEI hosted a series of workshops that brought together leaders in the application of measurement and analytical methods in many areas of software and systems engineering. The workshops helped identify the technical barriers organizations face when they use advanced measurement and analytical techniques, such as computer modeling and simulation. The blog posting on Using Predictive Modeling in Software Development: Results from the Field describes the technical characteristics and quantified results of models used by organizations at the workshops.
Accelerating Assured Software Delivery and Sustainment for the Mission
The SEI has been assisting large-scale DoD acquisition programs in developing systematically reusable software platforms that provide applications and end-users with many net-centric capabilities, such as cloud computing or Web 2.0 applications. The blog posting A Framework for Evaluating Common Operating Environments explains how the SEI developed a Software Evaluation Framework and applied it to help assess the suitability of common operating environments for the U.S. Army.
Methods and processes that enable large-scale software-reliant DoD systems to innovate rapidly and adapt products and systems to emerging needs within compressed time frames were another area of exploration for the SEI. A series of blog postings details our research on improving the overall value delivered to users by strategically managing technical debt, which involves decisions made to defer necessary work during the planning or execution of a software project, as well as describing the level of skill needed to develop software using Agile for DoD acquisition programs and the importance of maintaining strong competency in a core set of software engineering processes.
Teams at the SEI also have been researching common problems faced by acquisition programs related to the development of IT systems, including communications, command, and control; avionics; and electronic warfare systems. A series of blog postings covers acquisition problems, such as
misaligned incentives, which occur when different individuals, groups, or divisions are rewarded for behaviors that conflict with a common organizational goal
the need to sell the program, which describes a situation in which people involved with acquisition programs have strong incentives to "sell" those programs to their management, sponsors, and other stakeholders so that they can obtain funding, get them off the ground, and keep them sold
the evolution of "science projects," which describes how prototype projects that unexpectedly grow in size and scope during development often have difficulty transitioning into a formal acquisition program, and
the tragedy of common infrastructure and joint programs, which arises when multiple organizations attempt to cooperate in the development of a single system, infrastructure, or capability that will be used and shared by all parties.
The SEI also developed a collaborative method for engineering systems with critical safety and security ramifications. A series of blog postings on this topic explores problems with safety and security requirements, examines key obstacles that acquisition and development organizations encounter concerning safety- and security-related requirements, and explains how the Engineering Safety- and Security-related Requirements (ESSR) method overcomes these obstacles.
Concluding Remarks
As you can see from the summary of accomplishments above, 2011 has been a highly productive and exciting year for the SEI R&D staff. Naturally, this blog posting just scratches the surface of SEI R&D activities. Please come back regularly to the SEI blog for coverage of these and many other topics we’ll be pursuing in 2012. As always, we’re interested in new insights and new opportunities to partner on emerging technologies, and we welcome your feedback in the comments below.
By Douglas C. Schmidt, Chief Technology Officer
After 47 weeks and 50 blog postings, the sands of time are quickly running out in 2011. Last week’s blog posting summarized key 2011 SEI R&D accomplishments in our four major areas of software engineering and cyber security: innovating software for competitive advantage, securing the cyber infrastructure, accelerating assured software delivery and sustainment for the mission, and advancing disciplined methods for engineering software. This week’s blog posting presents a preview of some upcoming blog postings you’ll read about in these areas during 2012.
Innovating Software for Competitive Advantage
The Value-Driven Incremental Development team is creating quantitative engineering techniques to support rapid delivery of high-value, high-quality software capabilities to the DoD. Their approach is based on quality attribute analysis models that guide incremental development so that DoD acquisition program offices will be able to get warfighters the features they need most, when they need them, while balancing speed-of-delivery, quality, value, and cost tradeoffs.
The Cyber-Physical Systems team is developing algorithms and verification techniques that enable the DoD to deliver reliable mission-critical capability cost-effectively by automating more of the development and assurance of cyber-physical embedded control systems. Their approach is based on new algorithms for precise and scalable functional analysis of real-time systems by exploiting scheduling constraints, as well as new resource reclamation algorithms for multi-threaded tasks in multi-core processors.
The Socio-Adaptive Systems team is establishing a new class of adaptive socio-technical systems wherein people, networks, and computer applications can locally decide how to respond when the demand for resources (network resources in this case) outstrips supply, while ensuring the best global use of whatever capacity is available. Their research combines the adaptability of human social institutions—in particular those based in market institutions—with automated network-resource optimization so that scarce tactical network capacity will automatically, continuously, and effectively be allocated to warfighters based on their needs.
The Edge-Enabled Tactical System team is improving the quality and relevance of information available to dismounted (edge) warfighters so the information they receive will be more consistent with and useful for their current missions. They are developing model-driven techniques and tools that will enable tactical units (e.g., squads of soldiers) to consume less battery power, computation, and bandwidth resources when performing their missions.
Securing the Cyber Infrastructure
The CERT Secure Coding Initiative is conducting research to reduce the number of software vulnerabilities to a level that can be mitigated in DoD operational environments. This work focuses on static and dynamic analysis tools, secure coding patterns, and scalable conformance testing techniques that help prevent coding errors or discover and eliminate security flaws during implementation and testing.
The CERT Insider Threat team is evaluating techniques for detecting known insider threats prior to attack, to assist the DoD in preventing future high-impact data loss. This work is leveraging the hundreds of cases in the CERT Insider Threat Database, simulation capacity in CERT’s Insider Threat Laboratory, and system dynamics models of insider crime to create the socio-technical architectural foundations to prevent this kind of damage now and into the future.
The CERT Coordination Center is developing methods and tools to reduce the cost to DoD suppliers and acquirers of improving software assurance and reliability during development and testing. Their aim is to enable these groups to identify software defects via dynamic blackbox "fuzz testing" in a manner identical to what an attacker would be able to perform, to remediate these vulnerabilities before the software is deployed operationally to the DoD.
The CERT Malicious Code team is developing tools to analyze obfuscated malware code to enable analysts to more quickly derive the insights required to protect and respond to intrusions of DoD and other government systems. Their approach uses semantic code analysis to de-obfuscate binary malware to a simple intermediate representation and then convert the intermediate representation back to readable binary that can be inspected by existing malware tools.
Accelerating Assured Software Delivery and Sustainment for the Mission
The Alternative Methods group is researching methods for increasing adoption of incremental development methods to accelerate delivery of software-related technical capabilities while reducing the cost, acquisition time and risk of major defense acquisition programs. Their approach focuses on developing a contingency model that identifies conditions and thresholds for when and how to use incremental development approaches in a DoD acquisition context. They are also documenting incremental development patterns and guidelines that chart the course for removing barriers to effective adoption of incremental and iterative approaches in the DoD.
The Acquisition Dynamics team is evaluating methods that mitigate the effects of misaligned acquisition program organizational incentives and adverse software-reliant acquisition structural dynamics by improving program decision-making. Their objective is to help DoD acquisition programs overcome some of the most severe counter-productive behaviors that stem from inherent social dilemmas by using known solutions drawn from fields such as behavioral economics, and thus deploy higher-quality systems to the field in a more timely and cost-effective manner.
Advancing Disciplined Methods for Engineering Software
The Software Engineering Measurement and Analysis group is developing methods and tools for modeling uncertainties for pre-milestone A cost estimates to minimize the occurrence of severe acquisition program cost overruns due to poor estimates. Their approach involves synthesizing Bayesian belief network modeling and Monte Carlo simulation to model uncertainties among program change drivers, allow subjective inputs, visually depict influential relationships and outputs to aid team-based model development, and assist with the explicit description and documentation underlying an estimate.
Concluding Remarks
This concludes our blog postings for 2011. It’s been my great pleasure and privilege to work with the technical staff at the SEI this year to better acquaint you with the SEI body of work. We’ve enjoyed reading your comments and hope that you’ve learned more about the R&D activities that we’re pursuing. We wish all of you a happy holiday season and look forward to hearing from you in 2012.
By Will Casey, Senior Researcher, CERT
Through our work in cyber security, we have amassed millions of pieces of malicious software in a large malware database called the CERT Artifact Catalog. Analyzing this code manually to find potential similarities and identify malware provenance is a painstaking process. This blog post follows up our earlier post by exploring how to create effective and efficient tools that analysts can use to identify malware.
At the heart of our approach are longest common substring (LCS) measures, which describe the amount of shared code in malware. In this post we explain how to create measures for similarity studies on malware via a suffix tree, which is a data structure that encodes an entire map of shared substrings in a malware corpus, such as the CERT Artifact Catalog. We characterize the performance of suffix trees and quantify their dependence on memory and input size. We also demonstrate the efficient construction of suffix trees for large malware data sets involving thousands of files. In addition, we compare LCS measures to the laborious and time-intensive process of manually creating signatures (regular expressions applied to the binary that are thought to be both specific to and indicative of malware).
Building the Suffix Tree
By building a suffix tree data structure for the CERT Artifact Catalog, we can form a better representation of the malware corpus for studies involving string query, shared string usage, and string similarity. Having uncharacterized data is like being in unexplored, unmapped territory. A suffix tree allows analysts to explore and map the malware landscape: shared code becomes the topographical features of the mapped landscape. Just as travelers use a map and landscape features to reason about where they are and where they want to go, malware analysts study the large shared substrings of the suffix tree to reason about which areas to focus on. For example, multiple pieces of malware from the Zeus family have code in common, which provides a means to explore and analyze the entire family.
A suffix tree can be built in time linear in the size of the input, allowing us to identify any long common substrings in linear time. We augmented the conventional suffix tree data structure and algorithm to include queries based on subsets of files and measures of information (such as Shannon entropy) on shared strings. To scale our suffix tree data structure to large data sets, we also developed external algorithms that operate efficiently beyond the capacity of main memory in a single computer.
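To make the LCS idea concrete, the following minimal sketch computes the longest common substring of two binaries with a simple dynamic program. It is illustrative only and assumes nothing about CERT's implementation: a real suffix tree achieves this in linear time and operates over an entire corpus rather than a single pair of files, and the toy byte strings below are invented.

```python
# Minimal sketch: longest common substring (LCS) of two binaries via
# dynamic programming. A suffix tree computes the same answer in linear
# time and scales to a whole corpus; this quadratic version only
# illustrates what the LCS measure captures.

def longest_common_substring(a: bytes, b: bytes) -> bytes:
    """Return the longest byte string that appears in both a and b."""
    best_len, best_end = 0, 0
    # prev[j] = length of the common suffix ending at a[i-1] and b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]

# Example usage with two invented "binaries":
if __name__ == "__main__":
    f1 = b"\x90\x90PAYLOAD_DECODER_v2\xcc\xcc"
    f2 = b"\x31\xc0PAYLOAD_DECODER_v2\x90"
    shared = longest_common_substring(f1, f2)
    print(len(shared), shared)
```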
Using the Suffix Tree to Create an LCS Measure for Similarity Studies on Malware
After constructing the suffix tree, we used it to analyze different families of malware, including the Poison Ivy malware family that installs a remote access tool onto an exploited machine. Poison Ivy files were collected by CERT from 2005 to 2008. Although this family of malware is no longer thought to be in active development, analysts have examined it extensively. We used Poison Ivy files as a test set to validate findings from our data structures. For example, we applied clustering based on LCS and compared it to a "ground truth" of known subgroups within the Poison Ivy family.
Our suffix tree data structure enabled us to identify several LCSs that were common to many files in the Poison Ivy family. By quickly filtering out strings of low entropy, we were left with meaningful coding sequences from which we could determine sequences that are characteristic of the malicious software family.
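As a rough illustration of that entropy filter, the sketch below computes the Shannon entropy of a byte string and discards short or low-entropy candidate substrings. The thresholds are arbitrary values chosen for the example, not the ones used in our study.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Bits per byte: 0.0 for a constant string, up to 8.0 for uniform random bytes."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def filter_low_entropy(substrings, min_entropy=2.5, min_length=16):
    """Keep shared substrings long enough and information-rich enough to be meaningful.

    The cutoffs here are illustrative assumptions only.
    """
    return [s for s in substrings
            if len(s) >= min_length and shannon_entropy(s) >= min_entropy]
```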
Validating the Measure
After analyzing the code using the suffix trees, we compared our results against signatures that were developed over the course of several years of extensive examination by analysts. We used suffix trees to identify several critical substrings that matched identically across multiple files, exceeded a certain length, and had satisfactory information content. These landmark substrings were then used to create a feature vector for each file, and the feature vectors were used to cluster the files into subgroups. We then created dendrograms that suggested relationships among the files based on co-location of long common substrings.
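The following sketch shows one way landmark substrings could be turned into presence/absence feature vectors and clustered hierarchically. It assumes NumPy and SciPy are available; the helper names, linkage method, and distance metric are choices made for this example rather than details of the CERT tooling.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def feature_vectors(files, landmarks):
    """One row per file: 1 if the file contains the landmark substring, else 0."""
    return np.array([[1 if lm in f else 0 for lm in landmarks] for f in files])

def cluster_files(files, landmarks, max_clusters=4):
    X = feature_vectors(files, landmarks)
    # Average-linkage clustering on Jaccard distance between presence/absence vectors.
    Z = linkage(X, method="average", metric="jaccard")
    labels = fcluster(Z, t=max_clusters, criterion="maxclust")
    # Z can be plotted with scipy.cluster.hierarchy.dendrogram to produce
    # the kind of dendrogram described above.
    return Z, labels
```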
To validate the clusters, we revisited the Poison Ivy files and used the signatures that had been developed by analysts to identify versions in the software. Our evaluation showed that the LCS clustering produced groupings consistent with signatures that were developed by analysts, in many cases exposing additional sub-groups that we were unaware of. Moreover, the LCS clustering can group corrupted files and identify potential incorrect attributions.
Results of our Research
We used suffix trees to analyze approximately 200 to 1,000 files in about four hours and to identify additional details about the structure of the family that analysts could not access via manual inspection alone. Unfortunately, people often view automated methods as a means to replace human analysis. The goal of our research, however, is to use suffix trees to bring computing to bear more effectively on the problem of distinguishing malware from clean-ware. For example, malware may have components that resemble more than one family. Our new tool may allow us to identify those components, such as an element that sets up a command-and-control interface or one that installs a remote access tool.
Future Work
In the past year, our research has focused on creating the suffix tree data structures and ensuring that they can provide us with useful information about malware families. Our next steps are to scale the data structures to larger data sets and optimize them to allow for even larger input sizes. We are currently able to incorporate approximately 8,000 files into a data structure. Ideally, we would like to optimize the data structures and algorithms (exploiting parallelism) to include between 80,000 and 100,000 files, whose combined size can exceed the main memory of a single computer.
Additional Resources
For further reading about CERT Program work in malware or malicious code research, click on the SEI Blog links below:
A New Approach to Modeling Malware using Sparse Representation
Using Machine Learning to Detect Malware Similarity
Fuzzy Hashing Techniques in Applied Malware Analysis
Learning a Portfolio-Based Checker for Provenance-Similarity of Binaries
More information about CERT research and development is available in the 2010 CERT Research Report, which may be viewed online at www.cert.org/research/2010research-report.pdf
By Donald Firesmith, Senior Member of the Technical Staff, Acquisition Support Program
In our work with acquisition programs, we’ve often observed a major problem: requirements specifications that are incomplete, with many functional requirements missing. Whereas requirements specifications typically specify normal system behavior, they are often woefully incomplete when it comes to off-nominal behavior, which deals with abnormal events and situations the system must detect and how the system must react when it detects that these events have occurred or situations exist. Thus, although requirements typically specify how the system must behave under normal conditions, they often do not adequately specify how the system must behave if it cannot or should not behave as normally expected. This blog post examines requirements engineering for off-nominal behavior.
Examples of off-nominal behavior that are inadequately addressed by requirements specifications include how robust (i.e., error, fault, and failure tolerant) the system must be, how the system must behave when hardware fails or software defects are executed, how the system must react when incorrect data (e.g., out-of-range values or incorrect data types) is input, and what should happen if the system detects that it is in an improper mode or inconsistent state. This lack of specification leads to the following omissions and the questions that must be asked as a result.
All credible conditions and events. How must the system behave under off-nominal sets of preconditions and trigger events that are unlikely and/or infrequent? When these conditions occur—as they invariably will—there is a risk that the system either does not handle them or the developers have been forced to guess (often incorrectly) how the system must behave. The requirements therefore need to specify how the system shall behave under all credible combinations of conditions and trigger events. Moreover, how are combinations of rare conditions and events determined to be not credible? Users and requirements engineers often underestimate the probability of rare occurrences, so they are surprised when they occur and the system reacts improperly. If these off-nominal conditions and the desired behavior of the system to them are not identified and documented early in the lifecycle, the decisions about what error/fault conditions should be handled by the system are left to individuals who may not have the proper expertise to identify such conditions, but who nevertheless feel compelled to make such decisions.
Detecting off-nominal situations. How will the system recognize off-nominal combinations of conditions and events? Does the system need sensors to determine the existence of these states or occurrence of these events? How available, reliable, accurate, and precise must these sensors and inputs be?
Reacting to off-nominal situations. How must the system react when it recognizes an off-nominal combination of conditions (possibly when a specific, associated event occurs)? Must it notify users or operators by providing warnings, cautions, or advisories? Must it do something to ensure that the system remains in a safe or secure state? Must the system be able to shut down in a safe and secure state or must it automatically restart? Must it record abnormal situations and the responses of the users/operators?
Incomplete use case models. Use case modeling is the most common requirements identification and analysis method for functional requirements. Each use case has one or more normal (so-called "sunny day") paths (a.k.a., courses and flows) as well as several exceptional ("rainy day") paths. Unfortunately, requirements engineers often concentrate so heavily on normal paths that there is inadequate time and staffing to properly address the credible exceptional paths. This omission leads to incomplete requirements specifications that do not adequately address necessary robustness (e.g., error, fault, and failure tolerance), reliability, safety, and security.
Coding standards. Programming languages typically include features and reusable code (e.g., base classes that come with the language) that are inherently unreliable, unsafe, and insecure. Because language features may not be well defined in the language specification, their behavior may be inconsistent. For example, the use of concurrency and automated garbage collection can lead to common defects, such as race conditions, starvation, deadlock, livelock, and priority inversion. Likewise, certain language features may be used in an incomplete manner. For example, an if/then/else construct may lack an else clause stating what to do if the if-clause precondition is not true, and a "do X followed by do Y" sequence may not say what to do if X fails to complete. There are other cases, such as divide-by-zero situations, taking the square root of negative numbers, a lack of strong typing, and no verification of inputs, preconditions, invariants, postconditions, and outputs; a brief sketch following this list illustrates the point. These implementation coding defects typically start as requirements defects: incomplete requirements that do not mandate the use of reliable, safe, and secure subsets of the language, safe base classes, and automatically verified coding standards.
Lack of subject matter expertise. Exception handling is often left to the programmers, who must ensure their software is error, fault, and failure tolerant and meets its requirements. Programmers will be blamed if defects prevent the system from being available, reliable, robust, safe, and secure, even if there are no relevant requirements. Unfortunately, programmers often make assumptions as to what the software’s off-nominal behavior should be. Without adequate domain expertise and sufficient contact with subject matter experts, programmers will incorporate defects and safety/security vulnerabilities. Likewise, poor quality requirements specifications show how requirement engineers struggle to address mandatory off-nominal requirements since they lack sufficient domain expertise and training to determine, analyze, and specify adequate availability, reliability, robustness, safety, and security requirements. Ultimately, the engineering of these quality requirements requires subject matter expertise that is rarely combined in any one developer.
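As a hypothetical illustration of the coding-standards point above, the sketch below contrasts a nominal-path-only function with one whose off-nominal behavior (wrong type, out-of-range input, divide by zero) is made explicit. The names, ranges, and error responses are invented for the example; in practice the appropriate reactions would come from the system's requirements rather than from the programmer's guesses.

```python
# Hypothetical illustration only: the same computation with and without
# off-nominal handling. Nothing here is drawn from a real program.

def scale_reading_incomplete(raw, divisor):
    # Nominal path only: no else branch, no range check, no divide-by-zero guard.
    # Off-nominal inputs silently return None or raise an unplanned exception.
    if 0 <= raw <= 1023:
        return raw / divisor

def scale_reading_robust(raw, divisor):
    """Return the scaled reading, or raise a descriptive error for off-nominal input."""
    if not isinstance(raw, int) or not 0 <= raw <= 1023:
        raise ValueError(f"sensor reading out of range or wrong type: {raw!r}")
    if divisor == 0:
        raise ValueError("divisor must be nonzero")
    return raw / divisor
```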
There are often many more off-nominal (and rare) combinations of conditions and events than the common nominal ones. There are also often many more ways that a system can fail. Requirements specifications are typically incomplete with regard to the previous problems, often only including 10 to 30 percent of the necessary requirements. This level of incompleteness can result in systems that fail to meet their true availability, reliability, robustness, safety, and security requirements.
It is insufficient for requirements specifications to state that the system shall be highly available, reliable, robust, safe, and secure, or that it has no single points of failure. The requirements must specify all credible off-nominal combinations of conditions and events. Otherwise, software developers will make incorrect guesses, rely on incorrect assumptions, and ignore important off-nominal situations. Without complete requirements, verification will not catch these defects, and the resulting defective system will be fielded with highly unfortunate, if predictable, results. Program offices cannot safely assume that contractors will address these issues on their own; off-nominal situations must be properly addressed in the requirements.
Studies (Knight, Weiss, Leveson) have shown that the vast majority of accidents (safety) and many common software vulnerabilities (security) result at least partially from incomplete requirements. Many availability and reliability defects due to software also result, at least partially, from incomplete requirements.
Recommended Solutions
To address the problems described above, acquisition program offices should consider the following steps:
Address off-nominal requirements in the contract. The program office should contractually mandate all significant off-nominal behavior. The contract should also mandate that contractors address all credible off-nominal conditions and events affecting mission, safety, and security functionality.
To address all credible conditions and events, the program office should ensure that the contractor’s requirements engineering plan explicitly states that all credible combinations of conditions and events are to be addressed, even very rare ones, if the corresponding function is mission, safety, and/or security critical. The program office should verify that the requirements engineers collaborate with reliability, safety, and security engineers to ensure that no significant combinations of states and events are overlooked. The program office should also ensure that the proper testing of the associated software in terms of test completion criteria and test case generation criteria is explicitly addressed in the software test plans.
To detect off-nominal situations, the program office should ensure that the contractors have properly addressed the detection of off-nominal situations. Specifically, this includes verifying that the system engineering management plan (SEMP) as well as the system requirements, architecture, and design address situational awareness in terms of both sensors and the input of necessary data concerning off-nominal situations.
When reacting to off-nominal situations, the program office should ensure that the requirements address how the system must behave if it cannot behave in the nominal manner. This process includes ensuring that the system either remains in a safe and secure state or shuts down safely and securely (i.e., is fail-safe). It also includes notifications (warnings, cautions, and advisories) as well as logging any associated error, fault, or failure information.
To ensure against incomplete use case models, the program office should ensure that the models include all credible normal and exceptional use case paths and that adequate project schedule, budget, and staffing are allocated to complete them. Verification of the requirements and their associated models should explicitly address exceptional as well as normal use case paths.
With respect to contractor coding standards, the program office should ensure they explicitly address eliminating common design and coding defects that make the software less available, reliable, robust, safe, and secure. The program office should also ensure these coding standards are properly followed including, where practical, automatic verification via static and dynamic code checking.
With respect to subject matter expertise, the program office should ensure that associated quality requirements mandating adequate availability, reliability, robustness, safety, and security are engineered by cross-functional teams of closely collaborating requirements engineers, subject matter experts, stakeholders, and engineers specializing in reliability, safety, and security. This team must identify the appropriate credible off-nominal situations and decide which of these situations should be analyzed and turned into associated requirements given programmatic constraints such as cost, schedule, available development staffing, and critical functionality. The program office should also ensure that the contractors and subcontractors use appropriate coding standards and associated foundational software (e.g., a safe and secure subset of C++, including safe and secure base classes).
Most acquisition programs suffer from incomplete requirements, especially with regard to dealing with rare combinations of states and events, detecting and reacting to off-nominal situations, use-case models that are incomplete due to missing exceptional use case paths, and either inadequate coding standards or coding standards not being followed. The engineers who actually develop the software often lack adequate expertise in availability, reliability, robustness, safety, and security requirements, which yields systems that do not meet their associated requirements. While the problems are well known, so are their answers. Program offices, therefore, must ensure that these answers are implemented, enforced, and verified to be effective and efficient.
Additional Resources:
To read Don Firesmith’s series on The Importance of Safety- & Security-Related Requirements, please visit http://blog.sei.cmu.edu/archives.cfm/category/safety-related-requirements
By Ipek Ozkaya, Senior Member of the Technical Staff, Research, Technology, and System Solutions
Managing technical debt, which refers to the rework and degraded quality resulting from overly hasty delivery of software capabilities to users, is an increasingly critical aspect of producing cost-effective, timely, and high-quality software products. A delicate balance is needed between the desire to release new software capabilities rapidly to satisfy users and the desire to practice sound software engineering that reduces rework. A previous post described the practice of strategically managing technical debt related to software architecture, which involves deliberately postponing implementation of some architectural design choices to accelerate delivery of the system today and then rearchitecting at a later time. This blog post extends our prior post by discussing how an architecture-focused analysis approach helps manage technical debt by enabling software engineers to decide the best time to rearchitect—in other words, to pay down the technical debt.
Our architecture-focused approach for managing technical debt is part of the SEI’s ongoing research agenda on Agile architecting, which aims to improve the integration of architecture practices within Agile software development methods. This project is investigating which measures a software development team can apply to effectively monitor changing qualities of software, such as degrading modifiability and extensibility of the system at each iteration in an iterative and incremental lifecycle such as Agile. We initially investigated a particular metric—propagation cost—that measures the percentage of system elements affected when a change is made to a randomly chosen element.
A high propagation cost is an indication of tight coupling, such that when a change is made in the system, many parts of the system will be affected. We focused on propagation cost due to the rich set of existing static analysis techniques that evaluate code and design quality by measuring software coupling and cohesion, such as whether there are cycles within parts of the software system, whether there is code duplication, and so on. Most existing static analysis techniques focus on code quality and code-level technical debt. For example, a high percentage of duplicate code and cycles in the code indicates a high level of technical debt. In contrast, we applied propagation cost at the architecture level to calculate the impact of the dependencies looking at architectural elements, rather than calculating every dependency between different classes. The goal of this approach is to reduce complexity and provide insights, even when an implementation is not complete.
Our work explores the relationship between propagation cost and technical debt. In particular, we use propagation cost as one indication of increasing technical debt. We assess the potentially increasing rework—which is effectively the impact of paying back technical debt—based on monitoring increasing propagation cost of the system.
Reasoning about quality by modeling rework as a proxy for technical debt requires objective—and repeatable—representation of architectural properties (such as module dependencies and changing interfaces) for the model to work. We therefore modeled the dependencies of the architectural elements by means of a technique called design structure matrices (DSMs). DSMs can be used to visualize which elements use or depend on others at each iteration and to calculate propagation cost.
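As a concrete example of the metric, the sketch below computes propagation cost from a small design structure matrix by taking the transitive closure of the direct dependencies and measuring the density of the resulting visibility matrix. It assumes NumPy, follows the commonly published formulation of propagation cost, and uses an invented four-element DSM; it is not the SEI's analysis tooling.

```python
import numpy as np

def propagation_cost(dsm: np.ndarray) -> float:
    """Fraction of element pairs (i, j) where a change to j can propagate to i."""
    n = dsm.shape[0]
    # Start from direct dependencies plus self-reachability.
    reach = ((dsm != 0) | np.eye(n, dtype=bool)).astype(int)
    # Repeated squaring yields the transitive closure (the visibility matrix).
    for _ in range(max(1, int(np.ceil(np.log2(n))))):
        reach = ((reach @ reach) > 0).astype(int)
    return reach.sum() / (n * n)

# Invented four-element DSM: dsm[i][j] == 1 means element i depends on element j.
dsm = np.array([
    [0, 1, 0, 0],   # UI depends on Mediator
    [0, 0, 1, 0],   # Mediator depends on DataModel
    [0, 0, 0, 0],   # DataModel has no outgoing dependencies
    [0, 0, 0, 0],   # Logger has no outgoing dependencies
])
print(f"propagation cost = {propagation_cost(dsm):.2f}")  # ~0.44 for this toy DSM
```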
Our research on the propagation cost metric examined a real-world case study of a building automation control system. The research team had at its disposal the software engineering artifacts of the project, including the software architecture, code, functional and quality attribute requirements, and project management plan. Using these artifacts, we generated the design structure matrix of the system at each project iteration and completed "what-if" studies. This what-if analysis calculated accumulating rework based on allocating functionality and architectural tasks to iterations in different orders. We used the propagation cost measurement to assess what the team could have done differently in terms of delivering functionality at different times and calculated the overall impact on rework and lifecycle costs. Our goal was to demonstrate how different allocations of functionality and critical architectural tasks to iterations can enable developers to respond to changes more quickly and use technical debt to their advantage by monitoring the accruing rework.
The results of our studies showed that focusing on architectural dependencies, as well as using propagation cost as a proxy to indicate the level of changing complexity and rework, provided good insight into quantifying technical debt at the architecture level. This insight helps software architects, developers, and managers decide the best time to pay back technical debt or determine if technical debt is accumulating in the first place. To make these measurement and analysis techniques practical, however, they should be integrated seamlessly into engineers’ integrated development environments. For example, tools should have the ability to group classes into module view architectural elements and specify design rules, such as one element can or cannot access another element. New generation tools, such as Lattix, Sonargraph, and Structure101, are starting to explore such issues, though there is still room for improvement.
In some instances, architectural dependencies should also be integrated with architectural design decisions. For example, when using a mediator to decouple interfaces from the data model, the mediator communicates with all the interface and data model elements. When applying the propagation cost metric consistently across the system—including all dependencies to and from the controller element—a high propagation cost emerges. This high cost indicates a greater risk of technical debt and change propagation, potentially requiring rework when new features must be added.
Although higher propagation costs are generally associated with higher risk, in this case introducing a mediator to decouple the data model and the interface may be a good architectural decision since it localizes the changes. From a reliability perspective, however, the controller is a single point of failure. So in this case high propagation cost may not necessarily be negative from a modifiability standpoint, but it is still a reliability risk. Our studies revealed that enhancing propagation cost measurements with architectural information provides more insightful analysis of the actual implications of technical debt and rework.
Studying rework using propagation cost helps improve the integration of architecture practices within Agile software development methods. For example, when teams are developing software in an Agile context, they typically embrace Scrum as a project-management technique. It is often hard for teams to determine how to subdivide large architectural tasks and allocate them to small two- to four- week sprints. Our research demonstrated that by focusing on iteration-to-iteration analysis— rather than trying to time box distribution of functionality to sprints where each time box/sprint has the same duration—it is possible to show customers how the quality of the software changes with each release, such as how increasing propagation cost could impact rework.
Our next steps are to examine the scalability of assessing rework by focusing on architecture metrics. We developed two real-life case studies using dependency analysis to operationalize the measurement of propagation cost. While our approach works fine with 100 to 200 software architecture elements, we are now evaluating how well it scales to a larger number of elements. When our analysis focused on the software architecture rather than the code to quantify technical debt, we observed roughly an order-of-magnitude reduction in the number of dependencies analyzed, from about 200 to two dozen, and we were still able to pinpoint the potential emerging rework; this is a significant reduction in complexity. Architecture-level analysis of technical debt enables a team to gauge the status of the system quickly and decide whether to rework the system. Code-level analysis then enables the team to define specific tasks for developers.
Our research on an architecture-focused measurement framework for managing technical debt is informed by real-world examples gathered from Technical Debt Workshops. These workshops engage practitioners and researchers in an ongoing dialogue to improve the state of techniques for managing technical debt. The 2011 Managing Technical Debt Workshop co-located with the International Conference on Software Engineering (ICSE) revealed an increasing interest in managing technical debt proactively. As a result, we will conduct a third workshop—again collocated with ICSE on June 4, 2012. Our research team will also guest-edit the November/December 2012 issue of IEEE Software on the same theme and is accepting papers until April 1, 2012. We welcome any individuals who have experiences in this area to submit a paper for consideration in IEEE Software or the 3rd International Workshop on Managing Technical Debt.
This research was conducted in collaboration with Dr. Philippe Kruchten, professor of software engineering at the University of British Columbia, and Raghu Sangwan, associate professor of software engineering at Penn State University, and with support from Lattix, a leading provider of software architecture management solutions.
Additional Resources
N. Brown, P. Kruchten, R. Nord, and I. Ozkaya. Managing technical debt in software development: report on the 2nd international workshop on managing technical debt, held at ICSE 2011. ACM SIGSOFT Software Engineering Notes 36 (5): 33-35 (2011).
N. Brown, Philippe Kruchten, R. Nord, and I. Ozkaya. Quantifying the Value of Architecting Within Agile Software Development via Technical Debt Analysis. 2011.
N. Brown, R. Nord, I. Ozkaya, and M. Pais. Analysis and Management of Architectural Dependencies in Iterative Release Planning. In Proceedings of the 2011 Ninth Working IEEE/IFIP Conference on Software Architecture (WICSA '11). IEEE Computer Society, 103-112.
By Mary Ann Lapham, Senior Member of the Technical Staff, Acquisition Support Program
Over the past several years, the SEI has explored the use of Agile methods in DoD environments, focusing on both if and when they are suitable and how to use them most effectively when they are suitable. Our research has approached the topic of Agile methods both from an acquisition and a technical perspective. Stephany Bellomo described some of our experiences in previous blog posts What is Agile? and Building a Foundation for Agile. This post summarizes a project the SEI has undertaken to review and study Agile approaches, with the goal of developing guidance for their effective application in DoD environments.
The SEI’s Agile project began in 2009 in response to our recognition of the growing awareness that Agile methods help alleviate key challenges facing the DoD, such as providing competitive capabilities to warfighters in a timely manner that minimizes collateral damage and loss of lives and property. We also observed an emerging consensus that Agile methods can be applied to create systems whose functionality and quality attributes can be adapted more readily over time, which may help reduce total ownership costs over long acquisition program lifecycles. Within this context, the primary activities of our project included reviewing relevant literature, interviewing programs that are using or have used Agile methods, identifying criteria to determine whether a program is a candidate for Agile methods and what risks exist in implementing them, and creating guidelines for implementing Agile methods.
We initially focused on the question of whether Agile could be used in DoD acquisition programs, which historically follow the DoD 5000 series of guidelines that have been associated with so-called waterfall methods. An early finding of our project was that no prohibitions preclude the use of Agile in the DoD 5000 series. This result is important since some skeptics have asserted that Agile is not suited for the DoD due to inherent conflicts between Agile methods and DoD policies and regulations.
Given that there is no one size fits all Agile method, however, we also found that implementations of Agile methods must be tailored to fit the situation and context. In other words, Agile is not a silver bullet. Transitioning DoD systems—and their associated socio-technical ecosystems—to Agile therefore requires considerable work from DoD agencies and the defense industry base, and is not without hurdles.
For example, we found that adapting Agile to the DoD acquisition lifecycle presents unique challenges and opportunities. The challenges lie in meeting existing DoD milestone and regulatory criteria when there is little, if any, guidance available on how to do so when using Agile methods. The opportunities include the ability to deliver working results frequently. Other hurdles identified by DoD programs we interviewed include
providing the right team environment and allowing access to end users
determining how to train and coach the government staff
instituting suitable oversight methods
adapting rewards and incentives to the Agile environment
adjusting to a different team structure
These hurdles stem from taking an approach different from business as usual and from a general lack of specific training and guidance on Agile concepts and approaches within the DoD. Overcoming these hurdles requires changes to the waterfall-centric organizational culture that is common within DoD acquisition programs. These culture changes also require mindset changes because the underlying paradigm for implementing Agile methods differs from that used for the waterfall method.
After studying the topics described above to gain a preliminary understanding of the use of Agile within the DoD, we then studied other management, acquisition, and technical topics, as described below.
Agile Management and Acquisition Topics
Our project also is studying the following management and acquisition topics relevant to the effective adoption of Agile methods in the DoD:
Being Agile in the DoD. Agile methods provide promising techniques for streamlining the acquisition process for systems within the DoD. To meet the challenges of adopting Agile methods, however, DoD program management offices must take specific actions to assist in Agile adoption and even enable it. For example, to ensure successful Agile adoption, DoD organizations must plan for it, train for it, and anticipate changes in their environments and business models to ensure the benefits of Agile become a reality.
Managing and contracting for Agile programs. Managing large-scale, complex software-reliant systems is always hard, but management in Agile programs takes on some added dimensions. For example, program managers not only must be leaders, they must also be coaches, expeditors, and champions. If they do not personally perform these roles, someone in their organizations must be responsible for them. These additional roles are needed due to the paradigm associated with using Agile methods and the lack of any significant experience by current DoD personnel in that arena. A particular management concern is the selection and implementation of appropriate contracting vehicles to support the types of practices that successful Agile projects exhibit.
Technical milestone reviews. One sticking point in employing Agile methods is how to accommodate large capstone events, such as the preliminary design review (PDR), critical design review (CDR), and others. While many concerns exist in this area, it’s important to focus on the purpose of holding these reviews in the first place: to evaluate progress on and/or review specific aspects of the proposed software solution. Expectations and criteria must therefore reflect the level and type of documentation that is acceptable for the milestone, which is no different from business as usual. The key, however, is to define the level and type of documentation required for the specific program while working within an Agile environment.
Estimating in DoD Agile acquisition. Estimation done on Agile projects is typically not the same as the traditional methods used on legacy systems within DoD. Traditional methods tend to focus on estimating details up front; these details are then modified as more information is obtained. In contrast, Agile estimates are often "just-in-time," with high-level estimates that are refined to create detailed estimates as knowledge of the requirements matures. Some tools within the traditional estimation community are now adding modules to address Agile estimation.
Moving toward adopting Agile practices. Change is hard—especially for large DoD ecosystems—and understanding the scope of changes is essential. Organizational change methods must be employed to help DoD organizations successfully adapt to applying Agile. There are multiple adoption factors (such as business strategy, reward system, sponsorship, values, skills, structure, history, and work practices) that must all be addressed. Change-management best practices include understanding the adopter population, understanding the cycle of change, understanding the adoption risks, and building transition mechanisms to mitigate adoption risks. Organizations we studied that had successfully adopted Agile methods typically achieved the following goals:
Found and nurtured good sponsors for Agile adoption
Understood the adoption population they were dealing with
Conducted a readiness assessment that addressed organizational and cultural issues
Analyzed what adoption support mechanisms were needed for a particular context and built or acquired them before proceeding too far into an Agile adoption
SEI Agile work continues and the following additional documents are—or will soon be—available:
Case Study of Successful Use of Agile Methods in DoD: Patriot Excalibur 2011-TN-019
Agile Methods: Changing the Viewpoint of Government Technical Evaluation 2011-TN-026
A Closer Look at 804: A Summary of Considerations for DoD Program Managers 2011 SR-015
"DoD Agile Adoption—Necessary Considerations, Concerns, and Changes" in the Jan/Feb 2012 issue of CrossTalk
In addition, the SEI has created an Agile Collaboration Group to advise, review, enhance, and validate SEI acquisition work. The SEI is working with this group to create, calibrate, and validate a contingency model that will help acquisition professionals determine when to use Agile techniques, as well as how to identify potential risks if Agile methods are adopted. We are also creating guidelines that summarize best practices and instruct users of Agile methods on how to apply these methods effectively in DoD environments.
Agile Technical Topics
Agile software development has historically succeeded in small-scale (largely IT-based) commercial environments due largely to its easy-to-apply practices for tracking project status and allocating the development resources to those activities that deliver the most potential customer value. A key technical challenge for DoD projects, however, involves balancing the short-term and long-term needs. In particular, the cliché "You aren’t gonna need it (YAGNI)" is a principle in eXtreme Programming (XP) that implies developers should not add functionality until it is necessary, thereby eliminating a considerable amount of unused code in a system. The YAGNI principle rarely seems to apply, however, in large-scale DoD environments, where systems must operate for decades with continual flux with respect to evolving requirements, technology upgrades, new partners, and different contractors.
The SEI is conducting the following technical work on successfully creating and applying Agile methods for the DoD:
Agile at scale. This work focuses on providing methods and techniques for applying Agile software development practices to large-scale DoD programs, with improved visibility into the release plan and the quality of the system. One of our activities to address the use of Agile development at scale was a field study with organizations that deal with the challenges of Agile and architecture practices at scale. Based on our observations, we developed a readiness, best practices, and risk analysis technique. Striking the proper balance between developing the system and its architecture in an agile manner, while preserving the agility needed to enhance and maintain the system, is key to success. We observed that organizations succeed within an Agile environment when they pay close attention to architecture-centric practices and achieve a balance between feature development and architecture development.
Technical debt. Our work in technical debt analysis again focuses on architecture and looks at strategically incurring technical debt (such as applying architectural short-cuts) to improve agility in the short-term. This work focuses on developing techniques to monitor and respond to emerging rework, as well as the need to refactor or rearchitect the system to pay back the debt. The need to refactor or rearchitect DoD systems arises in several ways. For instance, system quality degradation, such as unacceptable end-to-end performance, might require refactoring. Such quality degradation-related rework can appear if the development teams focused solely on feature-oriented decomposition of the system to deliver features at early iterations, but didn’t provide the necessary architecture for the infrastructure in a timely manner. Refactoring would require the restructuring of the existing body of code to alter its internal structure (architecture) but not change its external behavior to address the decrease in quality.
Modeling decision impact on agile development. Our work also provides guidance and techniques that enhance the applicability of mainstream Agile and lean software development methods to DoD stakeholders by balancing their acquisition and technical needs. In a recently started project, for example, we are investigating acquisition and architecture activities during the pre-Engineering and Manufacturing Development phase of the acquisition lifecycle. This work closely examines modeling decision dependencies and analyzes their impact on the ability to conduct effective Agile system development. This work targets the perspective of reducing integration risks in large-scale DoD systems.
In summary, our projects have found that Agile methods can indeed provide both tactical and strategic benefits in the DoD. The tactical benefits of lower cost, on-time delivery, and increasing quality are clearly important as the DoD places a growing emphasis on greater efficiency in its acquisition processes. The strategic benefits of responsiveness and more rapid adaptability to the current situation, however, may be of even greater value in today’s world, where the DoD must get results faster and be better aligned with changing needs to prepare for an uncertain future. As our work progresses, we will periodically post our progress, ask questions, and request feedback. If you have any questions or feedback on the current work, please post in the comments below.
Additional Resources
Please see the following SEI technical reports and notes for more on Agile development:
Considerations for Using Agile in DoD Acquisition
Agile Methods: Selected DoD Management and Acquisition Concerns
Documenting Software Architectures in an Agile World
CMMI or Agile: Why not Embrace Both!
Incorporating Security Quality Requirements Engineering (SQUARE) into Standard Lifecycle Models
Secure Software Development Life Cycle Processes—A Technology Scouting Report
Integrating Software-Architecture-Centric Methods into Extreme Programming (XP)
By Douglas C. Schmidt, Visiting Scientist
We use the SEI Blog to inform you about the latest work at the SEI, so this week I'm summarizing some video presentations recently posted to the SEI website from the SEI Technologies Forum. This virtual event, held in late 2011, brought together participants from more than 50 countries to engage with SEI researchers on a sample of our latest work, including cloud computing, insider threat, Agile development, software architecture, security, measurement, process improvement, and acquisition dynamics. This post includes a description of all the video presentations from the first event, along with links where you can view the full presentations on the SEI website.
Paul Nielsen, director of the SEI, gave the opening remarks, which summarized the presentations in the SEI Technologies Forum, focusing on the SEI’s leadership role in software, security, and resiliency technologies and methods that help address the complexities of software-reliant systems. You can watch the opening presentation here.
My presentation described the SEI’s strategic plan to advance the practice of software engineering for the DoD, federal agencies, industry, and academia through research and technology transition. I motivated and summarized the following four major areas of software engineering and cyber security work at the SEI:
Innovating software for competitive advantage. This area focuses on producing innovations that revolutionize development of assured software-reliant systems to maintain the U.S. competitive edge in software technologies vital to national security.
Securing the cyber infrastructure. This area focuses on enabling informed trust and confidence in using information and communication technology to ensure a securely connected world to protect and sustain vital U.S. cyber assets and services in the face of full-spectrum attacks from sophisticated adversaries.
Advancing disciplined methods for engineering software. This area focuses on improving the availability, affordability, and sustainability of software-reliant systems through data-driven models, measurement, and management methods to reduce the cost, acquisition time, and risk of our major defense acquisition programs.
Accelerating assured software delivery and sustainment for the mission. This area focuses on ensuring predictable mission performance in the acquisition, operation, and sustainment of software-reliant systems to expedite delivery of technical capabilities to win the current fight.
You can watch a video of my presentation here. The remainder of this blog posting summarizes the forum presentations, which are grouped under the four major research areas outlined above.
Innovating Software for Competitive Advantage
The presentation on Architectural Implications of Cloud Computing by Grace Lewis defined cloud computing, explored different types of cloud computing environments, and described the drivers and barriers for cloud computing adoption. It also focused on examples of key cloud architecture and design decisions, such as data location and synchronization, user authentication models, and multi-tenancy support. This topic is important since cloud computing is being adopted by commercial, government, and Department of Defense (DoD) organizations, driven by a need to reduce the operational cost of their information technology resources.
From an engineering perspective, cloud computing is a distributed computing paradigm that focuses on providing a wide range of users with distributed access to virtualized hardware and/or software infrastructure over the internet. From a business perspective, it is the availability of computing resources that are scalable and billed on a usage basis (as opposed to acquired resources) that leads to potential cost savings in IT infrastructure. From a software architecture perspective, having resources in the cloud means that some elements of the software system will be outside the organization, and the control over these elements depends on technical aspects, such as the provided resource interface, and on business aspects, such as the service-level agreement (SLA) with the resource provider. Systems must therefore be designed and architected to account for lack of full control over important quality attributes. You can watch a video of Grace’s presentation here.
Ipek Ozkaya gave a presentation on Agile Development and Architecture: Understanding Scale and Risk. This presentation examined tactics that can help identify and mitigate key risks of large-scale, complex software development when there is a need to use Agile development and architecture-centric practices in concert. This topic is important because Agile software development and software architecture practices have received increasing attention from both industry and government over the past decade. The complementary nature of Agile development and software architecture practices is also increasingly recognized and appreciated. Applying Agile development with a concurrent focus on architecture, however, is still experimental and experiential rather than a proven practice based on sound engineering techniques. This presentation described how SEI researchers are helping organizations using Agile techniques deal with increased system software size and increased complexity in orchestrating larger engineering and development teams, to ensure that the systems they develop will be viable in the market for decades. You can watch a video of Ipek’s presentation here.
Securing the Cyber Infrastructure
A presentation by Randy Trzeciak on The Insider Threat: Lessons Learned from Actual Insider Attacks described the technical and behavioral aspects of insider threats, focusing on the types of insiders who committed the crimes, their motivations, organizational issues surrounding the incidents, methods of carrying out the attacks, impacts, and precursors that could have served as indicators to organizations for preventing incidents or detecting them earlier. It also conveyed the complex interactions, relative degrees of risk, and unintended consequences of policies, practices, technology, insider psychological issues, and organizational culture over time. This presentation stemmed from a decade of work by the Insider Threat Center at CERT, which has been researching insider threats since 2001 and has built an extensive library and comprehensive database containing more than 700 actual cases of insider cybercrimes. The presentation described findings from our analysis of three primary types of insider cybercrimes: IT sabotage, theft of information, and fraud. You can watch a video of Randy’s presentation here.
The Smart Grid Maturity Model: A Vision for the Future of Smart Grid presentation by David White offered insight into the past year’s use of the Smart Grid Maturity Model (SGMM), which is a management tool for the utility industry to plan a reliable, secure energy supply that is vital to our economy, our security, and our well-being. The smart grid represents a new framework for improved management of electricity generation, transmission, and distribution. With the support of the U.S. Department of Energy, the SEI is the steward of the SGMM. This presentation described the release of the SGMM V1.2 Product Suite and showed how utilities are working with the model. As more utilities around the globe participate and the SGMM experience base grows, the SGMM has become an increasingly valuable resource for helping inform the industry’s smart grid transformation. You can watch a video of David’s presentation here.
Julia Allen’s presentation was on Measuring Operational Resilience. This presentation suggested strategic measures for an organization’s operational resilience management (ORM) program, which defines an organization’s strategic resilience objectives (such as ensuring continuity of critical services in the presence of a disruptive event) and resilience activities (such as the development and testing of service continuity plans). Traditional operational security metrics, such as the number of machines patched, vulnerability scan results, the number of incidents, and the number of staff trained, are easy to collect and can be useful. If an organization’s objectives are to inform decisions, affect behavior, and determine control effectiveness in support of business objectives, however, it must consider a set of more strategic resilience measures. These ten strategic measures derive from lower-level measures at the CERT Resilience Management Model (RMM) process area level, including average incident cost by root cause type and the number of breaches of confidentiality and privacy of customer information assets resulting from violations of provider access control policies. You can see a video of Julia’s presentation here.
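As a rough illustration of one lower-level measure mentioned above (and not an excerpt from the CERT-RMM materials or Julia’s presentation), the short sketch below computes average incident cost by root cause type from a hypothetical incident log. The root cause categories and cost figures are invented for the example.

from collections import defaultdict

# Hypothetical incident records: (root cause type, cost in dollars).
incidents = [
    ("phishing", 12_000),
    ("phishing", 8_500),
    ("misconfiguration", 30_000),
    ("insider", 95_000),
    ("misconfiguration", 22_000),
]

totals = defaultdict(lambda: [0.0, 0])  # root cause -> [total cost, incident count]
for cause, cost in incidents:
    totals[cause][0] += cost
    totals[cause][1] += 1

for cause, (total, count) in sorted(totals.items()):
    print(f"{cause}: {count} incidents, average cost ${total / count:,.0f}")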
Advancing Disciplined Methods for Engineering Software
CMMI-SVC: The Strategic Landscape for Service, by Eileen Forrester, described the current state of CMMI for Services (CMMI-SVC). It also explored the larger strategic choices available to organizations in markets where superior service can improve work and business results. CMMI-SVC is important because the global economy is increasingly based on services, rather than manufacturing or trading of tangible goods. Even the development of goods and systems increasingly takes on the character of services. Innovative CMMI-SVC approaches that are already working in the United States, Latin America, Asia, and Europe can be tailored to meet the needs of other organizations and markets. You can watch a video of Eileen’s presentation here.
James McHale gave A Brief Survey of the Team Software Process (TSP). This presentation briefly described training and introduction of TSP practices, including the Personal Software Process (PSP); the results and potential benefits inherent in the methods; and the common use of TSP methods in combination with other popular practices, including Agile (Scrum, TDD, XP), architecture, secure coding, RUP, Six Sigma, and CMMI. TSP has been identified as one of the most effective practices for software developers by Capers Jones in his 2010 book Software Engineering Best Practices. You can see a video of James’s presentation here.
Accelerating Assured Software Delivery and Sustainment for the Mission
The presentation on Software Acquisition Program Dynamics by William ("Bill") Novak described the analysis the SEI is doing on data collected from more than 100 independent technical assessments of software-reliant acquisition programs. This analysis has produced insights into the most common ways that acquisition programs encounter difficulties. Programs regularly experience recurring cost, schedule, and quality failures, and progress and outcomes often appear to be unpredictable and unmanageable. Moreover, many acquisition leaders and staffers neither recognize these recurring issues nor realize that known solutions exist for many of these problems. This presentation explained how the SEI is working to mitigate the effects of misaligned acquisition program organizational incentives and adverse software-reliant acquisition structural dynamics by improving program staff decision making. To do this, SEI researchers are modeling and analyzing both the adverse acquisition dynamics encountered in actual programs and candidate solutions to resolve those dynamics. You can watch a video of Bill’s presentation here.
Next Event February 28
A second virtual event, Architecting Software the SEI Way, is planned for February 28. This event focuses on using architecture practices more effectively to build better systems more efficiently and productively by understanding the fundamentals of software architecture, improving practice through architecture evaluation guidelines, and bridging technical and business goals by applying architecture methods to analyze and evaluate enterprise software architectures. We look forward to "seeing" you there. If you have any questions or thoughts on any of the presentations, please feel free to leave your comments below.
Additional Resources
SEI Technologies Forum
Architecting Software the SEI Way
By Dave Zubrow, Manager, Software Engineering Measurement and Analysis Initiative
The SEI has been actively engaged in defining and studying high maturity software engineering practices for several years. Levels 4 and 5 of the CMMI (Capability Maturity Model Integration) are considered high maturity and are predominantly characterized by quantitative improvement. This blog posting briefly discusses high maturity and highlights several recent works in the area of high maturity measurement and analysis, motivated in part by a recent comment on a Jan. 30 post asking about the latest research in this area. I’ve also included links where the published research can be accessed on the SEI website.
At CMMI level 3, work is proactively managed and standard processes are used. Beyond level 3, process performance needs to be understood quantitatively. High maturity means you have the data to understand how the process is performing, how variation in the implementation and execution of the process affects performance, and what the likely costs and benefits of any change will be. A high-level description of the benefits, by process area, is shown below.
In past years, some CMMI users said they felt high maturity was not well defined, an issue addressed by its clarification in CMMI v1.3. The CMMI community has also debated the benefits of moving up to high maturity and asked for more examples of high maturity process implementations. Some challenges organizations face when striving for high maturity include developing an insightful set of measures; creating predictive models for process performance, project management, and product quality; and knowing which tools and methods to use for modeling and analysis. The SEI has worked to address these concerns and provide needed resources through courses, case studies, and other publications about the implementation of high maturity practices.
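To give a feel for what understanding process performance quantitatively can look like in practice, here is a minimal sketch of an individuals (XmR) control chart calculation, the kind of statistical process control technique covered in the Florac and Carleton book listed under Additional Resources below. The inspection-rate data are invented for illustration and are not drawn from any CMMI publication.

# Hypothetical peer-review inspection rates (LOC per hour) for ten reviews.
review_rates = [182, 205, 197, 230, 188, 214, 201, 176, 223, 195]

mean = sum(review_rates) / len(review_rates)
moving_ranges = [abs(b - a) for a, b in zip(review_rates, review_rates[1:])]
mr_bar = sum(moving_ranges) / len(moving_ranges)

# 2.66 is the standard XmR-chart constant that converts the average moving
# range into approximate three-sigma limits for individual observations.
ucl = mean + 2.66 * mr_bar
lcl = mean - 2.66 * mr_bar

print(f"process mean: {mean:.1f} LOC/hour")
print(f"control limits: [{lcl:.1f}, {ucl:.1f}]")
print("points signaling unusual variation:",
      [x for x in review_rates if not lcl <= x <= ucl])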
Publications defining and describing high maturity measurement and analysis practices include:
CMMI and TSP/PSP: Using TSP Data to Create Process Performance Models
By Shurei Tamura
This report describes the fundamental concepts of process performance models (PPMs) and explains how they can be created using data generated by projects following the Team Software Process (TSP). PPMs provide accurate predictions and identify factors that projects and organizations can control to better ensure successful outcomes, helping organizations move from a reactive mode to a proactive, anticipatory mode. PPMs are fundamental to the implementation of the high maturity process areas of CMMI and are specifically required in the Quantitative Project Management and Organizational Process Performance process areas. The three examples in this report demonstrate how data generated from projects using TSP can be combined with data from other sources to produce effective PPMs. (A minimal illustrative sketch of this kind of model appears after this list.)
www.sei.cmu.edu/library/abstracts/reports/09tn033.cfm
Approaches to Process Performance Modeling: A Summary from the SEI Series of Workshops on CMMI High Maturity Measurement and Analysis
By Robert W. Stoddard & Dennis R. Goldenson
Organizations are increasingly striving for and achieving high maturity status, yet there is still an insufficient shared understanding of how best to implement measurement and analysis practices appropriate for high maturity organizations. A series of twice-yearly workshops organized by the SEI allows organizations to share lessons learned to accelerate the adoption of best measurement and analysis practices in high maturity organizations.
This report summarizes the results from the second and third high maturity measurement and analysis workshops. The participants' presentations described their experiences with process performance models; the goals and outcomes of the modeling; the x factors used; the data collection methods; and the statistical, simulation, or probabilistic modeling techniques used. Overall summaries of the experience and future plans for modeling also were provided by participants.
www.sei.cmu.edu/library/abstracts/reports/09tr021.cfm
CMMI High Maturity Measurement and Analysis Workshop Report: March 2008
By Robert W. Stoddard II, Dennis R. Goldenson, Dave Zubrow, & Erin Harper
Organizations are increasingly looking for guidance on how to implement CMMI high maturity practices effectively and how to sustain their momentum for improvement. As high maturity organizations work to improve their use of measurement and analysis, they often look to examples of successful implementations for guidance. In response to the need for clarification and guidance on implementing measurement and analysis in the context of high maturity processes, members of the SEI’s Software Engineering Measurement and Analysis (SEMA) initiative organized a workshop at the 2008 SEPG North America conference to bring leaders in the field together at a forum on the topic. Other workshops will be held as part of an ongoing series to allow high maturity organizations to share best practices and case studies.
www.sei.cmu.edu/library/abstracts/reports/08tn027.cfm
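The sketch below, referenced in the first entry above, shows in miniature what a process performance model can look like: a simple one-factor linear regression that predicts an outcome a project cares about (escaped defect density) from a factor it can control (review coverage). The single-factor form of the model and all of the data are illustrative assumptions, not taken from the reports.

# Hypothetical history: (code review coverage %, escaped defects per KLOC).
history = [(40, 6.1), (55, 4.8), (60, 4.2), (70, 3.5), (80, 2.9), (90, 2.2)]

n = len(history)
mean_x = sum(x for x, _ in history) / n
mean_y = sum(y for _, y in history) / n

# Ordinary least-squares fit of defects = intercept + slope * coverage.
slope_num = sum((x - mean_x) * (y - mean_y) for x, y in history)
slope_den = sum((x - mean_x) ** 2 for x, _ in history)
slope = slope_num / slope_den
intercept = mean_y - slope * mean_x

def predict_escaped_defects(review_coverage_pct):
    # Predict escaped defect density for a planned level of review coverage.
    return intercept + slope * review_coverage_pct

print(f"model: defects/KLOC = {intercept:.2f} + ({slope:.3f}) * coverage")
print(f"predicted at 75% coverage: {predict_escaped_defects(75):.2f} defects/KLOC")

In a real PPM, the controllable factors, the statistical form of the model, and estimates of prediction uncertainty would all be derived from the organization’s own calibrated data, as the reports above describe.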
The following reports describe results from surveys conducted related to the implementation and impacts of high maturity practices.
Performance Effects of Measurement and Analysis: Perspectives from CMMI High Maturity Organizations and Appraisers
By James McCurley & Dennis R. Goldenson
This report describes results from two recent surveys conducted by the SEI to collect information about the measurement and analysis activities of software systems development organizations. Representatives of organizations appraised at maturity levels 4 and 5 completed the survey in 2008. Using a variant of the same questionnaire in 2009, certified high maturity lead appraisers described the organizations that they had most recently coached or appraised for the achievement of similar high maturity levels. The replies to both surveys were generally consistent, even though the two groups are often thought to be quite different. The survey results suggest that the organizations had a solid understanding of CMMI-based process performance modeling and related aspects of measurement and analysis and used them extensively. Both the organizational respondents in 2008 and the appraisers in 2009 reported that process performance models were useful for the organizations.
The respondents in both surveys also judged that process performance modeling is more valuable in organizations that understood and used measurement and analysis activities more frequently and provided organizational resources and management support. In addition, results from the 2009 survey of lead appraisers indicate that organizations that achieved their appraised high maturity level goals also found measurement and analysis activities more useful than those organizations that did not achieve their targets.
www.sei.cmu.edu/library/abstracts/reports/10tr022.cfm
Use and Organizational Effects of Measurement and Analysis in High Maturity Organizations: Results from the 2008 SEI State of Measurement and Analysis Practice Surveys
By Dennis R. Goldenson, James McCurley, and Robert W. Stoddard II
There has been a great deal of discussion about what organizations need to attain high maturity status and what they can reasonably expect to gain by doing so. Clarification is needed, along with good examples of what has worked well and what has not, particularly with respect to measurement and analysis. This report contains results from a survey of high maturity organizations conducted by the SEI in 2008. The questions center on the use of process performance modeling in those organizations and the value added by that use. The results show considerable understanding and use of process performance models among the organizations surveyed; however, there is also wide variation in the respondents’ answers. The same is true for the survey respondents’ judgments about how useful process performance models have been for their organizations. As is true for less mature organizations, there is room for continuous improvement among high maturity organizations. Nevertheless, the respondents’ judgments about the value added by process performance modeling also vary predictably as a function of the understanding and use of the models in their respective organizations. More widespread adoption and improved understanding of what constitutes a suitable process performance model holds promise to improve CMMI-based performance outcomes considerably.
www.sei.cmu.edu/library/abstracts/reports/08tr024.cfm
We hope you find this information useful. If there are any other areas of SEI research that you would like us to highlight, please leave a comment below.
Additional Resources:
An article in the January/February 2012 issue of CrossTalk, The Journal of Defense Software Engineering, titled "High Maturity - The Payoff" discusses the value and benefits realized by organizations that adopt high maturity CMMI Level 4 and 5 software processes.
Measuring the Software Process: Statistical Process Control for Software Process Improvement
By William A. Florac & Anita D. Carleton
This book was one of the first published works that explicitly addressed how to use statistical process control methods to manage and improve software processes within an organization. It remains a key reference for how to implement high maturity measurement and analysis. It explains how quality characteristics of software products and processes can be quantified, plotted, and analyzed, so that the performance of software development activities can be predicted, controlled, and guided to achieve both business and technical goals.
www.sei.cmu.edu/library/abstracts/books/0201604442.cfm
To learn more about the SEI’s work in Measurement and Analysis, please visit www.sei.cmu.edu/measurement/index.cfm