By Douglas C. Schmidt Principal Researcher As part of our mission to advance the practice of software engineering and cybersecurity through research and technology transition, our work focuses on ensuring that software-reliant systems are developed and operated with predictable and improved quality, schedule, and cost. To achieve this mission, the SEI conducts research and development activities involving the Department of Defense (DoD), federal agencies, industry, and academia. As we look back on 2013, this blog posting highlights our many R&D accomplishments during the past year. Before turning to our accomplishments, it’s important to note that 2013 brought the arrival of Kevin Fall as deputy director and chief technology officer. In the blog post, A New CTO and Technical Strategy for the SEI, Fall provided some background on his experience, as well as his technical goals for the SEI: Develop an even higher quality and more cohesive research program Increase collaboration with Carnegie Mellon University and other academic researchers Enhance accessibility to the SEI’s work Kevin leads R&D at the SEI, which benefits the DoD and other sponsors by identifying and solving key technical challenges facing developers and managers of current and future software-reliant systems. The R&D work at the SEI presented in this blog focused on a range of software engineering and cybersecurity areas, including Securing the cyber infrastructure. This area focuses on enabling informed trust and confidence in using information and communication technology to ensure a securely connected world to protect and sustain vital U.S. cyber assets and services in the face of full-spectrum attacks from sophisticated adversaries. Advancing disciplined methods for engineering software. 
This area focuses on improving the availability, affordability, and sustainability of software-reliant systems through data-driven models, measurement, and management methods to reduce the cost, acquisition time, and risk of our major defense acquisition programs. Accelerating assured software delivery and sustainment for the mission. This area focuses on ensuring predictable mission performance in the acquisition, operation, and sustainment of software-reliant systems to expedite delivery of technical capabilities to win the current fight. Innovating software for competitive and tactical advantage. This area focuses on safety-critical avionics, aerospace, medical, and automotive systems, all of which are becoming increasingly reliant on software. Other posts in this area highlight innovations that revolutionize development of assured software-reliant systems to maintain the U.S. competitive and tactical edge in software technologies vital to national security. What follows is a sampling of the SEI’s R&D accomplishments in each of these areas during 2013, with links to additional information about these projects. Securing the Cyber Infrastructure Some cybersecurity attacks against DoD and other government organizations are caused by disgruntled, greedy, or subversive insiders, employees, or contractors with access to that organization’s network systems or data. Over the past 13 years, researchers at the CERT Insider Threat Center have collected incidents related to malicious activity by insiders from a number of sources, including media reports, the courts, the United States Secret Service, victim organizations, and interviews with convicted felons. In a series of blog posts, members of the research team have presented some of the 26 patterns identified by analyzing the insider threat database. 
Through our analysis, insider threat researchers have identified more than 100 categories of weaknesses in systems, processes, people, or technologies that allowed insider threats to occur. One aspect of their research focuses on identifying enterprise architecture patterns that organizations can use to protect their systems from malicious insiders. Now that we’ve developed 26 patterns, our next priority is to assemble these patterns into a pattern language that organizations can use to bolster their resources and make them more resilient against insider threats. The blog post, A Multi-Dimensional Approach to Insider Threat, is the third installment in a series that described research to create and validate an insider threat mitigation pattern language to help organizations balance the cost of security controls with the risk of insider compromise. Exposed vulnerable assets make a network a target of opportunity, or low-hanging fruit for attackers. According to the 2012 Data Breach Investigations Report, of the 855 incidents of corporate data theft reported in 2012, 174 million records were compromised. Of that figure, 79 percent of victims were targets of opportunity because they had an easily exploitable weakness, according to the report. The blog post Network Profiling Using Flow highlighted recent research in how a network administrator can use network flow data to create a profile of externally-facing assets on mid- to large-sized networks. New malicious code analysis techniques and tools being developed at the SEI will better counter and exploit adversarial use of information and communication technologies. Through our work in cybersecurity, we have amassed millions of pieces of malicious software in a large malware database. Analyzing this code manually for potential similarities and identifying malware provenance is a painstaking process. 
The blog post Prioritizing Malware Analysis outlined a research collaboration with CMU’s Robotics Institute aimed at developing an approach to prioritizing malware samples in an analyst’s queue (allowing analysts to home in on the most destructive malware first) based on the file’s execution behavior. Another blog post, Semantic Comparison of Malware Functions, described research aimed at helping analysts derive precise and timely actionable intelligence to understand and respond to malware. The approach described in the post uses the semantics of programming languages to determine the origin of malware. The blog post Analyzing Routing Tables highlighted another aspect of our work in cybersecurity. The post detailed maps that a CERT researcher developed using Border Gateway Protocol (BGP) routing tables to show the evolution of public-facing autonomous system numbers (ASNs). These maps help analysts inspect the BGP routing tables to reveal disruptions to an organization’s infrastructure. They also help analysts glean geopolitical information for an organization, country, or city-state, which helps them identify how and when network traffic is subverted to travel nefarious alternative paths that deliberately place communications at risk. Exclusively technical approaches toward attaining cybersecurity have created pressures for malware attackers to evolve technical sophistication and harden attacks with increased precision, including socially engineered malware and distributed denial of service (DDoS) attacks. A general and simple design for achieving cybersecurity remains elusive, and addressing the problem of malware has become such a monumental task that technological, economic, and social forces must join together to address it. The blog post Deterrence for Malware: Towards a Deception-Free Internet detailed a collaboration between the SEI’s CERT Division and researchers at the Courant Institute of Mathematical Sciences at New York University.
Through this collaboration, researchers aim to understand and seek complex patterns in malicious use cases within the context of security systems and develop an incentives-based measurement system that would evaluate software and ensure a level of resilience to attack. Our security experts in the CERT Division are often called upon to audit software and provide expertise on secure coding practices. The blog posting Using the Pointer Ownership Model to Secure Memory Management in C and C++ described a research initiative aimed at eliminating vulnerabilities resulting from memory management problems in C and C++. Memory problems in C and C++ can lead to serious software problems, including difficulty fixing bugs, performance impediments, program crashes (including null pointer dereference and out-of-memory errors), and remote code execution.

Advancing Disciplined Methods for Engineering Software

New data sources, ranging from diverse business transactions to social media, high-resolution sensors, and the Internet of Things, are creating a digital tidal wave of big data that must be captured, processed, integrated, analyzed, and archived. Big data systems that store and analyze petabytes of data are becoming increasingly common in many application areas. These systems represent major, long-term investments requiring considerable financial commitments and massive-scale software and system deployments. With analysts estimating data storage growth at 30 to 60 percent per year, organizations must develop a long-term strategy to address the challenge of managing projects that analyze exponentially growing data sets with predictable, linear costs. The blog post Addressing the Software Engineering Challenges of Big Data described a lightweight risk reduction approach called Lightweight Evaluation and Architecture Prototyping (for Big Data).
The approach is based on principles drawn from proven architecture and technology analysis and evaluation techniques to help the DoD and other enterprises develop and evolve systems to manage big data. The post Architecting Systems of the Future is the first in a series highlighting work from the SEI’s newest program, the Emerging Technology Center. This post highlighted research aimed at creating a software library that can exploit the heterogeneous parallel computers of the future and allow developers to create systems that are more efficient in terms of computation and power consumption.  Accelerating Assured Software Delivery and Sustainment for the Mission SEI researchers work with acquisition professionals and system integrators to develop methods and processes that enable large-scale software-reliant government systems to innovate rapidly and adapt products and systems to emerging needs within compressed time frames and within constrained budgets. To deliver enhanced integrated warfighting capability at lower cost across the enterprise and over the lifecycle, the DoD must move away from stove-piped solutions and towards a limited number of technical reference frameworks based on reusable hardware and software components and services. There have been previous efforts in this direction, but in an era of sequestration and austerity, the DoD has reinvigorated its efforts to identify effective methods of creating more affordable acquisition choices and reducing the cycle time for initial acquisition and new technology insertion. 
In 2013, we published two postings as part of an ongoing series on Open Systems Architecture (OSA). Affordable Combat Systems in the Age of Sequestration expanded upon earlier coverage of how acquisition professionals and system integrators can apply OSA practices to decompose large monolithic business and technical designs into manageable, capability-oriented frameworks that can integrate innovation more rapidly and lower total ownership costs. The Architectural Evolution of DoD Combat Systems described the evolution of DoD combat systems from ad hoc stovepipes to more modular and layered architectures. Despite substantial advances in technical reference frameworks during the past decade, widespread adoption of affordable and dependable OSA-based solutions has remained elusive. It is therefore important to look at past open-systems efforts across the DoD to understand what has worked, what hasn’t, and what can be done to make the development of systems more successful in the future. Government agencies, including the departments of Defense, Veterans Affairs, and Treasury, are being asked by their government program offices to adopt Agile methods. These organizations have traditionally used a waterfall life cycle model (as epitomized by engineering "V" charts). Programming teams in these organizations are accustomed to being managed via a series of document-centric technical reviews that focus on the evolution of the artifacts that describe the requirements and design of the system rather than its evolving implementation, as is more common with Agile methods. As a result of the factors outlined above, many organizations struggle to adopt Agile practices. For example, acquisition professionals often wonder how to fit Agile measurement practices into their progress tracking systems. They also find it hard to prepare for technical reviews that don’t account for both implementation artifacts and the availability of requirements/design artifacts.
A team of SEI researchers is dedicated to helping government programs prepare for and, if appropriate, implement Agile. In 2013, the SEI continued its series of blog posts on the Readiness & Fit Analysis (RFA) approach, which helps organizations understand the risks involved when contemplating or embarking on the adoption of new practices, in this case Agile methods. Blog installments published in the series thus far outlined factors to study when considering agile adoption including business and acquisition (discussed in the first post in this series) organizational climate (discussed in the second post and continued in the third post) project and customer environment (discussed in the fourth post) The verification and validation of requirements are a critical part of systems and software engineering. The importance of verification and validation (especially testing) is a major reason that the traditional waterfall development cycle underwent a minor modification to create the V model that links early development activities to their corresponding later testing activities. A blog post published in November introduced three variants on the V model of system or software development that make it more useful to testers, quality engineers, and other stakeholders interested in the use of testing as a verification and validation method. A widely cited study for the National Institute of Standards & Technology (NIST) reports that inadequate testing methods and tools annually cost the U.S. economy between $22.2 billion and $59.5 billion, with roughly half of these costs borne by software developers in the form of extra testing and half by software users in the form of failure avoidance and mitigation efforts. The same study notes that between 25 percent and 90 percent of software development budgets are often spent on testing. 
In April, we kicked off a series on common testing problems that highlighted results of an analysis that documents problems that commonly occur during testing. Specifically, this series of posts identifies and describes 77 testing problems organized into 14 categories; lists potential symptoms by which each can be recognized, potential negative consequences, and potential causes; and makes recommendations for preventing them or mitigating their effects. The first post in the series explored issues surrounding the reality that software testing is less effective, less efficient, and more expensive than it should be. The second posting highlighted results of an analysis that documents problems that commonly occur during testing. Innovating Software for Competitive and Tactical Advantage Mission- and safety-critical avionics, aerospace, defense, medical, and automotive systems are increasingly reliant on software. Malfunctions in these systems can have significant consequences including mission failure and loss of life, so they must be designed, verified, and validated carefully to ensure that they comply with system specifications and requirements and are error free.  Ensuring these properties in a timely and cost-effective manner is also vital to ensure competitive advantage for companies who produce these technologies. In March, we kicked off a series of blog posts that explored recent developments with the Architecture Analysis Design Language (AADL) standard, which provides formal modeling concepts for the description and analysis of application systems architecture in terms of distinct components and their interactions. The series aimed to highlight how the use of AADL helps alleviate mismatched assumptions between the hardware, software, and their interactions that can lead to system failures. 
The series has included the following posts thus far: Detecting Architecture Traps and Pitfalls in Safety-Critical Software highlighted an effort at the SEI that aims to help engineers use time-proven architecture patterns (such as the publish-subscribe pattern or correct use of shared resources) and validate their correct application. AADL: SAVI and Beyond described the use of AADL in the aerospace industry to improve safety and reliability. AADL in the Medical Domain detailed how AADL is being used in medical devices and highlighted the experiences of a practitioner whose research aims to address problems with medical infusion pumps. AADL Tools: Leveraging the Ecosystem provided an overview of existing AADL tools and highlighted the experience of researchers and practitioners who are developing and applying AADL tools to production projects. Introduction to the Architecture Analysis and Design Language, the first post in the series, detailed the initial foundations of AADL, which defines a modeling notation based on a textual and graphic representation that is used by development organizations to conduct lightweight, rigorous, yet comparatively inexpensive analyses of critical real-time factors, such as performance, dependability, security, and data integrity. Another post highlighting our work on safety-critical systems introduced the Reliability Validation and Improvement Framework that will lead to early defect discovery and incremental end-to-end validation. The Advanced Mobile Systems Initiative at the SEI focuses on helping soldiers and first responders, whether they are in a tactical environment (such as a war zone) or responding to a natural disaster. Both scenarios lack effective, context-aware use and adaptation of tactical resources, and the people involved lack the ability to get relevant information when they critically need it.
Software and system capabilities do not keep pace with these users’ changing needs and must be adapted at the operational edge, or periphery, of the network. Posts describing research in this area include the following: Situational Awareness Mashups at the Tactical Edge detailed efforts to create the Edge Mission-Oriented Tactical App Generator (eMontage), a software prototype that allows warfighters and first responders to rapidly integrate geotagged situational awareness data from multiple remote data sources. National Deployment of the Wireless Emergency Alerts Systems described how the SEI’s work on architecture, integration, network security, and project management is assisting in implementing the WEA system, so it can handle a large number of alert originators and provide an effective nationwide wireless emergency warning system. Building Next-generation Autonomous Systems focused on a new research effort at the SEI called Self-governing Mobile Ad-hocs with Sensors and Handhelds (SMASH) that is forging collaborations with researchers, professors, and students with the goal of enabling more effective search-and-rescue crews. Application Virtualization for Cloudlet-based Cyber Foraging at the Edge is the latest in a series that recounted research aimed at exploring the applicability of application virtualization as a strategy for cyber-foraging in resource-constrained environments.

Concluding Remarks

As you can see from this summary of accomplishments, 2013 has been a highly productive and exciting year for the SEI technical staff. Moreover, this blog posting just scratches the surface of SEI R&D activities. Please come back regularly to the SEI Blog for coverage of these and many other topics we’ll be covering in the coming year. As always, we’re interested in new insights and new opportunities to partner on emerging technologies and interests.
We welcome your feedback and look forward to engaging with you on the blog, so please feel free to add your comments below.

Additional Resources

For the latest SEI technical reports and papers, please visit www.sei.cmu.edu/library/reportspapers.cfm

For more information about R&D at the SEI as well as opportunities for collaboration, please visit www.sei.cmu.edu/research/
SEI . Blog . Jul 27, 2015 02:17pm
Second in a Series

By Charles B. Weinstock
Distinguished Principal Researcher
Software Solutions Division

Software used in safety-critical systems (such as automotive, avionics, and healthcare applications, where failures could result in serious harm or loss of life) must perform as prescribed. How can software developers and programmers offer assurance that the system will perform as needed, and with what level of confidence? In the first post in this series, I introduced the concept of the assurance case as a means to justify safety, security, or reliability claims by relating evidence to the claim via an argument. In this post I will discuss Baconian probability and eliminative induction, which are concepts we use to explore properties of confidence that the assurance case adequately justifies its claim about the subject system.

The Case for Confidence

Assurance cases are now becoming a standard practice in the development and deployment of safety-critical systems. In May 2013, at the First International Workshop on Assurance Cases for Software-Intensive Systems, it was stated that "Several certification standards and guidelines, e.g., in the defense, transportation (aviation, automotive, rail), and healthcare domains, now recommend and/or mandate the development of assurance cases for software-intensive systems." In the first post in this series, I discussed our research on assurance cases and how they can be used when developing software-intensive systems. It is not enough, however, to simply have an assurance case. It is also important to understand why you should have confidence in the assurance case. Achieving confidence is important to all stakeholders in the subject system, including those acquiring the system, those producing the system, those using the system, and (in certain cases) those certifying or otherwise evaluating the system. It is impossible to examine every possible circumstance covered by an assurance case.
One approach to the problem of achieving confidence in an assurance case is to provide a parallel argument that gives reasons why the assurance case should be believed. Hawkins, Kelly, Knight, and Graydon have taken this approach with their concept of the "confidence case." We have taken a different approach. Consider the following notional assurance case arguing that the system is safe (taken from the first posting in this series):

Figure 1: A Notional Assurance Case

We want to be able to understand how the evidence (Ev1…Ev3) leads to confidence that the claim (C1) holds and by how much. We would also like to understand what "confidence" in the claim means, and how existing confidence can be increased. The remainder of this post will focus on a more detailed description of Baconian probability and eliminative induction.

Two Forms of Induction

Deciding how to use evidence as a means of evaluating belief in a hypothesis is a classic philosophical problem, one that is classically solved through the use of induction. When we talk about induction, we typically mean enumerative induction. But there is another form of induction, eliminative induction.

In enumerative induction, support for a hypothesis is increased every time confirming evidence is found. Past experience forms a basis for an expectation of future behavior. For example, consider a person coming into a room and turning on a light. If the light has gone on before, the person will have some confidence that it will go on again. If it has gone on many times before, the person should have even higher confidence that it will go on the next time. Eliminative induction, in contrast to enumerative induction, looks for reasons to doubt the hypothesis. So our person deciding whether to believe a light will go on when he throws the switch will have to consider reasons why it might not, and then eliminate them.
When the person walks into the room, he may have some initial, perhaps unfounded, confidence that the light will go on when he throws the switch. If he checks the wiring and finds that the switch is connected, he will have more confidence. If he verifies that there is power to the switch, he will have still more confidence. And, if he shakes the bulb and does not hear a rattle, he will have even more confidence that the light will go on.

Eliminative Argumentation and the Confidence Map

We can now begin to answer the questions posed in our notional assurance case shown in Figure 1. The evidence in the case leads to increased confidence in the claim to the extent that it removes doubts about the claim. We will achieve increased confidence in the claim C1 as we eliminate more doubts. This method is the idea behind what we call eliminative argumentation, which is based on assurance cases, eliminative induction, and defeasible reasoning.

Reasoning about doubts in an argument is a use of defeasible reasoning. A defeasible argument is always subject to revision as we learn more about the claims and evidence that form the argument. There are only three kinds of doubts relevant to an assurance case: we can doubt claims, we can doubt evidence, and we can doubt the inference between two claims or between a claim and its supporting evidence. We show these doubts explicitly in a modified assurance case: a confidence map.

Figure 2: The "light turns on" assurance case fragment

Figure 2 is a portion of the assurance case for the claim that the light turns on when the switch is thrown. Notice that each of the three sub-claims of C1.1 is based on a doubt about why the light might not turn on when the switch is thrown. In a confidence map these doubts are expressed explicitly.

Figure 3: Rebutting defeaters

Figure 3 shows the beginning of a confidence map for the "light turns on" example. R2.1 … R2.3 are called rebutting defeaters.
They attack the validity of claim C1.1. If any of these defeaters is true, then the claim is invalid. If all of the defeaters can be eliminated, then we have no reason to doubt the claim. If we have no knowledge about the truth or falsity of one or more of these defeaters, then there remains a doubt as to whether claim C1.1 is valid.

There is an implicit inference rule in the confidence map shown above; namely, if all of the rebutting defeaters are shown to be false, then the claim is valid. It may be that there are additional, as yet unidentified, rebutting defeaters to claim C1.1. In the confidence map notation we make this explicit as shown in Figure 4.

Figure 4: Adding an inference rule and attacking it via an undercutting defeater

The confidence map now makes the inference explicit and adds an additional undercutting defeater, namely UC3.3, which raises a doubt about the sufficiency of the inference rule.

Figure 5: Attacking the evidence via an undermining defeater

Figure 5 expands one leg of the confidence map to show the evidence that is being used to eliminate defeater R2.3, namely that the bulb does not rattle when shaken. A rattle would indicate a broken filament. The figure also shows an example of an undermining defeater, one that questions the evidence by suggesting that the person shaking the bulb may be hard of hearing. It also includes a second inference rule (that if a bulb does not rattle then it is good), which is undercut by a defeater suggesting that the bulb may not be incandescent, and therefore the absence of a rattle may be meaningless. The grey circle under the undercutting defeater indicates that we have assumed that the bulb is, in fact, incandescent and will consider that defeater to be eliminated without any further analysis. Of course this does not represent a complete confidence map, even for the elimination of R2.3. For instance, the bulb may not rattle when shaken because the filament did not burn out, but merely was defective.

Measuring Confidence

Figure 6: Counting confidence

At the beginning of this post, I stated that one of our goals was to measure confidence. One way to go about this is to simply count defeaters near the leaves of the map and then count uneliminated defeaters. In Figure 6, I recap the confidence map as discussed. In this example, we have eliminated one defeater (UC4.2, as indicated by the grey circle) out of a total of five. Expressed as 1|5, this is an example of Baconian probability (named after Sir Francis Bacon, who first proposed the idea of measuring degrees of certainty). Note that the Baconian probability is not a ratio and cannot be reduced. 1|5 is read "one out of five" while 2|10 would be read "two out of ten," a very different level of doubt: in the first case there are four unresolved doubts, whereas in the second there are eight. We can increase our confidence in C1.1 by eliminating additional defeaters. For instance, if the examiner has recently undergone a hearing test, we could eliminate UM4.1.

Conclusion

This blog post discusses achieving confidence in a claim and introduces the idea of defeaters and confidence maps. To review, doubt in an argument can result from uncertainty about the claims, evidence, or inferences. Such uncertainty is represented by a defeater. A rebutting defeater attacks a claim by suggesting a reason why it can be false. An undermining defeater attacks the evidence and asks why the evidence may be compromised. Finally, an undercutting defeater attacks the inference rule and asks how the premise can be OK, but the conclusion uncertain. Confidence in a claim increases as the reasons for doubt are eliminated. If we have eliminated zero defeaters, then we have no reason to have confidence in the claim. If we have eliminated all of the defeaters, then we have no reason to doubt the claim. But if there are still uneliminated defeaters, then there remains some doubt about the claim. It may be hard to completely eliminate a defeater.
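The three kinds of defeater described above can be captured in a small data model. The following Python sketch is illustrative only and is not the notation or tooling used by the SEI researchers. The class and function names are hypothetical, the wording of the doubts for R2.1 and R2.2 is assumed from the wiring and power checks mentioned earlier, and the nesting (UM4.1 and UC4.2 under R2.3) is an assumption based on the description of Figure 5.

```python
from dataclasses import dataclass, field
from enum import Enum


class DefeaterKind(Enum):
    REBUTTING = "rebutting"        # attacks a claim
    UNDERMINING = "undermining"    # attacks a piece of evidence
    UNDERCUTTING = "undercutting"  # attacks an inference rule


@dataclass
class Defeater:
    label: str
    kind: DefeaterKind
    doubt: str
    eliminated: bool = False
    # Defeaters that attack the evidence or inference used to eliminate this one.
    children: list["Defeater"] = field(default_factory=list)

    def leaves(self) -> list["Defeater"]:
        """Return the defeaters at the bottom of this branch of the map."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def build_light_map() -> list[Defeater]:
    """Defeaters for claim C1.1 (hypothetical reconstruction of the example)."""
    return [
        Defeater("R2.1", DefeaterKind.REBUTTING, "switch is not connected (assumed wording)"),
        Defeater("R2.2", DefeaterKind.REBUTTING, "no power to the switch (assumed wording)"),
        Defeater("R2.3", DefeaterKind.REBUTTING, "bulb is defective", children=[
            Defeater("UM4.1", DefeaterKind.UNDERMINING, "examiner may be hard of hearing"),
            Defeater("UC4.2", DefeaterKind.UNDERCUTTING, "bulb may not be incandescent",
                     eliminated=True),  # assumed away in the post (grey circle)
        ]),
        Defeater("UC3.3", DefeaterKind.UNDERCUTTING, "other rebutting defeaters may exist"),
    ]


def unresolved(defeaters: list[Defeater]) -> list[str]:
    """Labels of leaf defeaters that still leave room for doubt."""
    return [leaf.label for d in defeaters for leaf in d.leaves() if not leaf.eliminated]
```

Calling `unresolved(build_light_map())` lists the doubts that remain open in this reconstruction. As long as that list is non-empty, there is still some reason to doubt claim C1.1.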
Suppose we know little about the light bulb examiner, but we know that 1 percent of the population as a whole will have trouble hearing a rattle in a defective light bulb. Shouldn’t we be able to mostly eliminate defeater UM4.1 on that basis? That and related topics will be discussed in the next article in this series. Additional Resources To read the SEI paper Measuring Assurance Case Confidence using Baconian Probabilities, please visit http://www.sei.cmu.edu/library/assets/whitepapers/icsews13assure-id5-p-16156-preprint.pdf To read the SEI paper Eliminative Induction: A Basis for Arguing System Confidence, please visit http://www.sei.cmu.edu/library/assets/whitepapers/icse13nier-id10-p-15833-preprint.pdf To read the SEI Technical Report, Toward a Theory of Assurance Case Confidence, please visit http://www.sei.cmu.edu/reports/12tr002.pdf To read the Hawkins, Kelly, Knight, and Graydon paper A New Approach to Creating Clear Safety Arguments, please visit http://www-users.cs.york.ac.uk/~rhawkins/papers/HawkinsSSS11.pdf
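The counting scheme from the Measuring Confidence section can also be sketched in code. The Python below is a minimal illustration under stated assumptions: the `BaconianProbability` class and the `count_confidence` helper are hypothetical names, not part of any SEI tool. The point it shows is that the two numbers are kept as a pair and never reduced, so 1|5 and 2|10 remain distinct.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BaconianProbability:
    """Eliminated defeaters out of total defeaters, written i|n (never reduced)."""
    eliminated: int
    total: int

    def __post_init__(self) -> None:
        if not 0 <= self.eliminated <= self.total:
            raise ValueError("need 0 <= eliminated <= total")

    @property
    def unresolved(self) -> int:
        # Number of doubts that have not yet been eliminated.
        return self.total - self.eliminated

    def __str__(self) -> str:
        return f"{self.eliminated}|{self.total}"


def count_confidence(eliminated_flags: Iterable[bool]) -> BaconianProbability:
    """Build i|n from one flag per leaf defeater (True means eliminated)."""
    flags = [bool(f) for f in eliminated_flags]
    return BaconianProbability(sum(flags), len(flags))


# The example from the post: five defeaters, only UC4.2 eliminated.
light = count_confidence([False, False, False, False, True])
print(light, "with", light.unresolved, "unresolved doubts")
```

Because equality compares both fields, `BaconianProbability(1, 5)` and `BaconianProbability(2, 10)` are not equal, which mirrors the post's remark that four unresolved doubts and eight unresolved doubts are very different situations.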
By Ian GortonSenior Member of the Technical StaffSoftware Solutions Division Many types of software systems, including big data applications, lend them themselves to highly incremental and iterative development approaches. In essence, system requirements are addressed in small batches, enabling the delivery of functional releases of the system at the end of every increment, typically once a month. The advantages of this approach are many and varied. Perhaps foremost is the fact that it constantly forces the validation of requirements and designs before too much progress is made in inappropriate directions.  Ambiguity and change in requirements, as well as uncertainty in design approaches, can be rapidly explored through working software systems, not simply models and documents. Necessary modifications can be carried out efficiently and cost-effectively through refactoring before code becomes too ‘baked’ and complex to easily change. This posting, the second in a series addressing the software engineering challenges of big data, explores how the nature of building highly scalable, long-lived big data applications influences iterative and incremental design approaches. Iterative, incremental development approaches are embodied in agile development methods, such as XP and Scrum. While the details of each approach differ, the notion of evolutionary design is at the core of each. Agile software architects eschew large, planned design efforts (also known as Big Design Up Front), in lieu of just enough design to meet deliverable goals for an iteration. Design modifications and improvements occur as each iteration progresses, providing on-going course corrections to the architecture and ensuring that only features that support the current targeted functionality are developed. Martin Fowler provides an excellent description of the pros and cons of this approach. 
He emphasizes the importance of test-driven development and continuous integration as key practices that make evolutionary design feasible. In a similar vein, at the SEI we are developing an architecture-focused approach that can lead to more informed system design decisions that balance short-term needs with long-term quality. Evolutionary, emergent design encourages lean solutions and avoids over-engineered features and software architectures. This design approach limits time spent on tasks such as updating lengthy design documentation. The aim is to deliver, in as streamlined a manner as possible, a system that meets its requirements. There is, of course, an underlying assumption that must hold for evolutionary design to be effective: change is cheap. Changes that are fast to make can easily be accommodated within short development cycles. Not all changes are cheap, however. Cyber-physical systems, where hardware-software interfaces are dominant, offer prominent examples of systems in which hardware modifications or unanticipated deployment environments can lead to changes with long development cycles. Other types of change can be expensive in purely software systems, as well. For example, poorly documented, tightly coupled legacy code can rarely be successfully replaced in a single iteration. Incorporating new third-party components or subsystems can involve lengthy evaluation, prototyping, and development cycles, especially when negotiations with vendors are involved. Likewise, architectural changes (for example, moving from a master-slave to a peer-to-peer deployment architecture to improve scalability) regularly require a fundamental and widespread redesign and refactoring that must be spread judiciously over several development iterations. Evolutionary Design and Big Data Applications As we described in a previous blog post, our research focuses on addressing the challenges of building highly scalable big data systems. 
In these systems, the requirements for extreme scalability, performance, and availability introduce complexities that require new design approaches from the software engineering community. Big data solutions must adopt highly distributed architectures with data collections that are (1) partitioned over many nodes in clusters to enhance scalability and (2) replicated to increase availability in the face of hardware and network failures. NoSQL distributed database architectures provide many of the capabilities that make scalability feasible at acceptable costs. They also introduce inherent complexities that force applications to perform the following tasks: (1) explicitly handle data consistency; (2) tolerate a wide range of hardware and software faults; and (3) track component monitoring and performance measurement so that operators have visibility into the behavior of the deployment. Due to the size of their deployment footprint, big data applications are often deployed on virtualized cloud platforms. Cloud platforms are many and varied in their nature, but generally offer a set of services for application configuration, deployment, security, data management, monitoring, and billing for use of processor, disk, and network resources. A number of cloud platforms from service providers, such as Amazon Web Services and Heroku, as well as open-source systems, such as OpenStack and Eucalyptus, are available for deploying big data applications in the cloud. In this context of big data applications deployed on cloud platforms, it’s interesting to examine the notion of evolutionary system design in an iterative and incremental development project. Recall that evolutionary design is effective as long as change is cheap. Hence, are there elements of big data applications where change is unlikely to be a straightforward task, and that might, in turn, require major rework and perhaps even fundamental architecture changes for an application? 
We posit that there are two main areas in big data applications where change is likely so expensive and complex that it warrants a judicious  upfront architecture design effort. These two areas revolve around changes to data management and cloud deployment technologies: Data Management Technologies. For many years, relational database technologies dominated data management systems. With a standard data model and query language, competitive relational database technologies share many traits, which makes moving to another platform or introducing another database into an application relatively straightforward. In the last five years, NoSQL databases have emerged as foundational building blocks for big data applications. This diverse collection of NoSQL technologies eschews standardized data models and query languages. Each technology employs radically different distributed data management mechanisms to build highly scalable, available systems. With different data models, proprietary application programming interfaces (APIs) and totally different runtime characteristics, any transition from one NoSQL database to another will likely have fundamental and widespread impacts on any code base. Cloud Deployments. Cloud platforms come in many shapes and sizes. Public cloud services provide hosting infrastructures for virtualized applications and offer sophisticated software and hardware platforms that support pay-as-you-use cost models. Private cloud platforms enable organizations to create clouds behind their corporate firewalls. Again, private clouds offer sophisticated mechanisms for hosting virtualized applications on clusters managed by the development organization. Like NoSQL databases, little commonality exists between various public and private cloud offerings, making a migration across platforms a daunting proposition with pervasive implications for application architectures. 
In fact, a whole genre of dedicated cloud migration technologies, including Yuruware and Racemi, is emerging to address this problem. Where opportunities for new tools such as these exist, the problem they are addressing is likely not something that can be readily accommodated in an evolutionary design approach. LEAP(4BD) Our Lightweight Evaluation and Architecture Prototyping for Big Data (LEAP4BD) method reduces the risks of needing to migrate to a new database management system by ensuring a thorough evaluation of the solution space is carried out in the minimum of time and with minimum effort. LEAP(4BD) provides a systematic approach for a project to select a NoSQL database that can satisfy its requirements. This approach is amenable to iterative and incremental design approaches, because it can be phased across one or more increments to suit the project’s development tempo. A key feature of LEAP(4BD) is its NoSQL database feature evaluation criteria. This ready-made set of criteria significantly speeds up a NoSQL database evaluation and acquisition effort. 
To this end, we have categorized the major characteristics of data management technologies based upon the following areas:
Query Language: characterizes the API and specific data manipulation features supported by a NoSQL database
Data Model: categorizes core data organization principles provided by a NoSQL database
Data Distribution: analyzes the software architecture and mechanisms that are used by a NoSQL database to distribute data
Data Replication: determines how a NoSQL database facilitates reliable, high-performance data replication
Consistency: categorizes the consistency model(s) that a NoSQL database offers
Scalability: captures the core architecture and mechanisms that support scaling a big data application in terms of both data and request load increases
Performance: assesses mechanisms used to provide high-performance data access
Availability: determines mechanisms that a NoSQL database uses to provide high availability in the face of hardware and software failures
Modifiability: questions whether an application data model can be easily evolved and how that evolution impacts clients
Administration and Management: categorizes and describes the tools provided by a NoSQL database to support system administration, monitoring, and management
Within each of these categories, we have detailed evaluation criteria that can be used to differentiate big data technologies. For example, here’s an extract from the Data Model evaluation criteria:
Data Model style
    a. Relational
    b. Key-Value
    c. Document
    d. Column
    e. Graph
    f. XML
    g. Object
Data item identification
    a. Key-value for each field
    b. Objects in the same store can have variable formats or types
    c. Opaque data items that need application interpretation
    d. Fixed or variable schema
    e. Embedded hierarchical data items supported (e.g., sub-documents)
Data Item key
    a. Automatically allocated
    b. Composite keys supported
    c. Secondary indexes supported
    d. Querying on non-key metadata supported
Query Styles
    a. Query by key
    b. Query by partial key
    c. Query by non-key values
        i. Indexed
        ii. Non-indexed
    d. Text search in data items
        i. Indexed
        ii. Non-indexed
In LEAP(4BD), we first work with the project team to identify features pertinent to the system under development. These features help identify a specific set of technologies that will best support the system. From there, we weight individual features according to system requirements and evaluate each candidate technology against these features. LEAP(4BD) is supported by a knowledge base that stores the results of our evaluations and comparisons of different NoSQL databases. We have pre-populated the LEAP(4BD) knowledge base with evaluations of specific technologies (e.g., MongoDB, Cassandra, and Riak) with which we have extensive experience. Each evaluation of a new technology adds to this knowledge base, making evaluations more streamlined as the knowledge base grows. Overall, this is a systematic, quantitative, and highly transparent approach that quickly produces a ranking of the various candidate technologies according to project requirements. As we have demonstrated thus far in this series, there are many facets to LEAP(4BD). The next post in this series on Big Data will explain the prototyping phase. In the meantime, we’re keen to hear from developers and architects who are evaluating big data technologies, so please feel free to share your thoughts in the comments section below. Additional Resources To listen to the podcast, An Approach to Managing the Software Engineering Challenges of Big Data, please visit http://url.sei.cmu.edu/iq
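The weighting and ranking step that LEAP(4BD) applies to candidate technologies, as described above, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the feature names, the weights, the 0 to 2 scoring scale, and the placeholder candidates are invented and are not taken from the LEAP(4BD) knowledge base.

```python
# Hypothetical weights: how much each evaluation feature matters to one project.
WEIGHTS = {"query_by_non_key": 3, "secondary_indexes": 2, "variable_schema": 1}

# Hypothetical scores per candidate (0 = unsupported, 1 = partial, 2 = full).
SCORES = {
    "candidate_a": {"query_by_non_key": 2, "secondary_indexes": 2, "variable_schema": 1},
    "candidate_b": {"query_by_non_key": 1, "secondary_indexes": 0, "variable_schema": 2},
    "candidate_c": {"query_by_non_key": 0, "secondary_indexes": 1, "variable_schema": 2},
}


def weighted_score(scores, weights):
    """Sum weight * score over all weighted features; missing features count as 0."""
    return sum(w * scores.get(feature, 0) for feature, w in weights.items())


def rank_candidates(all_scores, weights):
    """Return (candidate, total) pairs sorted from best to worst."""
    totals = {name: weighted_score(s, weights) for name, s in all_scores.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, total in rank_candidates(SCORES, WEIGHTS):
        print(f"{name}: {total}")
```

Changing a weight and re-running the ranking shows how sensitive the ordering is to a project's priorities, which is one reason a transparent scoring scheme is useful.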
By Rick Kazman Senior Member of the Technical Staff Software Solutions Division The process of designing and analyzing software architectures is complex. Architectural design is a minimally constrained search through a vast multi-dimensional space of possibilities. The end result is that architects are seldom confident that they have done the job optimally, or even satisfactorily. Over the past two decades, practitioners and researchers have used architectural patterns to expedite sound software design. Architectural patterns are prepackaged chunks of design that provide proven structural solutions for achieving particular software system quality attributes, such as scalability or modifiability. While use of patterns has simplified the architectural design process somewhat, key challenges remain. This blog explores these challenges and our solutions for achieving system security qualities through use of patterns. One common approach for realizing system security is through authentication and authorization. The question for the architect is "How do I design them into the software?  What do I have to build in to achieve them?" To resolve this dilemma, the developer might choose to use a pattern. By way of analogy, a building architect might decide to make a truss to support a roof and so choose one of the standard available designs for trusses. The architect is starting with a pre-packaged piece of design. A pattern in software architecture is a piece of design, just like a truss in a physical building. It is commonplace, however, for developers to implement architectural patterns incorrectly. Due to such factors as ambiguity in specifications or miscommunication during the development process, the transition from design to code often does not go smoothly. Translating patterns into code involves the risk for error that is inherent in any translation process. 
Moreover, there are interpretation issues regarding how any pattern should be implemented; different programmers often implement the same pattern differently. Patterns also degrade over time, due to changes made during maintenance. Programmers often insert seemingly innocuous changes to the code without realizing that they’re actually undermining the intent of the pattern. For example, let’s say that you’re designing a system where A should never call B directly, for various reasons, e.g., because you don’t want them too tightly coupled or because they’re in different security domains. To resolve this, you insert C, an intermediary (sometimes called the Mediator pattern). But later a programmer comes along, looks at the code, and thinks, "Well, it looks to me like A can just call B," and changes the code without understanding the design intent. The programmer can’t see the grand plan from looking at the minutiae. In this way, even good designs typically are undermined: they erode and degrade over time. For some system qualities, such as modifiability, this is bothersome. For security, it is potentially fatal. Along with my colleagues Jungwoo Ryoo of Pennsylvania State University and Amnon Eden of the University of Essex, U.K., I developed an approach to address these challenges. Our approach comprises two major components: the design guide and the Two-Tier Programming Toolkit (TTP), described below. In the design guide, we first provide natural language descriptions of approaches for achieving design intent. If your purpose is to achieve security, for example, your design intent might involve detecting, resisting, or recovering from attacks on your system. To help you realize your intent, the guide presents a hierarchy of tactics for building in security. Tactics form the building blocks of patterns. To continue a previous analogy, if a pattern is a roof truss, a tactic is a component of that truss: a chord, web, or plate. 
Tactics involve a more fine-grained level of design than do patterns.
[Figure: Hierarchy of Security Tactics Presented in the Design Guide]
In the figure above, you’ll notice that Limit Exposure is categorized as a tactic for resisting attack. The Limit Exposure tactic minimizes the "attack surface" of a system; it typically is realized by having the least possible number of access points for resources, data, or services and by reducing the number of connectors that may provide unanticipated exposure. This tactic focuses on reducing the probability of and minimizing the effects of damage caused by a hostile action. The Limit Exposure tactic is associated with patterns that build in security, such as the Check Point pattern. This pattern monitors incoming messages, to determine whether they are consistent with a given security policy, and triggers countermeasures if necessary. After you have chosen the pattern and its associated tactics for achieving your design intent, these must be converted into code. As mentioned above, whenever you undertake such a translation step, you have the possibility of producing an inaccurate translation. To solve this challenge, our approach requires that you express your chosen pattern by creating (or, even better, reusing) code charts. Code charts are formal specifications that model and visually depict a program’s structure and organization. The figure below shows the Check Point pattern as expressed through a code chart.
[Figure: The Check Point Pattern as Represented by a Code Chart]
You create the code charts through the Two-Tier Programming Toolkit, the second element of our approach. Two-tier programming integrates the representations of both design and implementation. Frequently, when a system undergoes redesign, the changes between the design and the code that implements it may not be coordinated, causing the design to deteriorate. To maintain design quality, programmers must keep these two tiers coordinated through the software’s lifecycle. 
Our TTP toolkit is a round-trip engineering tool; it maintains design and implementation as separate representations, while facilitating the propagation of changes between them. It supports round-trip engineering in that it has facilities for forward engineering (planning a new design and ensuring that this design is faithfully implemented by the code) and reverse engineering (determining what structures already exist in the code, and modeling those). The TTP Toolkit was created by a team of colleagues at the University of Essex whom I joined in developing this tool. The program visualizations are reverse engineered from Java source code, and the visual specification of object-oriented design is represented in a visual language (based on the formal specification language LePUS3). When you have created the code charts via the TTP toolkit, you will need to associate each pattern with the source code that implements it. To do this, you simply need to associate (or bind) each variable in the pattern to a specific programming language construct. For example, the Check Point pattern makes use of a security policy, represented as "Policy" in the code chart. The Check Point checks whether messages are consistent with the Policy; if they are not, a countermeasure, represented as "Countermeasure" in the code chart, may be triggered.
[Figure: Variables and Constants of the Check Point Pattern]
Finally, the TTP Toolkit can check that your design is implemented correctly. The toolkit automates the verification process so that you can easily determine whether the code conforms to a tactic or pattern specification. The tool lets you recheck the design at the click of a button any time you make changes to the code, to assure conformance as the system evolves. This verification mitigates the problem of programmers’ unwittingly undermining patterns during maintenance. 
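The kind of structural rule that such a conformance check enforces can be sketched in a few lines, using the earlier example in which A must reach B only through the intermediary C. This is a simplified illustration and not the TTP Toolkit or LePUS3: the function name is hypothetical, and the set of call edges is assumed to have been extracted from the source code by some separate static analysis step.

```python
def check_mediator(calls, client, target, mediator):
    """Return violations of the rule 'client reaches target only via mediator'.

    `calls` is a set of (caller, callee) pairs, assumed to come from a
    separate static analysis of the code (hypothetical input).
    """
    violations = []
    if (client, target) in calls:
        violations.append(f"{client} calls {target} directly")
    if (client, mediator) not in calls:
        violations.append(f"{client} never calls {mediator}")
    if (mediator, target) not in calls:
        violations.append(f"{mediator} never calls {target}")
    return violations


# Call edges for the design as intended: A -> C -> B.
original = {("A", "C"), ("C", "B")}
# The same system after a maintenance "shortcut" from A straight to B.
eroded = original | {("A", "B")}

if __name__ == "__main__":
    print(check_mediator(original, "A", "B", "C"))
    print(check_mediator(eroded, "A", "B", "C"))
```

Because the rule is expressed over structure alone, it can be re-run after every change to the code, which is the property that makes this style of check useful during maintenance.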
[Figure: Successful Verification through the TTP Toolkit]
Round-trip engineering tools usually support reverse engineering by generating visualizations from analyzed source code. If the output of the reverse engineering tool (in this case, code charts) can feed directly into the forward engineering tool, and vice versa, then the engineering cycle can be said to be closed, hence round-trip engineering. Closing the engineering cycle can ensure that anytime a gap between design and implementation is generated, it can quickly be detected and remedied. In summary, our approach involves these steps for achieving system security:
1. Identify your security requirements to determine your design intent.
2. Using the design guide, find security tactics to compose security patterns, creating a library.
3. Through the TTP toolkit, convert this library of security patterns to code charts.
4. Use the TTP toolkit to automatically ensure conformance of implementations to design specifications.
Wrapping Up and Looking Ahead The design guide provides a natural language description of the architectural approach, guiding the architect from a statement of design intents to one or more tactics, which in turn lead to one or more patterns. We have already specified a large number of security patterns in the guide and the toolkit, and are adding more. While many patterns catalogs have emerged over the past two decades, what sets this work apart is that the toolkit adds rigor to the specification of a pattern and adds conformance checking to ensure that the design and implementation always stay synchronized. This synchrony is essential. If developers don’t maintain this coordination, then the software’s design will certainly degrade over time, a condition referred to as "software entropy." Every designer and project manager should be aware of this risk and concerned about reducing it. It’s useful, by the way, to note that the TTP toolkit has limitations and cannot check every property. 
For example, it can’t tell you that A is going to happen after B, or is going to happen immediately after B, because those are temporal specifications, involving timing and ordering. The toolkit can only confirm structural properties, such as A calls B or A inherits from B. In other words, it can check static properties and relationships as opposed to dynamic properties and relationships. We hope that our approach will enable practitioners to improve their design process and the software it produces. As with the introduction of any new technology, we expect to encounter some resistance to learning a new tool and a new formalism and concern about adoption risks. We also want practitioners to realize that this technology won’t solve all of their design problems. It will solve some important ones, however. If successful, the design guide and TTP toolkit will make it easier to maintain conformance and reduce the cost of doing so. Adopting our approach should result in higher quality design and implementation, in turn resulting in more secure implementations of designs. Please let us know your thoughts about the utility of these techniques in the comments section below. Additional Resources To read the book Software Architecture in Practice, which I co-authored with Len Bass and Paul Clements, please visit http://resources.sei.cmu.edu/library/asset-view.cfm?assetid=30264 To read the paper Using Security Patterns to Model and Analyze Security Requirements by Sascha Konrad, Betty H.C. Cheng, Laura A. Campbell, and Ronald Wassermann, please visit http://www.cse.msu.edu/~cse870/Materials/rhas-03-CRC.pdf To read the paper A Pattern Language for Security Models by Eduardo B. Fernandez and Rouyi Pan, please visit www-engr.sjsu.edu/fayad/current.courses/cmpe133-fall04/docs/PLoP2001_ebfernandezandrpan0_1.pdf To read the paper In Search of Architectural Patterns for Software Security by Jungwoo Ryoo, Phil Laplante, and Rick Kazman, please visit http://dl.acm.org/citation.cfm?id=1591991 To read the paper Security Engineering with Patterns by Markus Schumacher and Utz Roedig, please visit http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.1662 To read the paper Architectural Patterns for Enabling Application Security by Joseph Yoder and Jeffrey Barcalow, please visit http://www.idi.ntnu.no/emner/tdt4237/2007/yoder.pdf
By Will Casey Senior Member of the Technical Staff CERT Division Code clones are implementation patterns transferred from program to program via copy mechanisms including cut-and-paste, copy-and-paste, and code reuse. The value of code cloning as a software engineering practice has been the subject of significant debate. In its most basic form, code cloning may involve a codelet (a snippet of code) that undergoes various forms of evolution, such as slight modification in response to problems. Software reuse quickens the production cycle for augmented functions and data structures. So, if a programmer copies a codelet from one file into another with slight augmentations, a new clone has been created stemming from a founder codelet. Events like these constitute the provenance, or historical record, of all events affecting a codelet object. This blog posting describes exploratory research that aims to understand the evolution of source and machine code and, eventually, create a model that can recover relationships between codes, files, or executable formats where the provenance is not known. Other major events for codelets may include creation, copy, modification, and deletion of a codelet from a project. The presence of software bugs and vulnerabilities brings a new set of questions for which the provenance of a codelet is particularly relevant. Consider the copy/paste of a codelet that contains a bug. By cloning, the programmer has essentially introduced a new bug by copying an old bug. The tracing of provenance in software development has therefore become increasingly important to tracking both bugs and vulnerabilities. A generative model of provenance developed from observed software histories may inform researchers and developers how code evolves and mutates over time. 
Taking that notion one step further, a generative model that uses Bayesian statistical techniques provides inference possibilities, such as identifying relations in software where the provenance is not completely known. Examples of problems where provenance models and inference may be critical include identifying attribution for malware and analyzing potential vulnerabilities or vulnerability surface in third-party software product chains. These are two increasingly important problems in today’s era of cyber warfare, where we aim to both minimize vulnerability exposure and understand adversarial attack patterns. Further, provenance inference may be an important step in understanding the dynamics of cyber conflict: If an organization deploys sophisticated cyber attacks, can counterattacks (or more generally other attacks) occur by simply copying the malicious code and redeploying it? Codelet mobility (the ease with which an implementation can be copied and redeployed) is an important factor in understanding the strategic options of agents in cyber conflicts. Initial Research This work began about 18 months ago when CERT researchers began using automated techniques to identify relationships (i.e., common authorship) among different types of malware. In the not-too-distant past, identifying the provenance or origin of malware required reverse engineering, which was labor intensive and time consuming. I worked on that project with Aaron Shelmire, a former CERT researcher, to develop a Jaccard similarity index, which measures the amount of commonality between any two digital artifacts, source code files, or binary executables, based on the number of code clones in common. The Jaccard index identifies all common substrings in the artifacts and presents a measure of how much code is in common. 
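As a rough illustration of the Jaccard index, the sketch below compares two byte strings by the sets of fixed-length substrings they contain and reports the size of the intersection divided by the size of the union. This is a simplification made for illustration: the fixed substring length `k`, the function names, and the sample inputs are assumptions, and the research described here identifies common substrings with more sophisticated techniques.

```python
def shingles(data: bytes, k: int = 4) -> set:
    """All length-k substrings of data (empty set if data is shorter than k)."""
    return {data[i:i + k] for i in range(len(data) - k + 1)}


def jaccard(a: bytes, b: bytes, k: int = 4) -> float:
    """Size of intersection over size of union of the two substring sets.

    Returns 0.0 when neither input contains a length-k substring.
    """
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    if not union:
        return 0.0
    return len(sa & sb) / len(union)


if __name__ == "__main__":
    # Two short hypothetical artifacts that share one 4-byte substring.
    print(jaccard(b"abcdef", b"abcdxy"))
```

A value near 1 means the two artifacts consist largely of the same substrings; a value near 0 means they share almost nothing at this granularity.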
Not only is this technique more efficient than reverse engineering, but we have also arrived at techniques that are algorithmically optimal and shown that these methods may be used to prioritize the efforts of reverse engineers. We used the Jaccard index to systematically examine subsets of artifacts to identify malware families, which share a high degree of code similarity in binary images or epoch similarity in traces (runtime). The basic notions of similarity are sufficiently long interesting strings or clusters of shorter strings. Because there are many generative factors giving rise to code clones (e.g., compiler artifacts, linked static libraries, and copy, cut, and paste), the question arose: How can we interpret these measures to infer shared provenance, and how may we do so with statistical confidence? From a malware attribution perspective, this question reduces to the following: How can we demonstrate that a high degree of similarity between two malware families is due to a common attacker or common source code? How could an analyst be certain that the similarities were not the result of shared, online, open-source code? This simple question sparked our research question: How should we measure provenance in software? This exploratory research involves several phases: Phase 1 involves leveraging a large-scale software history to gain a better understanding of how code evolves. Phase 2 involves creating a generative model that will allow us to investigate the statistical problem of inference. 
Phase 3 involves the application of the statistical model to problems of inference. This research provides malware analysts greater certainty in determining shared authorship and other cloned features among two or more files. A Foundation in Phylogenetics Our interest in inference models for software provenance is preceded by effective inference models in the biological field of phylogenetics, which allow biologists to infer the structure of evolutionary events and determine common ancestry by examining contemporary biological data. In fact, if you have received a flu vaccine, then you have benefited from this science of inference and prediction. We recognized similarities to phylogenetics in our own work: We were researching malware artifacts in hopes of understanding more about their historical development from the artifacts alone (with the provenance data withheld). To address this completely, we would need to understand and model the events that modify and edit source code. To accomplish this, we needed to look at source code with a known provenance. Turning to Open-Source Code It is an opportune time to explore the history of software for any researcher interested in examining the evolution of source code. Unix maintains a 43-year history detailing the evolution of its code, while Linux maintains a 21-year history. We decided to examine Mozilla’s 14-year open-source history. The previous five years contain many significant developments, as the following statistics indicate: 2,546 users modified 25,000 files, with a total of 122 million lines of source code modified. Of the 122 million lines of code (LOC), only 4.2 million LOC are unique. This statistic suggests that a vast majority of lines are redundant, many of which are copied or cloned. 
If foundational events can be understood and subsequent mutations identified within a generative provenance model, it is possible that among the 4.2 million LOC, there may be still greater reductions that arise from understanding the mutation structure and co-migration patterns of source code. While an entire list of driving factors leading to code mutation and migration may not be practical, major modification patterns can be inferred and dynamic modification patterns can be learned from data with known provenance. Addressing Size and Scale One of the major challenges in our research involves addressing the size and scale of the Mozilla source-code distribution, which in the last five years has logged 120,355 major commit events. One method to address this size and scale is through indexing techniques. For example, in January 2013 Mozilla distributed 17 million LOC (only 20 percent of which are unique) across 15 thousand files. Indexing these individual lines of code as well as their neighboring lines of code has allowed us to identify portions of code that have migrated from one file to another during development events. While there are large numbers of file objects and LOC objects, the relation between file objects and LOC objects is very sparse. Indexing also allowed us to map relations such as co-occurrence clusters of LOC and their containing files. Using these observations, we can track statistical trends of modifications made to files over time and test our assumptions of sparse structures and evolution patterns such as the following scenario: files exhibit rapid growth (initially) with lots of code added in, and then settle down eventually with bug fixes generating the only modifications thereafter. By creating an index of unique lines of code, we were able to map codelets as code patches with co-migratory patterns in files. 
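A minimal sketch of the idea of indexing unique lines of code follows. It maps each distinct line to the set of files that contain it, so lines found in more than one file become candidates for copied or migrated code. The function names and the two tiny example files are hypothetical, and the sketch ignores the neighboring-line context and revision history that the indexing described above takes into account.

```python
from collections import defaultdict


def build_line_index(files):
    """Map each distinct non-blank line (whitespace-stripped) to the files containing it.

    `files` maps a file path to that file's text.
    """
    index = defaultdict(set)
    for path, text in files.items():
        for line in text.splitlines():
            line = line.strip()
            if line:
                index[line].add(path)
    return index


def shared_lines(index):
    """Lines that occur in more than one file: candidates for copied or migrated code."""
    return {line: paths for line, paths in index.items() if len(paths) > 1}


# Hypothetical two-file project used only to exercise the functions.
files = {
    "a.c": "int x = 0;\nreturn x;\n",
    "b.c": "int y = 1;\nreturn x;\n",
}

if __name__ == "__main__":
    index = build_line_index(files)
    print(len(index), "unique lines")
    print(shared_lines(index))
```

The index is sparse in the sense discussed above: most lines map to only one or a few files, so a dictionary of small sets stays manageable even when the number of lines is large.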
A challenging aspect of modeling provenance problems for software has been the notion of object granularity [Bunge and Ram, Liu], or determining what objects should be modeled. Because source code objects are so closely tied to their application or function, both the file (as object) and the line of code (as object) present drawbacks. We theorize that the most useful objects to consider are functional in nature, such as a well-formed function definition in the C programming language, which may also be considered a codelet (or a set of lines of code). To test this viewpoint, our indexing techniques allow us to observe co-migration of codelets or sets of lines forming well-defined C language functions. Another challenge in our research involved interpreting the summary of actions provided by developers. One way to interpret this summary is to create topic models that focus on key terms, such as bug, merge, adding code, etc. By creating topic vectors, we enhanced the inference of why certain modifications were made. Another obstacle that we needed to address in our research involved accounting for the many mechanisms that affect the copy number of lines of code. Addressing this challenge involved understanding how to model deletion events in a distributed revision control system. These events, unlike their analogous events affecting the provenance of physical objects, are harder to deal with theoretically, because remote branches, when merged, may reintroduce lines of code that were previously deleted. Collaboration In examining Mozilla’s source code, we have collaborated with Sudha Ram, a researcher at the University of Arizona who is an expert in data provenance. Her research, along with Jun Liu, has examined provenance in Wikipedia pages, iPlant Collaborative, and the Materials Management domain. Their work on digital provenance in edit histories of Wikipedia pages has helped identify collaboration patterns that result in high-quality pages. 
Ram and Liu have developed a model of semantics for data provenance known as W7, which describes the who, what, when, where, why, which, and how of data events. We are also working with Rhiannon Weaver and Katherine Prevost, fellow CERT researchers, who are helping us design statistical measures of ownership and topic models for the summaries associated with modification events.

Future Work

With our indexing approach, we have been able to establish measures of commonality between malware files. We aim to use these measures to infer the likelihood of several different types of code clone events:

subtraction - modifications that remove lines of code
addition - modifications that introduce additional lines of code to enhance a function
mutation - slight in situ modifications of code to fix or enhance a function
moving/copying - reorganization or reintroduction of codelets in files

Our real goal, however, is to understand the mobility of code and how it mutates and flows from one source to another. For practical problems, attaining this understanding will allow us to assess the likely reasons why certain code is found in common. For malware, if we can confirm that 20 percent of a malware group is related to a previously identified malware group, what can we conclude with confidence, and what other factors would lead to such a result?

We plan to continue our collaboration with Sudha Ram with the intent of producing a provenance model for software code. Specifically, this model will describe the evolution mechanism for code clones. After training and evaluating the model, we will use statistical analysis to identify clone evolution patterns where inference may be applied to determine provenance between two different pieces of malware. Specific evolution patterns that admit model validation can then be applied to operational problems of malware attribution.
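As a rough illustration of the clone event types listed above, the following Python sketch labels the differences between two versions of a single file using the standard library's `difflib`. This is a deterministic stand-in for illustration only, not the statistical inference described in this post; the mapping from diff operations to event names and the function name `classify_changes` are assumptions. Moving/copying events are not covered, because detecting them requires comparing more than one file.

```python
import difflib

# Assumed mapping from difflib opcode tags to the event names used in this post.
EVENT_BY_TAG = {"delete": "subtraction", "insert": "addition", "replace": "mutation"}


def classify_changes(old_lines, new_lines):
    """Label each changed region between two versions of one file."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    events = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # unchanged regions carry no event
            events.append((EVENT_BY_TAG[tag], old_lines[i1:i2], new_lines[j1:j2]))
    return events


# Hypothetical example: a one-line in situ change to a C function.
old_version = ["int f(int x) {", "    return x + 1;", "}"]
new_version = ["int f(int x) {", "    return x + 2;", "}"]
events = classify_changes(old_version, new_version)
```

Each event pairs a label with the affected old and new lines, which could then serve as input to counting or likelihood estimates over many modification events.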
In this next phase of our research, we are also collaborating with two researchers in the SEI’s Emerging Technology Center, Naomi Anderson and Kate Ambrose-Sereno, who are examining data provenance models for software.

We welcome your feedback on our research. Please leave a comment in the feedback section below.

Additional Resources

To read the book Treatise on basic philosophy: ontology I. The furniture of the world by Mario Bunge, please visit http://www.springer.com/philosophy/epistemology+and+philosophy+of+science/book/978-90-277-0785-7

To read the article Who does what: Collaboration patterns in the Wikipedia and their impact on article quality by Jun Liu and Sudha Ram, please visit http://dl.acm.org/citation.cfm?doid=1985347.1985352

To read the article A Semantic Foundation for Provenance Management by Jun Liu and Sudha Ram, please visit http://link.springer.com/article/10.1007%2Fs13740-012-0002-0#page-1
SEI . Blog . Jul 27, 2015 02:17pm
By Lisa Brownsword, Senior Member of the Technical Staff

Although software is increasingly important to the success of government programs, there is often little consideration given to its impact on early key program decisions. The Carnegie Mellon University Software Engineering Institute (SEI) is conducting a multi-phase research initiative aimed at answering the question: is the probability of a program’s success improved through deliberately producing a program acquisition strategy and software architecture that are mutually constrained and aligned? Moreover, can we develop a method that helps government program offices produce such alignment? This blog post, the third in a series on this multi-year research, describes our approach to determining how acquisition quality attributes can be expressed and used to facilitate alignment between the software architecture and the acquisition strategy.

In the first post in this series, we identified specific instances where misalignment between software architecture and acquisition strategy resulted in program delays and cost overruns and, in some cases, program cancelation. In the second post, we outlined seven patterns of failing behavior, or "anti-patterns," that were major contributors to this misalignment. In addition, we presented a model that links key program entities and their relationships and that could help programs avoid those anti-patterns. Finally, we observed that alignment between software architecture and acquisition strategy does not occur naturally.

Of particular note in this early research, we postulated that acquisition quality attributes reflecting the program’s business goals can be used to judge the effectiveness of an acquisition strategy, analogous to the way software quality attributes reflecting the mission goals are used to judge the effectiveness of a software architecture.
In this latest effort, a team of researchers that, in addition to myself, includes Cecilia Albert, Patrick Place, and David Carney has focused on determining how to express, elicit, and analyze acquisition quality attributes for their utility in surfacing potential incompatibilities between a software architecture and an acquisition strategy.

Our Research Approach

We patterned our approach to defining acquisition quality attributes on the original SEI research on software quality attributes. We began by forming a starter set of potential acquisition quality attributes. We compiled an unordered list of roughly 30 qualities derived from a review of government acquisition strategy guidance, discussions with acquisition professionals and colleagues, and several brainstorming sessions within our team. Examples of these initial acquisition quality attributes, which are criteria by which acquisition strategies are judged, include

affordability
achievability
effectiveness
flexibility

Next, we defined an approach for expressing program-specific acquisition quality attribute scenarios using the SEI’s earlier work in software architecture, where stakeholders are encouraged to create small "stories" that specify some event (the stimulus) that occurs in a particular part of the lifecycle (the environment) and a desired behavior (the response).
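The three-part scenario form described above can be captured as a small record type. The following Python sketch is illustrative only: the class name, field names, and the `as_story` helper are assumptions rather than part of any SEI tool, and the sample values paraphrase a budget-cut scenario of the kind discussed in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityAttributeScenario:
    """A three-part scenario: stimulus, environment, and response."""
    stimulus: str     # the event that occurs
    environment: str  # the part of the lifecycle or context in which it occurs
    response: str     # the desired behavior

    def as_story(self) -> str:
        """Render the scenario as a one-sentence story."""
        return f"When {self.stimulus} {self.environment}, {self.response}."


# Hypothetical wording of an acquisition scenario.
budget_cut = QualityAttributeScenario(
    stimulus="an unexpected budget cut occurs",
    environment="for a multi-segment system",
    response="the program moves work between major segments within the available funding",
)
story = budget_cut.as_story()
```

Keeping the three parts as separate fields makes it straightforward to compare scenarios that share a stimulus but differ in environment, which matters when scenarios are later analyzed for incompatibilities.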
For example, in software architecture, a quality attribute might be expressed using the following three-part scenario:

stimulus - an internal component fails
environment - during normal operation
response - the system recognizes an internal component failure and has strategies to compensate for the fault

In the acquisition domain, a similar example might be as follows:

stimulus - an unexpected budget cut
environment - for a multi-segment system
response - the program is able to move work between major segments to speed up or slow down separate segments within the available funding

Next, we elicited acquisition quality attribute scenarios from former program management office personnel and members of Independent Technical Assessment teams (informally called Red Teams) to capture 55 acquisition quality attribute scenarios covering 23 government programs. We needed these scenarios to validate that qualities related to an acquisition strategy can be expressed in a meaningful way through these scenarios. In addition, we were able to show that a tight link exists between the acquisition quality attribute scenario and some element of the acquisition strategy. These scenarios are detailed in a technical report, Results in Relating Quality Attributes to Acquisition Strategies. The following is an example of the scenarios we constructed:

stimulus - a new need arises when we want to react quickly
environment - where there are only a limited number of contractors able to do the work
response - the program can satisfy the need by adding the work to an existing contract
potential acquisition tactic - award IDIQ contracts to multiple (perhaps eight or so) vendors and issue task orders in a round-robin fashion

We built and validated a prototype workshop adapted from the software Quality Attribute Workshop (QAW).
The major changes we made to a conventional QAW were to place more emphasis on the business presentation and to replace the architecture presentation with one on the program’s acquisition strategy plans; we therefore termed our revised approach the Acquisition Quality Attribute Workshop (AQAW). We then prototyped our AQAW using a real program but substituting SEI staff for various program stakeholders. The prototype AQAW successfully generated twenty acquisition quality attribute scenarios. While only a single instance, the prototype successfully demonstrated that an AQAW is a plausible approach for capturing acquisition quality attribute scenarios.

Finally, we analyzed all captured acquisition quality attribute scenarios for trends, implications, and potential incompatibilities. Some of our findings are described below.

Different scenarios result in different acquisition strategies. For an acquisition quality attribute scenario to influence the acquisition strategy, there must be some element of the scenario that leads the program office to choose a strategy. For instance, we found a number of acquisition quality attribute scenarios relating to new technology and the issues that arise if the chosen innovative technology fails to deliver on its promises:

A new technology the program office expects to use is found to be unsuitable where schedule is of prime importance; the program office switches to an alternative that is also currently under development and is evaluated to be suitable.
A new technology the program office expects to use is found to be unsuitable where costs must be kept as low as possible; the program office instructs the contractor to restart but using an alternative technology.

In the case of these scenarios, the stimulus is the same but the environment changes. In the first scenario, schedule is more important than cost. The second scenario reverses their relative importance.
In the first scenario, an acquisition strategy starting multiple developments simultaneously with a requirement for some kind of decision between the alternatives would be appropriate. In the second scenario, a strategy starting a single development contract and continuing to use it while switching to a more feasible technology might be more appropriate. Although this is a simple example, it demonstrates that different acquisition quality attribute scenarios can lead to different acquisition strategies. This finding strengthens our contention that our use of acquisition quality attributes and acquisition quality attribute scenarios is analogous to the use of software quality attributes and software quality attribute scenarios and that we may continue to rely on methods and mechanisms developed for that purpose to assist with the creation of sound acquisition strategies.

Incompatible Acquisition Scenarios. Conflicts between scenarios are not always obvious and may be too subtle to detect without some analysis. For example, organization ABC is using a large, complex legacy system deployed in multiple operational locations, where each location installed its own local variant of the system. Over time, these variants have diverged in response to differing requirements of the local users. To accommodate mission changes, it has become important to share data across these locations in a more integrated way. A new program was initiated to acquire one replacement capability that would support all of the differing needs across the multiple fielded locations. The program decided to implement an incremental approach to replacing the legacy system so that it could respond to budgetary constraints and uncertainties.
For the example described above, the following scenario reflects the expectation of one set of influential stakeholders who advocated the use of a commercial off-the-shelf (COTS) product that had been successfully used at one of the operational installations:

stimulus - there is a desire to replace a complex component of a large legacy system with a COTS package
environment - within an established enterprise architecture with many local variations implemented that are largely different from each other
response - the program runs a competition to evaluate COTS packages for an enterprise-wide solution

A second set of stakeholders, reflecting operational users, is counting on the new system to quickly address their current needs. These needs vary among the current fielded locations. During the time it takes the program to define an agreed-upon set of requirements for each increment, the user representatives from the various fielded locations change their requirements. This leads to a second acquisition scenario:

stimulus - requirements for the next release keep changing
environment - for a program with a fixed budget that must be carefully managed
response - the program accepts the new requirements

Implied in these two scenarios is a third set of stakeholders: the enterprise system engineers, who are advocating the implementation of an enterprise architecture that extends across all of the local fielded implementations. This enterprise architecture could be incompatible with both of the above scenarios: each COTS product, by definition, is built to an architecture and a set of requirements that organization ABC has no control over. Moreover, the demands for local fielded implementations compete with architectural changes within a constrained budget.

Unfortunately, the two scenarios are potentially incompatible with respect to designing the acquisition strategy.
The first scenario, describing the implementation of a common COTS product across all locations, could provide sizeable value in terms of moving to one capability that is used across all fielded locations, but it may not meet what the current users described in the second scenario consider urgent needs. Both of these may in turn conflict with the move to an enterprise architecture.

Looking Ahead

We were pleased to demonstrate a critical link between acquisition quality attributes and an acquisition strategy. We also identified potential recommendations for how acquisition quality attributes could be expressed in program-specific three-part scenarios.

In the next phase of this work, which is already under way, we are working to develop an alignment method that collects and analyzes acquisition and software quality attributes in a way that enables deconfliction and prioritization. We believe that acquisition strategies and software architectures built from this consistent set of quality attributes would be aligned and mutually constraining.

Additional Resources

To read the SEI technical note Results in Relating Quality Attributes to Acquisition Strategies, please visit http://resources.sei.cmu.edu/library/asset-view.cfm?AssetID=78312

To read the SEI technical report Quality Attribute Workshops, please visit http://www.sei.cmu.edu/library/abstracts/reports/03tr016.cfm

To read the SEI technical report Isolating Patterns of Failure in Department of Defense Acquisition, please visit http://resources.sei.cmu.edu/library/asset-view.cfm?assetid=53252
By Kate Ambrose Sereno
Technical Analyst
SEI Emerging Technology Center

This post was co-authored by Naomi Anderson.

In 2012, the White House released its federal digital strategy. What’s noteworthy about this release is that the executive office distributed the strategy using Bootstrap, an open source software (OSS) tool developed by Twitter and made freely available to the public via the code hosting site GitHub. This is not the only evidence that we have seen of increased government interest in OSS adoption. Indeed, the 2013 report The Future of Open Source Software revealed that 34 percent of its respondents were government entities using OSS products. The Carnegie Mellon University Software Engineering Institute (SEI) has seen increased interest and adoption of OSS products across the federal government, including the Department of Defense (DoD), the intelligence community (IC), and the Department of Homeland Security. The catalyst for this increase has been innovators in government seeking creative solutions to rapidly field urgently needed technologies. While the rise of OSS adoption signals a new approach for government acquirers, it is not without risks that must be acknowledged and addressed, particularly given current certification and accreditation (C&A) techniques. This blog post will discuss research aimed at developing adoptable, evidence-based, data-driven approaches to evaluating (open source) software.

In this research, members of the technical staff in the SEI’s Emerging Technology Center (ETC) explored the availability of data associated with OSS projects and developed semi-automated mechanisms to extract the values of pre-defined attributes. The challenges of applying data analytics to address real problems and of understanding OSS assurance align with the ETC’s mission, which is to promote government awareness and knowledge of emerging technologies and their application, as well as to shape and leverage academic and industrial research.
Our research leveraged the "openness" of OSS to develop an evidence-based approach for assessing and assuring OSS. This approach, which focused on producing evidence in support of assurance claims, is based on generating artifacts and creating traceability links from assurance claims to those artifacts. Beyond a Trust-Based Approach If we think of traditional, "shrink-wrapped" software, we accept that the software is developed by and purchased from a vendor who delivers a product against specified requirements.  The software comes with installation instructions, FAQs, and access to support via hotlines and websites. Generally speaking, there is a company or some kind of legal entity that stands behind the product. With OSS development, however, multiple developers from different organizations (even independent developers) can contribute to the code base of a product, which may or may not be backed by a single legal entity. In some cases, developers include helpful information in the software repository; in other cases, users are on their own to get the software working in their environment. Specific functionality may be driven by the community of developers, or by a small core team. Current methods to assess software (OSS or otherwise) are trust-based and rely heavily on expert opinion.  For example, users may run experiments with the software in a controlled environment to determine whether or not it is safe to operate. When certifying and accrediting OSS or any software, however, the trust-based model is not valid for several reasons: In today’s environment, many organizations and entities incorporate some aspect of open-source into their software. As a result, no single company or organization represents an OSS capability. Individual expert assessments are manual and do not scale to the level required for large-scale, mission-critical projects that apply OSS. Assurance claims are based on opinion rather than on a data-driven designation of assurance. 
For these reasons, we wanted to develop a prototype of a tool that reaches beyond the functions of traditional static analysis tools. Our aim is to create a tool that government or industry could use to support their decision of whether to adopt an OSS package. We felt it was important to develop a tool to provide supporting evidence, rather than one that would provide a determination of whether a particular software package is "good" or "bad." In this age of sequestration and other pressures on the expense of acquiring and sustaining software-reliant systems, government agencies can realize numerous benefits from a good OSS development and adoption strategy, including cost savings and increased flexibility in the acquisition and development of systems.

Foundations of Our Approach

In 1998, after Netscape published the source code for Netscape Communicator, Bruce Perens and Eric S. Raymond founded the Open Source Initiative, an organization dedicated to promoting OSS. Since that time, a large number of OSS repositories have surfaced, including GitHub, Launchpad, SourceForge, and Ohloh.

In developing an approach, our team of researchers and software developers at the SEI wanted to create a tool that leveraged features of OSS, including the openness of code, development environment, documentation, and user community. Our aim was to design and develop an integrated, semi-automated software assessment capability that would allow an assessor to explore the evidence supporting an assurance claim.

The upside of the renewed interest in OSS adoption, both in government and industry, is that a wealth of data now exists within these repositories that provides insight into the development of OSS as well as the code review and code committal process. Our aim with this research was to move beyond simple bug counts and static analysis and provide richer context for those charged with assessing software systems.
While no one measure or metric could provide an accurate assessment of software, we reasoned that several characteristics could provide acquirers with a more complete view of OSS assurance. During our study, we identified measurable characteristics that could be of interest, particularly if assessed in combination. For example, we examined complexities of the coding language used, test completion, and vitality or inertia of the project. Other characteristics that we evaluated included

milestones. Our analysis included proposed release dates versus actual release dates. Meeting clearly described milestones and schedules is often an indicator of sound project management.

bugs. We examined issues such as severity, discovery date versus fixed date, timing for a fix to be included in a release, percentage of bugs carried from the previous release, distribution of bugs by severity, time-to-fix measures, rate at which new bugs are identified, and bugs tracked in the current release, as well as bug aging and classification. Bug counts and defect density alone are not sufficient. Sluggish time-to-fix measures, however, may signal problems with the current release.

documentation. We looked at whether there was a process to update documentation, whether the documentation was up to date, whether release notes and a change log were available, and how the number of lines of code correlates with the length and completeness of the user manual. A lack of documentation is a risk to adopters and implementers of the software because the implementers are left to their own devices to get the software working in their environment, which can cause significant delays in rollout.

user base growth over time. We looked at activity levels in mailing lists (users and developers). We also considered activity levels at conferences, market penetration, and third-party support. We reasoned that increasing or decreasing activity from the user community was evidence of the strength of the product.

developer involvement over time. Our evaluation spanned number of commits (making suggested changes from the user community), number of unique authors, lines of contributed code versus total lines of code, evidence of hobbyist developers, and a network diagram illustrating connections and influence of the community of developers, code committers, and reviewers. We reviewed the social network of the developer community that supported particular OSS projects.

Context is important. Using the data collected to help build an understanding of the development environment, developer activity, and user community commitment helps potential adopters get a better sense of the viability of the OSS project.

Challenges

When we first began this research, we focused on identifying data that would allow us to make valid comparisons between identifiers of quality in different software repositories. We soon realized, however, that quality attributes really are context dependent. For example, OSS acquirers may place various levels of importance on whether software is updated during daytime hours by full-time employees or during evening hours by hobbyists. Instead of placing a value judgment on these variables, we altered our approach to identify characteristics, such as the ones listed above, that can be used by decision makers to determine relevancy and weighting.

As we progressed through the research, we also realized that OSS repositories were starting to explore ways to represent data relevant to the OSS projects in the repositories. For example, GitHub maintains a graphs section that highlights data such as code stability and trends over time, and a separate punch card section that represents the volume of code commits over the span of a week. Another example involves Ohloh, which provides a side-by-side comparison of OSS projects along different parameters.

Another challenge that we encountered surfaced after we began exploring the OSS repositories.
We found that while many typical developer tools were in use, they were used differently across different software projects. One example of this involved JIRA, bug tracking software that offers users configurable fields. Another example can be found in the Apache Software Foundation project Derby, where some bugs have fields for urgency, environment, issues, fix information, or bug behavior facts while others do not.

Looking Ahead

All indicators point to increased adoption of OSS. In November 2013, Federal Computer Week published an article detailing the adoption of OSS across the DoD. An article on OSS and government in Information Week earlier that month stated that "Federal agencies, looking for new ways to lower their IT costs, are exploiting open-source software tools in a wider range of applications, not only to reduce software costs, but also to tighten network security, streamline operations, and reduce expenses in vetting applications and services."

In the coming year, we will continue our work in data analytics and OSS assurance. We are interested in collaborating with organizations to

expand selected data to analyze and find correlations among seemingly disparate dimensions and measures in software development
produce evidence for specific OSS projects that are critical to mission needs
test specific assurance claims using a data-analytics approach and build in bi-directional traceability between claim and evidence
build tools to accommodate large-scale analysis and evidence production (multiple OSS projects along multiple dimensions)
experiment with evidence production targeting tools
develop and publish a comprehensive open source assurance classification system

If you are interested in collaborating with us, please leave a comment below or send an email to info@sei.cmu.edu.
Additional Resources

For more information about the SEI Emerging Technology Center, please visit http://www.sei.cmu.edu/about/organization/etc/

To read the article Has Open Source Officially Taken Off at DoD? by Amber Corrin, please visit http://fcw.com/Articles/2013/11/19/DOD-open-source.aspx?Page=1

To read the article Agencies Widen Open-Source Use by Henry Kenyon, please visit http://www.informationweek.com/agencies-widen-open-source-use--/d/d-id/899851

To read the article Army C4ISR portal uses open-source software for faster upgrades by William Welsh, please visit http://defensesystems.com/articles/2013/01/30/army-c4isr-portal-open-source-software.aspx
By Anne Connell Design Team Lead CERT Cyber Security Solutions Directorate  This blog post was co-authored by Tim Palko.  According to a report issued by the Government Accountability Office (GAO) in February 2013, the number of cybersecurity incidents reported that could impact "federal and military operations; critical infrastructure; and the confidentiality, integrity, and availability of sensitive government, private sector, and personal information" has increased by 782 percent—from 5,503 in 2006 to 48,562 in 2012. In that report, GAO also stated that while there has been incremental progress in coordinating the federal response to cyber incidents, "challenges remain in sharing information among federal agencies and key private sector entities, including critical infrastructure owners." Progress in this area was hindered by "difficulties in sharing and accessing classified information and the lack of a centralized information-sharing system," the report stated. This blog post describes a tool that members of the CERT Cyber Security Solutions (CS2) Directorate are developing to provide the various agencies and organizations that respond to cyber incidents a platform by which to share information and forge collaborations.   I have witnessed these challenges to effective collaboration first-hand. In my role, I am often called upon to observe subject matter experts who advise incident responders while they manage cyber incidents to assist in collecting evidence and presenting it to authorities working criminal cases. In this role, I have repeatedly observed incident responders, including law enforcement and subject matter experts, operating in disconnected siloes. Representatives from these agencies literally set up separate work stations.  
While attackers are organized and well-coordinated in their efforts, agencies and organizations that respond to these cyber incidents operate in disconnected siloes that are in need of a shared platform for trusted collaboration. The aim of our work is to create this platform. We intend the platform to be used across a range of groups, such as computer security incident response teams (CSIRTs), incident responders, commercial companies, and law enforcement.  Our Approach: Cerebro At its core, Cerebro is a prototype that allows collaborators to identify and tag actionable information. The information is collected at a level of granularity that allows collaborators to specify the extent to which they want to share this information: at a group level, an organizational level, or with all participants of an organization.  As outlined in the IEEE paper Cerebro: A Platform for Collaborative Incident Response and Investigation, which I co-authored along with Tim Palko, our approach incorporates a six-phase model that represents the process an incident responder/assessor goes through when responding to a cyber incident:  Site assessment. The primary goal of this phase is to develop a response plan to handle the incident based on its assessment and prior experience. Activities in this phase include assembling and training the team and creating situational awareness of the systems that comprise the site or event.  Site aggregation. Activities in this phase include conducting a network assessment of a site or event and minimizing the scope and impact of the attack. The site assessor also begins site categorization and collection and uses forensic software or toolkits to obtain and extract evidence.  Site analysis. Activities in this phase include identifying lessons learned from the handling of the assessment. 
A meta-profile containing the reduced critical assessment of the site is imported to the data store to generate rules or steps for creating better preparedness, which may include modifying policy or process or making changes to configurations. Collaborative investigation/correlation. Activities in this phase involve the establishment of a trusted environment that enables communication about the site, which is essential when preparing for a collaborative response. This environment must provide analysts a space to make observations and correlate disparate data types (site security, network analysis, etc.). Applied machine learning is also performed on the dataset. The machine learning produces the predictive analysis that is pushed to the analyst. Policy/rule application. Activities in this phase focus on data analysis, a crucial part of the investigation process. The availability of data from multiple sites and/or events opens the possibility of cross-site analysis to establish links among events occurring at individual sites. During investigation and correlation, the collaborative mechanism runs and shares watch-lists, security events, and rule sets. Reports from individual sites and/or events are collected and sent to Cerebro. After a detailed analysis—which might involve two analysts who have the same observation or the system's autonomously identifying links happening at multiple events—Cerebro generates push notifications to alert the user of the associations. Site incident strategy. Activities in this phase occur once the incident investigation has concluded. The site administrator and site responder (both humans) take appropriate steps to mitigate any risks or bring compromised systems back online. Policy and rules developed automatically in the analysis stage are presented as a critical stage to disseminate information, but ultimately any action taken based on these notifications and rules is taken by a person.  
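The cross-site correlation step described in the phases above, in which the same observation reported from more than one site triggers a notification, might be sketched as follows. This is a simplified illustration and not Cerebro's implementation: the `Observation` record, the `correlate` function, and the `min_sites` threshold are hypothetical, and plain grouping stands in for the machine learning mentioned in the post. The IP addresses come from ranges reserved for documentation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One analyst-reported indicator from one site (hypothetical record shape)."""
    site: str
    analyst: str
    indicator: str  # for example, an IP address or an attack pattern label


def correlate(observations, min_sites=2):
    """Return indicators seen at min_sites or more distinct sites."""
    sites_by_indicator = defaultdict(set)
    for obs in observations:
        sites_by_indicator[obs.indicator].add(obs.site)
    return {
        indicator: sorted(sites)
        for indicator, sites in sites_by_indicator.items()
        if len(sites) >= min_sites
    }


reports = [
    Observation("site-a", "analyst-1", "192.0.2.10"),
    Observation("site-b", "analyst-2", "192.0.2.10"),
    Observation("site-a", "analyst-1", "198.51.100.7"),
]
alerts = correlate(reports)  # candidates for push notifications
```

In this sketch the result only identifies candidate associations; consistent with the model described above, any action taken on a notification remains a human decision.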
Cerebro takes a practical approach to defining a system model that collects and analyzes data in a trusted cloud-computing platform, which allows us to store large volumes of data while simultaneously processing them to find, store, and categorize evidence of malicious attacks. We will host our tool on an extensible large-scale analysis platform for managing and analyzing data (such as logs and communications). The analysis platform provides better management and a better security mode, and is equipped with a suite of open-source tools for log and data extraction, data and evidence storage, data and log analysis, and forensics.  In a cyber attack, organizations may encounter an adversary who targets communication between administrators to disrupt the effectiveness of their response. In designing a framework to foster collaboration in the wake of a large-scale cyber attack, we envisioned role-based access control that draws upon the principle of least privilege. Cerebro comprises two main components:  a Roles and Responsibilities Model that defines the entities involved in the response and investigation, their responsibilities, and their interactions  a Process Model that defines the phases of the response and investigation process, as well as the execution of responsibilities in these phases Together, these components ensure that the response and investigation team members are able to effectively manage the required tasks. In particular, the system model integrates the technical incident response and the legal investigation and prosecution process in a multi-site collaborative manner.  Our approach builds on several well-known principles for effective collaboration. For trust establishment, we rely on an incentive-based approach in which organizations learn more about vital watch-list information and obtain access to tools and resources to respond to and recover from attacks.  
Cerebro also relies on an approach involving organizational access policy; organizations providing value in the identification and response process can collectively define important pieces (IP addresses, type of attack, pattern identification) of an investigation.

For managing tasks and processes, we focus on identifying and indexing areas of interest that warrant collaboration to integrate them into a well-defined process workflow for each organization.

Challenges of Our Approach

One challenge of our approach with Cerebro involves addressing security issues so that potential users can be assured that their information will circulate only within an intended audience. To address this issue, we designed a system that uses two-factor authentication: role-based and signature-based.

Ideally, our tool would foster 100 percent participation by all involved. In practice, Cerebro observes the 90-9-1 rule. This is the basic observation that in a collaborative platform, such as a wiki,

- 90 percent of participants will "lurk" and simply observe information being posted
- 9 percent of participants will actively edit and produce the information being created
- 1 percent of participants will be involved in content validation, administration, and rule generation

Our approach hypothesizes that if only 9 percent of participants are involved in information analysis, that group can act on the information and ideally retain enough so it doesn’t compromise their organization. The lurkers will hopefully retain some of the lessons learned from the methodologies employed by the real subject matter experts. The lurkers, however, cannot be relied upon to provide actionable intelligence.

Early Influences and Collaborations

As with any research effort, our work has been influenced by many other researchers including Dr. Eric Nyberg, a professor in the Language Technologies Institute in Carnegie Mellon University's School of Computer Science. Dr.
Nyberg’s research in this field helped us gain a greater understanding of machine learning and rule generation.

This research also draws upon theories introduced by Dr. Carolyn Rosé, who teaches an applied machine learning class in the Human-Computer Interaction Institute. Dr. Rosé’s research focuses on better understanding the social and pragmatic nature of conversation and using this understanding to build computational systems that can improve the efficacy of conversation between people, and between people and computers.

Looking Ahead

Working with our customers over the past few years, we developed prototypes of tools that contribute to the data collection, analysis, and collaboration space. Alongside the development efforts for these prototypes, we are working on a version of Cerebro that will act as the trusted platform between them.

Additional Resources

To read the paper Cerebro: A Platform for Collaborative Incident Response and Investigation, by Anne Connell, Tim Palko, and Hasan Yasar, please visit http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=06699007

For a list of other tools and resources developed by members of the CERT Cyber Security Solutions Directorate, please visit http://www.cert.org/forensics/

To read the February 2013 report, CYBERSECURITY: National Strategy, Roles, and Responsibilities Need to Be Better Defined and More Effectively Implemented, please visit http://www.gao.gov/assets/660/652170.pdf
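As a rough illustration of the cross-site analysis described in the policy/rule application phase, the sketch below groups tagged observations by indicator and keeps only the indicators reported by more than one site. It is a minimal sketch, not Cerebro's implementation: the class, the Observation fields, and the sample addresses are all hypothetical, and a real system would add the access controls and notification delivery discussed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: links observations reported by different sites. */
class CrossSiteCorrelation {

    /** One tagged observation; the field names are illustrative, not Cerebro's schema. */
    static final class Observation {
        final String site;       // reporting site or event
        final String indicator;  // e.g. an IP address or an attack-pattern tag

        Observation(String site, String indicator) {
            this.site = site;
            this.indicator = indicator;
        }
    }

    /** Returns each indicator seen at two or more distinct sites, with those sites. */
    static Map<String, Set<String>> correlate(List<Observation> observations) {
        Map<String, Set<String>> sitesByIndicator = new TreeMap<>();
        for (Observation o : observations) {
            sitesByIndicator.computeIfAbsent(o.indicator, k -> new TreeSet<>()).add(o.site);
        }
        // Keep only indicators that link more than one site.
        sitesByIndicator.values().removeIf(sites -> sites.size() < 2);
        return sitesByIndicator;
    }

    public static void main(String[] args) {
        List<Observation> reports = new ArrayList<>();
        reports.add(new Observation("site-A", "203.0.113.7"));
        reports.add(new Observation("site-B", "203.0.113.7"));
        reports.add(new Observation("site-B", "198.51.100.9"));
        // Each linked indicator stands in for a push notification to analysts.
        correlate(reports).forEach((indicator, sites) ->
                System.out.println("ALERT " + indicator + " seen at " + sites));
    }
}
```

Here only 203.0.113.7 is reported by two sites, so it is the only indicator that produces an alert. As in the six-phase model, the alert only informs the analyst; any action remains a human decision.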
SEI . Blog . Jul 27, 2015 02:15pm
By Lori Flynn
Member of the Technical Staff
CERT Secure Coding team

Although the CERT Secure Coding team has developed secure coding rules and guidelines for Java, prior to 2013 we had not developed a set of secure coding rules that were specific to Java’s application in the Android platform. Android is an important area to focus on, given its mobile device market dominance (82 percent of worldwide market share in the third quarter of 2013) as well as the adoption of Android by the Department of Defense. This blog post, the first in a series, discusses the initial development of our Android rules and guidelines. This initial development included mapping our existing Java secure coding rules and guidelines to Android applicability and also the creation of new Android-only rules for Java secure coding.

Motivation for Our Work

Software programmers produce more than 100 billion new lines of code for commercially available software put into operation each year, according to a recent article published in Defense Systems. Meanwhile, programming errors happen at an estimated rate of 15 to 50 errors per 1,000 lines of code. Even with the advent of automated testing tools, the article states that "numerous studies and a substantial amount of research suggest that approximately one error per every 10,000 lines of production code still exists after testing. That would equate to 10,000,000 errors in the code produced each year."

To help cope with this enormous problem, the CERT Secure Coding team has developed a set of secure coding standards providing rules and guidelines that programmers and analysis tools can use to evaluate source code for compliance to the standards, which can help developers avoid or discover errors that make code insecure. A rule is advice for secure coding which is normative: a violation of the secure coding advice always must constitute a defect in the code.
(If the recommendation depends on programmer intention, then it might not be possible to automatically enforce the recommendation, and it is not considered normative.) A guideline is a secure coding recommendation that is excluded from being defined as a rule because it is not possible to form a normative requirement around it. For example, if a recommendation depends on programmer intention, then it might not be possible to automatically enforce the recommendation. Even if the recommendation is always a good idea, if violation of the recommendation does not always constitute a defect in the code, the recommendation is a guideline, not a rule.

There have been several recent initiatives by the DoD to incorporate the Android OS into its mobile computing strategy:

- The Army Mobile Handheld Computing Environment is an effort to devise an Android-based smartphone framework and suite of applications for tactical operations.
- The Defense Advanced Research Projects Agency (DARPA) also selected the Android platform for its transformative apps program, which aims to develop a diverse array of militarily relevant software applications (apps) using a new development and acquisition process.

Foundations of Our Work

Our project aims to create rules that can be verified and enforced, as well as to develop a means for checking code for violations of these rules. This research was part of the Source Code Analysis Laboratory (SCALe). The Mobile SCALe project is extending this existing CERT concept to create a source code analysis laboratory environment for the analysis of code for mobile computing platforms. The first platform that Mobile SCALe has focused on is Android.

Our work was led by a team of researchers who, in addition to myself, included William Klieber, Dr. Dean Sutherland, and David Svoboda of the CERT Secure Coding team; Dr. Lujo Bauer and Dr. Limin Jia of Carnegie Mellon University's Department of Electrical and Computer Engineering; and Dr.
Fred Long of Aberystwyth University. The Japan Computer Emergency Response Team Coordination Center (JPCERT), including Masaki Kubo, Hiroshi Kumagai, and Yozo Toda, assisted in the development of the Android secure coding rules and guidelines.

Our Mobile SCALe work can be separated into two phases, coding rules and guidelines development and static analysis development, which will guide the development of blog posts in this series. Our work in 2013 to develop Android secure coding rules and guidelines involved three main tasks:

- mapping our existing trove of Java secure coding rules, analyzing their applicability to the Android environment
- mapping our (at the time) in-development trove of Java secure coding guidelines, analyzing their applicability to the Android environment in time for that analysis to be included as an appendix in the published hardcopy book printed in 2013
- developing new Android-only rules and guidelines for Java development

Rules and Guidelines for Java Secure Coding for Android

As we outlined in our recently published technical report, Mobile SCALe: Rules and Analysis for Secure Java and Android Coding, we started developing secure coding rules and guidelines for Android by focusing on a specific language, Java, and adding an Android section to the CERT Oracle Secure Coding Standard for Java wiki. Although Android apps can be written in native code, such as C or C++, our 2013 work focused only on the Java language. Hosting secure coding rules and guidelines on our secure coding wiki allows us to collaboratively create these coding standards with assistance from the development and security communities, vet them with expert opinion, and receive feedback.

Android secure coding advice also exists elsewhere, but we found it in incomplete sets and in disparate locations across the web. One technique we use to develop new coding standards is to mine our other CERT secure coding standards. Many of our rules from C/C++/Java apply to C#.
A second technique we use is to mine vulnerability databases, using public databases such as the Department of Homeland Security’s National Vulnerability Database; CERT is also sometimes provided with, or develops, privately held information about vulnerabilities. A third standard technique we use is to mine current literature. We found useful advice for secure Java coding for Android in separate locations, such as the Android developer website, Google, and security researcher websites, as well as research papers, online articles, and security conference presentations.

We use a standardized format for all coding rules and guidelines (whether they are C, C++, Java, Perl, or Android-only) on our wiki. Each begins with a summary of the rule or guideline and an explanation of security issues it addresses. Then the wiki lists an example of non-conforming code and explains why it’s a problem. The wiki also includes references where readers can go for more information. We also provide a score that captures the severity of any vulnerability and the likelihood that the vulnerability can be exploited, as well as an analysis of the cost of remediation if a violation is found in the code.

Consider the following excerpts from one of our guidelines applicable to secure development of Android apps:

DRD00-J. Do not store sensitive information on external storage (SD card)

Android provides several options to save persistent application data, one of which is External Storage, such as /sdcard or /mnt/sdcard.

Files saved to the external storage are world-readable. Consequently, they can be modified by other apps installed on the device or by the user (by enabling USB mass storage and manipulating files from a PC).

The Android API Guides [Android Guides 2013] Storage Options states:

Caution: External storage can become unavailable if the user mounts the external storage on a computer or removes the media, and there’s no security enforced upon files you save to the external storage.
All applications can read and write files placed on the external storage and the user can remove them.

[Guideline]. Developers should not store sensitive data to external storage devices because files stored externally have no guarantee of confidentiality, integrity, and availability.

Noncompliant Code Example. The noncompliant code creates a file on the external storage and saves sensitive information to the file.

Compliant Solution (Save a File on Internal Storage). The compliant code uses the openFileOutput() method to create "myfile" in an application data directory with permission set to MODE_PRIVATE so that other apps cannot access the file.

Likelihood/Severity. We make three numerical estimates, based on our understanding of the security issues addressed by the secure coding advice (the rule or guideline). Severity estimates the consequences if the advice is ignored. Likelihood estimates how likely it is that a flaw introduced by violating the rule or guideline could lead to an exploitable vulnerability. Remediation cost estimates how expensive it is to remediate existing code to comply with the advice. Our analysis found that the severity of a problem such as the one illustrated above would be high, and that the likelihood would be probable. The cost to remediate such a problem would be medium, since automatic detection with manual correction is possible if sensitive data sources are identified.

The above format can be found for each of the CERT secure coding rules and guidelines.

Three sections of the wiki were developed in 2013:

Analysis for Android applicability of CERT Oracle secure coding rules and addition of Android-specific implementation advice to many of those rules on our wiki

Our rules for Java are published in the book The CERT Oracle Secure Coding Standard for Java. The book was published in hard copy in September of 2011.
Development of the standard continues on a wiki accessible and contributed to by the public but maintained by the CERT Secure Coding team. A summary of the analysis status at the end of 2013 is in Table 1 below. The full table with the current applicability analysis and analysis details for each rule can be found here: Analysis of Applicability of CERT Oracle Java Secure Coding Rules to Android. For Java rules found to require Android-specific implementation advice, a new section, Android Implementation Details, was added to that rule’s wiki page just above the bibliography section.

Analysis for Android applicability of CERT secure coding guidelines

A team of researchers developed guidelines for secure coding in Java, which were initially published on the wiki site maintained by the CERT Secure Coding team. Those guidelines were then published in the book Java Coding Guidelines in September 2013. Not all of the Java guidelines that the CERT Secure Coding team had developed and published, however, could be applied to the Android OS. A summary of the analysis status at the end of 2013 is in Table 1 below. The full table with the current applicability analysis, with analysis details for each guideline, is here: Analysis of Applicability of CERT Java Secure Coding GUIDELINES to Android.

DRD-labeled Android secure coding rules and guidelines

We also created new rules and guidelines for Android secure coding. As we state on the wiki, the new rules and guidelines labeled "DRD" are applicable only to the Android platform. They do not apply to the development of Java programs for other platforms.
The list of DRD rules below is current as of February 17, 2014, and can be found on our wiki:

- Do not store sensitive information on external storage (SD card)
- Limit the accessibility to your sensitive content provider
- Do not allow WebView to access sensitive local resource through file scheme
- Do not broadcast sensitive information using an implicit intent
- Do not log sensitive information
- Always canonicalize a URL received by a content provider
- Restrict access to sensitive activities
- Do not release apps that are debuggable

Two of the eight Android-only rules are actually Android-specific instances of more general Java rules: one concerning the logging of sensitive data and the other the canonicalization of file path names. Four of the remaining six rules focus on the handling of sensitive data by Android apps. In particular, these rules highlight aspects of Android programming that could lead unwary programmers to release sensitive data by misusing features of the Android architecture.

Looking Ahead

The next post in this series will focus on the development of two tools that analyze information flow within and between Android apps. One of these tools analyzes potential communication between apps by focusing specifically on the sending and receiving of intents, which are a core inter-app communication mechanism in Android. Tracing these intents can be challenging because there are typically multiple methods of entry into a program. The blog post will also discuss a new static analysis tool we are developing.

Later this year, we will also post about our ongoing Android secure coding work: expanding the coding rules and guidelines beyond Java and further development of our newest static analysis tool.
Additional Resources

To view the Android wiki on the CERT Secure Coding site, please visit https://www.securecoding.cert.org/confluence/pages/viewpage.action?pageId=111509535

To read the SEI technical report, Mobile SCALe: Rules and Analysis for Secure Java and Android Coding, please visit http://resources.sei.cmu.edu/library/asset-view.cfm?assetid=69225
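The severity, likelihood, and remediation cost estimates described for DRD00-J can be combined into a single score. This post does not spell out the arithmetic, so the sketch below assumes the convention documented in the risk assessment sections of the CERT secure coding standards: each estimate is rated from 1 to 3, the three values are multiplied into a priority, and priorities are grouped into levels. The class and method names are hypothetical and are not part of any CERT tool.

```java
/** Hypothetical helper illustrating the three-estimate risk score; not CERT tooling. */
class RiskAssessment {

    enum Severity {
        LOW(1), MEDIUM(2), HIGH(3);
        final int value;
        Severity(int value) { this.value = value; }
    }

    enum Likelihood {
        UNLIKELY(1), PROBABLE(2), LIKELY(3);
        final int value;
        Likelihood(int value) { this.value = value; }
    }

    /** Cheaper fixes score higher, so they rise in priority. */
    enum RemediationCost {
        HIGH(1), MEDIUM(2), LOW(3);
        final int value;
        RemediationCost(int value) { this.value = value; }
    }

    /** Priority is the product of the three estimates, ranging from 1 to 27. */
    static int priority(Severity s, Likelihood l, RemediationCost c) {
        return s.value * l.value * c.value;
    }

    /** Assumed banding: 12 and above is L1, 6 to 9 is L2, 4 and below is L3. */
    static String level(int priority) {
        if (priority >= 12) return "L1";
        if (priority >= 6) return "L2";
        return "L3";
    }

    public static void main(String[] args) {
        // DRD00-J as assessed in this post: high severity, probable, medium cost.
        int p = priority(Severity.HIGH, Likelihood.PROBABLE, RemediationCost.MEDIUM);
        System.out.println("DRD00-J: P" + p + " " + level(p)); // prints "DRD00-J: P12 L1"
    }
}
```

Under these assumptions, the assessment given above for DRD00-J (high, probable, medium) works out to 3 × 2 × 2 = 12, which falls in the highest-priority band.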
By C. Aaron Cois
Software Engineering Team Lead
CERT Cyber Security Solutions Directorate

This blog post is the first in a series on DevOps.

At Flickr, the video- and photo-sharing website, the live software platform is updated at least 10 times a day. Flickr accomplishes this through an automated testing cycle that includes comprehensive unit testing and integration testing at all levels of the software stack in a realistic staging environment. If the code passes, it is then tagged, released, built, and pushed into production. This type of lean organization, where software is delivered on a continuous basis, is exactly what the agile founders envisioned when crafting their manifesto: a nimble, streamlined process for developing and deploying software into the hands of users while continuously integrating feedback and new requirements. A key to Flickr’s prolific deployment is DevOps, a software development concept that literally and figuratively blends development and operations staff and tools in response to the increasing need for interoperability. This blog post, the first in a series, introduces DevOps and explores its impact from an internal perspective on our own software development practices and through the lens of its impact on the software community at large.

At the SEI, I oversee a software engineering team that works within CERT’s Cyber Security Solutions (CS2) Directorate. Within CS2, our engineers design and implement software solutions that solve challenging problems for federal agencies, law enforcement, defense intelligence organizations, and industry by leveraging cutting-edge academic research and emerging technologies.

The manner in which teams develop software is constantly evolving. A decade ago, most software development environments were siloed, consisting of software developers in one silo and mainframe computers and a staff of IT professionals who maintained that mainframe in another silo.
The arrival of virtualization marked a technological revolution in the field of software development. Before, if I needed a new server for my web application, I would have to order the server, and wait for it to ship. Then, upon arrival, I would have to rack the server, install and provision the system, and configure networking and access controls, all before I could begin my real development work.

Today, virtualization allows us to create and proliferate virtual machines almost instantly. For example, my developers simply click a button to create a virtual machine, and it appears instantly. This ability to instantaneously generate synthetic computers that run on a shared infrastructure underlies a range of modern technologies, such as Amazon’s Elastic Compute Cloud (Amazon EC2), that provide resizable compute capacity in the cloud. This new immediacy powers a lot of cool technologies, such as cloud platform OpenStack, Platform-as-a-Service (PaaS) solutions such as Heroku or Microsoft’s Windows Azure, and software development tools such as Vagrant, as well as enterprise infrastructures of most modern companies. At the same time, these technologies enable us to automate more tasks and command larger, more powerful infrastructures to increase the efficiency of our software development operations.

It Works on My Machine

There’s a saying often heard among young developers: "it works on my machine." This references developers, often early in their careers, who write a piece of code to fix a bug. Then, after testing the code locally on their machine, they proclaim it fit for deployment. Inevitably, when they install it on the customer’s system, the code breaks because of differences in system configuration. This problem provides a canonical example of the types of issues that DevOps can help you avoid.
To mitigate this prolific problem, SEI researchers leverage Vagrant to manage the creation of a canonical environment (which is a set of virtual machines) for each software project, replicated locally for each developer on the project team. These virtual machines are configured to be identical to the machines in our testing, staging, and ultimately production clouds. This setup ensures that if it works on our developer’s local machine, it will also work on the production system, whether hosted by us or in a customer’s infrastructure. Moreover, synchronicity assures developers that if it works on their machine, it will work on other developers’ machines because they are using the same environment for that project.

Files that define the configuration of these project environments are small and can be checked in to source control along with software code. The ability to check configuration files into source control allows the development team to update, share, and version the project environment, along with the code itself, with the assurance of parity throughout the team. This methodology also provides a far simpler onboarding process when new developers join a project, as their environment setup is reduced to a single "create environment" command. This advanced process, unimaginable a decade ago, offers just one example of the power and precision that DevOps automation brings to software engineering.

A New Approach for Developing Software

Another innovation that has impacted the manner in which software is developed stresses collaboration between developers who wrote the software and the operations team (i.e., the IT group) that maintains an organization’s hardware infrastructure. The origin of DevOps can be traced to 2009 when a group of Belgian developers began hosting "DevOps" days during which they stressed collaboration and interaction between these two entities.
Previously, developers and operations staff would work independently until their interests converged, usually with an inefficient and costly struggle to integrate their work products and efforts for the final race to deployment. DevOps emerged from the realization that infrastructure should support not only the production capability, but also the act of development.

Ideally, DevOps should exist in one merged environment and set of concepts. For example, if I am writing software in a virtualized environment, I can be assured that the software I’ve developed will deploy seamlessly in that environment. Integrated DevOps assures us that the operations team remains involved throughout the software development lifecycle to ensure a smooth, efficient process through transition and deployment. Just as security concerns cannot be initially ignored and then successfully addressed at the end of a project, the same is true for successful deployment and maintenance concerns.

DevOps provides an ideal solution for iterative software development environments, especially those that release software updates frequently, such as Flickr. The initial push for DevOps stemmed from the need to integrate operations to make software development more efficient and of higher quality. At the SEI, we are taking that concept and pushing forward, along with many others in the software industry, to fully automated DevOps processes.

Automated DevOps

In an article published in the August 2011 edition of Cutter IT Journal, Why Enterprise Must Adopt Devops to Enable Continuous Delivery, co-authors Jez Humble and Joanne Molesky wrote that "automation of build, deployment, and testing" is key to achieving low lead times and rapid feedback. The authors write that automation also ensures that "configuration and steps required to recreate the correct environment for the current service are stored and maintained in a central location."
Any software organization must be an early adopter of innovation to maintain a competitive edge. As a federally funded research and development center, the SEI must maintain high standards of efficiency, security, and functionality in systems we develop. Forward-thinking approaches to process, including heavily automated DevOps techniques, allow us to systematically implement, maintain, and monitor these standards for each project we work on.

Looking Ahead

While this post served to introduce the concepts of virtualization and outline some DevOps practices, future posts in this series will present the following topics:

- a generalized model for DevOps
- advanced DevOps automation
- DevOps system integration
- continuous integration
- continuous deployment
- automated software deployment
- environment configuration

We welcome your feedback on this series, and what DevOps topics would be of interest to you. Please leave feedback in the comments section below.

Additional Resources

To listen to the podcast, DevOps - Transform Development and Operations for Fast, Secure Deployments, featuring Gene Kim and Julia Allen, please visit http://url.sei.cmu.edu/js.

To view the August 2011 edition of the Cutter IT Journal, which was dedicated to DevOps, please visit http://www.cutter.com/promotions/itj1108/itj1108.pdf

Additional resources include the following sites:

- http://devops.com/ (currently being revamped)
- http://dev2ops.org/
- http://devopscafe.org/
- http://www.evolven.com/blog/devops-developments.html
- http://www.ibm.com/developerworks/library/d-develop-reliable-software-devops/index.html?ca=dat-
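To make the "it works on my machine" problem concrete, the sketch below compares two environment definitions and reports any setting that differs between them. It is only an illustration of configuration drift, under the assumption that an environment can be summarized as key-value settings. Vagrant itself describes environments in a Ruby Vagrantfile, which is not shown here, and the class name, setting names, and version strings are hypothetical.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical drift check; the setting names below are made up for illustration. */
class EnvironmentParity {

    /** Returns every setting whose value differs, or is missing, between the two environments. */
    static Map<String, String> drift(Map<String, String> dev, Map<String, String> prod) {
        // Consider every setting that appears in either environment.
        Set<String> keys = new TreeSet<>(dev.keySet());
        keys.addAll(prod.keySet());

        Map<String, String> differences = new TreeMap<>();
        for (String key : keys) {
            String devValue = dev.get(key);
            String prodValue = prod.get(key);
            if (!Objects.equals(devValue, prodValue)) {
                differences.put(key, "dev=" + devValue + ", prod=" + prodValue);
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        Map<String, String> dev = new TreeMap<>();
        dev.put("os", "ubuntu-12.04");
        dev.put("jdk", "1.7");

        Map<String, String> prod = new TreeMap<>();
        prod.put("os", "ubuntu-12.04");
        prod.put("jdk", "1.6");

        System.out.println(drift(dev, prod)); // prints "{jdk=dev=1.7, prod=1.6}"
    }
}
```

When every developer machine and every deployment target is built from the same versioned definition, a check like this returns an empty map, which is the parity that the shared project environment is meant to guarantee.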