By Will Dormann, Senior Member of the Technical Staff, CERT Vulnerability Analysis Team
Occasionally this blog will highlight different posts from the SEI blogosphere. Today we are highlighting a recent post by Will Dormann, a senior member of the technical staff in the SEI’s CERT Division, from the CERT/CC Blog. This post describes a few of the more interesting cases that Dormann has encountered in his work investigating attack vectors for potential vulnerabilities. An attack vector is the method that malicious code uses to propagate itself or infect a computer to deliver a payload or harmful outcome by exploiting system vulnerabilities.
SEI Blog, Jul 27, 2015 02:19pm
By Julien Delange, Senior Member of the Technical Staff, Software Solutions Division
When life- and safety-critical systems fail (and this happens in many domains), the results can be dire, including loss of property and life. These types of systems are increasingly prevalent, and can be found in the altitude and control systems of a satellite, the software-reliant systems of a car (such as its cruise control and anti-lock braking system), or medical devices that emit radiation. When developing such systems, software and systems architects must balance the need for stability and safety with stakeholder demands and time-to-market constraints. The Architecture Analysis & Design Language (AADL) helps software and system architects address the challenges of designing life- and safety-critical systems by providing a modeling notation with well-defined real-time and architectural semantics that employ textual and graphic representations. This blog posting, part of an ongoing series on AADL, focuses on the initial foundations of AADL.
The SEI’s Work on AADL
The SEI has been one of the lead developers of AADL and has participated on the AADL standardization committee since its inception. SEI researchers also developed the Open Source AADL Tool Environment (OSATE), which is the reference implementation for supporting AADL within the Eclipse environment and a number of AADL-based analysis capabilities. The SEI and other researchers and developers have used AADL as a foundation to design and analyze life- and safety-critical systems. Likewise, several projects have used AADL to design software architecture and analyze, validate, or improve various quality attributes, such as reliability, latency/performance, or security.
While AADL was initially conceived for use in avionics, this blog series has demonstrated its applicability in other domains that rely on life- and/or safety-critical systems, including the medical domain. Likewise, AADL tools have been developed by organizations such as Rockwell Collins, the European Space Agency, Honeywell, and Ellidiss Software to provide modeling concepts that describe the runtime architecture of application systems in terms of concurrent tasks, their interactions, and their mapping onto an execution platform.
Software with Reuse: AADL Beginnings
Bruce Lewis, an experienced software practitioner working for the U.S. Army Aviation & Missile Research, Development & Engineering Center (AMRDEC) in Huntsville, Alabama, is a key player in the development and evolution of AADL. Lewis, who formed the AADL Standards Committee and co-drafted the first-ever requirements document for the standard, attended a standards committee meeting at the SEI’s Pittsburgh headquarters in July 2013 and described his purpose in the standardization of AADL:
Originally, we were working on different reuse techniques and ways to build software with reuse. This was in the early '90s. We found that the different reuse approaches weren't really effective without an architecture description where we could actually analyze how components would fit together and also analyze the effect of putting them together. DARPA [the Defense Advanced Research Projects Agency] also was not satisfied with the current state-of-the-art for domain specific sets of components. They also wanted an architecture approach. So, I went to DARPA to talk to them about a possible solution. They pointed me to the Domain-Specific Software Architecture (DSSA) project, which was inventing the concept of an architecture description language. We began to experiment with architecture description languages for real-time, mission-, and safety-critical systems. That's how I met Peter [Feiler], who also was working on the DSSA project.
DARPA offered me the opportunity to lead the research projects for the Honeywell MetaH language over several DARPA programs. MetaH became the foundation for AADL. Our primary project involved essentially reengineering then enhancing a missile system using a reference architecture that we had developed. We then developed and hosted the missile system to multiple targets on single and distributed multi-processor platforms. We verified the system flew correctly by running the environment simulation and the tactical code on the embedded processors for each processor target. In each case, the missile flew correctly after rapid ports to the new processing environment. MetaH was so effective in our army experiments that I felt like it was very important to move it to a standard and push it forward in the industry as a more powerful, more effective means for real-time system building and evolution.
Lewis and Feiler shepherded AADL into a language that could be used to ensure the reliability of other safety-critical systems in other industries including aerospace, telecommunications, and medical equipment. For these industries, the exponential growth and increasing complexity of software and systems development had become a major area of concern. A team of SEI researchers led by Feiler detailed the exponential growth and complexity involved in developing safety-critical systems in the technical report, System Architecture Virtual Integration: An Industrial Case Study.
In that technical report, the authors described how aerospace software-reliant systems have experienced exponential growth in size and complexity, and also unfortunately in errors, rework, and cost, primarily due to integration issues that are detectable through architecture-centric analysis. New development of safe aircraft is reaching the limit of affordability. They also noted that the size of software systems, measured in source lines of code (SLOC), had doubled every four years since the mid-1990s and reached around 27 million SLOC of software by 2010. Rework costs were in the multi-billions of dollars. Discovering even a small percentage (10 percent) of these issues would yield a major payoff.
A key challenge faced by developers and testers of aerospace software systems resides in the integration of industrial and commercial off-the-shelf (COTS) components from multiple suppliers. Developers increasingly produce systems by integrating new and existing components rather than starting from scratch. This approach offers some benefits, such as reuse of existing development efforts, ease in the creation of product lines, and the reduction of component production costs. There are, however, some drawbacks to this approach:
Components are loosely related and never intended to work together in the new context. Assumptions constraining usability may not be documented.
Newly assembled artifacts have different and potentially conflicting requirements.
From an OEM’s perspective, multiple alternative components need to be evaluated for architectural effects.
Existing components were validated and tested in one system but might not work in other environments (e.g., deploying a function on different processors, different timelines, messaging infrastructures, scheduling, error handling environments, or execution runtime).
All other integration and/or deployment issues engineers had already experienced during operational projects, such as memory, bus, processor utilization, data integrity, safety and security requirements, shared resource contention, end-to-end latency, jitter sensitivity, fault-tolerance requirements, execution behavior, definitions of data, functional integration correctness, and underlying assumptions internal/external to the component.
These errors are likely to be discovered during the integration test phase, late in the development process. As a result, they increase the need for testing, introduce the need for rework and new development, increase costs, and postpone product delivery.
To address the problem of integration, AADL needed to represent the architecture early and incrementally to find problems typically reported during the system integration phase. When this approach is applied during the requirements and design phase, it exposes errors at a time when they are less costly to fix. Lewis described the evolution of AADL to address system integration:
The DARPA/Honeywell MetaH language provided very powerful concepts for architecture analysis and rapid generative integration driven by the analytical models, but the language was limited in flexibility and scope. We had 30 extensions from DARPA experiments that we knew we wanted to make plus industry and academic input on the committee. Based on his wealth of experience, Feiler became the primary AADL language architect. We knew that the AADL needed to support common analysis and integration of specifications across contractors based on precise core standard semantics for real-time systems. AADL also needed to provide flexibility for language extension for properties and annexes for specialized domains and tools. Since architectures must support evaluation of many quality attributes and requirements, AADL was developed for multi-domain analysis from a centralized model so we could detect side effects across these domains through a centralized, shared architecture model. It was also developed to support ease of incremental component refinement through extension mechanisms and to support analysis of incomplete specifications. Throughout the development process, we verify the integration of components into the virtual real-time system.
During our DARPA years, we moved into the more complex and challenging aviation domain. We wanted a domain that was really challenging where we could exploit the power of architecture expression to demonstrate many emerging architectural qualities in complex systems. This proved to be well targeted and timely. The aviation industry, at the same time, was beginning to recognize that they were suffering from what were essentially failures to effectively integrate the software system architecture. Costs were extremely high, and there were many expensive programs that demonstrated the fact that we need some kind of an architectural approach that was quantitative and qualitative. This approach would help us understand the architecture as early as possible, do virtual integration, and then ultimately build the system.
Our next post in this series will describe the use of AADL to improve safety and reliability in the aerospace industry as part of the System Architecture Virtual Integration (SAVI) project.
Additional Resources
To view the AADL Wiki, please visit https://wiki.sei.cmu.edu/aadl/index.php/Main_Page
For more information about AADL, please visit http://www.aadl.info
To read the SEI technical report, System Architecture Virtual Integration: An Industrial Case Study, please visit http://resources.sei.cmu.edu/library/asset-view.cfm?assetID=9145
By Ian Gorton, Senior Member of the Technical Staff, Software Solutions Division (This blog post was co-authored by John Klein)
New data sources, ranging from diverse business transactions to social media, high-resolution sensors, and the Internet of Things, are creating a digital tidal wave of big data that must be captured, processed, integrated, analyzed, and archived. Big data systems storing and analyzing petabytes of data are becoming increasingly common in many application areas. These systems represent major, long-term investments requiring considerable financial commitments and massive scale software and system deployments. With analysts estimating data storage growth at 30 to 60 percent per year, organizations must develop a long-term strategy to address the challenge of managing projects that analyze exponentially growing data sets with predictable, linear costs. This blog post describes a lightweight risk reduction approach called Lightweight Evaluation and Architecture Prototyping (for Big Data) we developed with fellow researchers at the SEI. The approach is based on principles drawn from proven architecture and technology analysis and evaluation techniques to help the Department of Defense (DoD) and other enterprises develop and evolve systems to manage big data.
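To make the gap between exponential data growth and linear budgets concrete, here is a minimal sketch of the compounding arithmetic behind the 30 to 60 percent estimate. The 1 PB starting size and the 10-year horizon are illustrative assumptions, not figures from the analysts.

```python
import math


def projected_size(initial_pb: float, annual_growth: float, years: int) -> float:
    """Size after compounding `annual_growth` (0.30 means 30 percent) for `years` years."""
    return initial_pb * (1.0 + annual_growth) ** years


def years_to_multiply(factor: float, annual_growth: float) -> float:
    """Years needed for a data set to grow by `factor` at a fixed annual rate."""
    return math.log(factor) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    # Hypothetical starting point: 1 PB of stored data.
    for rate in (0.30, 0.60):
        size = projected_size(1.0, rate, 10)
        tenfold = years_to_multiply(10.0, rate)
        print(f"{rate:.0%} per year: 1 PB grows to {size:.1f} PB in 10 years; "
              f"tenfold growth takes about {tenfold:.1f} years")
```

Even at the low end of the range, a store grows tenfold in under a decade, which is why cost per stored unit has to fall over time for total spending to stay close to linear.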
The Challenges of Big Data
For the DoD, the challenges of big data are daunting. Military operations, intelligence analysis, logistics, and health care all represent big data applications with data growing at exponential rates and the need for scalable software solutions to sustain future operations. In 2012, the DoD announced a $250 million annual research and development investment in big data that is targeted at specific mission needs such as autonomous systems and decision support. For example, in its Data-to-Decisions S&T Priority Initiative, the DoD has developed a roadmap through 2018 that identifies the need for distributed, multi-petabyte data stores that can underpin mission needs for scalable knowledge discovery, analytics, and distributed computations.
The following examples illustrate two different DoD data-intensive missions:
Electronic health records. The Military Health System provides care for more than 9.7 million active-duty personnel, their dependents, and retirees. The 15-year-old repository operates a continuously growing petascale database, with more than 100 application interfaces. System workload types include care delivery, force readiness, and research/analytics. The system does not currently meet the target of 24/7 availability.
Flight data management. Modern military avionics systems capture tens of gigabytes (GBs) of data per hour of operation. This data is collected in in-flight data analysis systems, which perform data filtering and organization, correlation with other data sources, and identification of significant events. These capabilities support user-driven analytics for root cause detection and defect prediction to reduce engineering and maintenance costs.
To address these big data challenges, a new generation of scalable data management technologies has emerged in the last five years. Relational database management systems, which provide strong data-consistency guarantees based on vertical scaling of compute and storage hardware, are being replaced by NoSQL (variously interpreted as "No SQL", or "Not Only SQL") data stores running on horizontally-scaled commodity hardware. These NoSQL databases achieve high scalability and performance using simpler data models, clusters of low-cost hardware, and mechanisms for relaxed data consistency that enhance performance and availability.
A challenging technology adoption problem is being created, however, by a complex and rapidly evolving landscape of non-standardized technologies built on radically different data models. Selecting a database technology that can best support mission needs for timely development, cost-effective delivery, and future growth is non-trivial. Using these new technologies to design and construct a massively scalable big data system creates an immense software architecture challenge for software architects and DoD program managers.
Why Scale Matters in Big Data Management
Scale has many implications for software architecture, and we describe two of them in this blog post. The first revolves around the fundamental changes that scale enforces on how we design software systems. The second is based upon economics, where small optimizations in resource usage at very large scales can lead to huge cost reductions in absolute terms. The following briefly explores these two issues:
Designing for scale. Big data systems are inherently distributed systems. Hence, software architects must explicitly deal with issues of partial failures, unpredictable communications latencies, concurrency, consistency, and replication in the system design. These issues are exacerbated as systems grow to utilize thousands of processing nodes and disks, geographically distributed across data centers. For example, the probability of failure of a hardware component increases with scale.
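The effect of scale on failure probability can be sketched with a little arithmetic. The sketch below assumes independent failures spread evenly over the year, and uses an illustrative annual per-server failure rate of 8 percent (the figure reported by the 2010 study cited in this post); the cluster sizes are hypothetical.

```python
def expected_annual_failures(servers: int, annual_failure_rate: float) -> float:
    """Expected number of servers that hit a hardware problem in one year."""
    return servers * annual_failure_rate


def prob_any_failure(servers: int, annual_failure_rate: float, days: float = 1.0) -> float:
    """Probability that at least one server fails within a window of `days` days.

    Assumes failures are independent and that the annual survival probability
    compounds evenly over time (a simplification, not a measured model).
    """
    survival_one_server = (1.0 - annual_failure_rate) ** (days / 365.0)
    return 1.0 - survival_one_server ** servers


if __name__ == "__main__":
    for n in (10, 1_000, 5_000):  # hypothetical cluster sizes
        print(n, "servers:",
              f"{expected_annual_failures(n, 0.08):.0f} expected failures per year,",
              f"P(any failure today) = {prob_any_failure(n, 0.08):.3f}")
```

Under these assumptions a failure somewhere in a cluster of a few thousand servers is an everyday occurrence, even though any single server rarely fails.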
Studies such as this 2010 research paper, "Characterizing Cloud Computing Hardware Reliability," find that 8 percent of all servers in a data center experience a hardware problem annually, with the most common cause being disk failure. In addition, applications must deal with unpredictable communication latencies and network partitions due to link failures. These requirements mandate that scalable applications treat failures as common events that must be handled gracefully to ensure that the application operation is not interrupted. To address such requirements, resilient big data software architectures must:
Replicate data across clusters and data centers to ensure availability in the case of disk failure or network partitions. Replicas must be kept consistent using either master-slave or multi-master protocols. The latter requires mechanisms to handle inconsistencies due to concurrent writes, typically based on Lamport clocks.
Design components to be stateless and replicated and to tolerate failures by dependent services, for example, by using the Circuit Breaker pattern described by Michael T. Nygard in his book Release It! and returning cached or default results whenever failures are detected. This pattern ensures that failures do not rapidly propagate across components and allows applications an opportunity for recovery.
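The two mechanisms named in the list above can be sketched briefly. This is a minimal illustration, not the implementation used by any particular database or by Nygard's book: the class names, thresholds, and the injectable `clock` parameter are assumptions made for the example.

```python
import time


class LamportClock:
    """Logical clock that orders events without synchronized wall clocks."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event or message send: advance the counter."""
        self.time += 1
        return self.time

    def receive(self, remote_time: int) -> int:
        """Message receipt: move past both the local and the sender's time."""
        self.time = max(self.time, remote_time) + 1
        return self.time


class CircuitBreaker:
    """Guards calls to a dependent service and serves a fallback while open."""

    def __init__(self, max_failures=3, reset_after_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures    # failures tolerated before opening
        self.reset_after_s = reset_after_s  # how long to fail fast once open
        self.clock = clock                  # injectable so tests need not sleep
        self.failures = 0
        self.opened_at = None

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                return fallback             # open: skip the failing service
            self.opened_at = None           # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback                 # cached or default result
        self.failures = 0                   # success closes the circuit
        return result
```

A caller would wrap each remote request, for example `breaker.call(lambda: fetch_profile(user_id), cached_profile)`, where `fetch_profile` and `cached_profile` are hypothetical names. A replica that stamps each write with `tick()` and applies `receive()` to incoming timestamps obtains an ordering it can use when reconciling concurrent writes.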
Economics at scale. Big data applications employ many thousands of compute-and-storage resources. Regardless of whether these resources are capital purchases or resources hosted by a commercial cloud provider, they remain a major cost and hence a target for reduction. Straightforward resource reduction approaches (such as data compression) are common ways to reduce storage costs. Elasticity is another way that big data applications optimize resource usage, by dynamically deploying new servers to handle increases in load and releasing them as load decreases. Elastic solutions require servers that boot quickly and application-specific strategies for avoiding premature resource release.
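One simple way to avoid premature resource release is to scale up immediately but scale down only after load has stayed low for a cooldown period. The sketch below assumes hypothetical utilization thresholds, a 10-minute cooldown, and a minimum pool of two servers; real strategies are application specific.

```python
def desired_servers(current: int,
                    load_per_server: float,
                    seconds_since_last_change: float = 0.0,
                    scale_up_at: float = 0.75,
                    scale_down_at: float = 0.30,
                    cooldown_s: float = 600.0,
                    min_servers: int = 2) -> int:
    """Return the server count an elastic controller should aim for.

    `load_per_server` is average utilization in the range 0.0 to 1.0.
    All thresholds are illustrative defaults, not recommendations.
    """
    if load_per_server > scale_up_at:
        return current + 1  # react to rising load right away
    release_allowed = (seconds_since_last_change >= cooldown_s
                       and current > min_servers)
    if load_per_server < scale_down_at and release_allowed:
        return current - 1  # release only after a quiet cooldown period
    return current


if __name__ == "__main__":
    print(desired_servers(4, 0.90))                                 # busy: grow
    print(desired_servers(4, 0.10, seconds_since_last_change=60))   # too soon to shrink
    print(desired_servers(4, 0.10, seconds_since_last_change=900))  # quiet long enough
```

The asymmetry is deliberate: adding a server too late hurts users, while keeping one a few minutes too long only costs a little money.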
Other strategies seek to optimize the performance of common tools and components to maintain productivity while decreasing resource utilization. For example, Facebook built HipHop, a PHP-to-C++ transformation engine that reduced its CPU load for serving web pages by 50 percent. At the scale of Facebook’s deployment, this represents a very significant resource reduction and cost savings. Other targets for reduction are software license costs, which can become prohibitive at scale. The drive for cost reduction has led to a proliferation of database and middleware technologies developed recently by leading internet organizations, and many of these have been released as freely available open source. Netflix and LinkedIn provide examples of powerful, scalable technologies for big data systems.
Other implications of scale for big data software architectures revolve around testing and fault diagnosis. Due to the deployment footprint of applications and the massive data sets they manage, it’s impossible to create comprehensive test environments to validate software changes before deployment. Approaches such as canary testing and simian armies represent the state of the art in testing at scale. When the inevitable problems occur in production, rapid diagnosis can only be achieved by advanced monitoring and logging. In a large-scale system, log analysis itself can quickly become a big data problem as log sizes can easily reach hundreds of GBs per day. Logging solutions must include a low overhead, scalable infrastructure such as Blitz4J, and the ability to rapidly reconfigure a system to redirect requests away from faulty services.
These challenges of scale are exacerbated by the necessarily large investments and magnified risks that accompany the construction of massive, scalable data management and analysis systems. For this reason, software engineering approaches that explicitly address the fundamental issues of scale and new technologies are a prerequisite for project success.
Designing for Scalability with Big Data
To mitigate the risks associated with scale and technology, a systematic, iterative approach is needed to ensure that initial design models and database selections can support the long-term scalability and analysis needs of a big data application. A modest investment in upfront design can produce unprecedented returns on investment in terms of greatly reduced redesign, implementation, and operational costs over the long lifetime of a large-scale system.
Because the scale of the target system prevents the creation of full-fidelity prototypes, a well-structured software engineering approach is needed to frame the technical issues, identify the architecture decision criteria, and rapidly construct and execute relevant but focused prototypes. Without this structured approach, it is easy to fall into the trap of chasing after a deep understanding of the underlying technology instead of answering the key go/no-go questions about a particular candidate technology. Getting the right decisions for the minimum cost should be the aim of this exercise.
At the SEI, we have developed a lightweight risk reduction approach that we have initially named Lightweight Evaluation and Architecture Prototyping (for Big Data), or LEAP(4BD). Our approach is based on principles drawn from proven architecture and technology analysis and evaluation techniques such as the Quality Attribute Workshop and the Architecture Tradeoff Analysis Method. LEAP(4BD) leverages our extensive experience with architecture-based design and assessment and customizes these methods with deep knowledge and experience of the architectural and database technology issues most pertinent to big data systems. Working with an organization’s key business and technical stakeholders, this approach involves the following general guidelines:
Assess the existing and future data landscape. This step identifies the application’s fundamental data holdings, their relationships, the most frequent queries and access patterns, and their required performance and quantifies expected data and transaction growth. The outcome sets the scope and context for the rest of the analysis and evaluation and provides initial insights into the suitability of a range of contemporary data models (e.g., key-value, graph, document-oriented, column-oriented, etc.) that can support the application’s requirements.
Identify the architecturally-significant requirements and develop decision criteria. Focusing on scalability, performance, security, availability, and data consistency, stakeholders characterize the application’s quality attribute requirements that drive the system architecture and big data technology selection. Combining these architecture requirements with the characteristics of the data model (previous step) provides the necessary information for initial architecture design and technology selection.
Evaluate candidate technologies against quality attribute decision criteria. Working with the system architects, this step identifies and evaluates candidate big data technologies against the applications’ data and quality attribute requirements, and selects a small number of candidates (typically two to four) for validation through prototyping and testing. The evaluation is streamlined by using an evaluation criteria framework for big data technologies that we are developing as part of our internal R&D. This framework focuses the assessment activities for specific database products against a generic collection of behavioral attributes and measures.
Validate architecture decisions and technology selections. Through focused prototyping, our approach ensures that the system design and selected database technologies can meet the defined quality attribute needs. By evaluating the prototype’s behavior against a set of carefully designed, application-specific criteria (e.g., performance, scalability, etc.), this step provides concrete evidence that can support the downstream investment decisions required to build, operate, and evolve the system. During the construction and execution of the prototypes, the project team develops experience working with the specific big data technologies under consideration.
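The evaluation step above can be pictured as a weighted decision matrix. The sketch below is a generic scoring scheme, not the SEI's evaluation criteria framework, whose details are not described in this post. The candidate names, weights, and 1-to-5 scores are all hypothetical.

```python
def rank_candidates(scores, weights, shortlist_size=3):
    """Rank candidates by weighted score and return the top few as (name, total).

    scores:  {candidate: {criterion: score}}, here on a 1 (poor) to 5 (strong) scale
    weights: {criterion: relative importance}, here summing to 1.0
    """
    totals = {
        name: sum(weights[criterion] * per_criterion[criterion] for criterion in weights)
        for name, per_criterion in scores.items()
    }
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:shortlist_size]


if __name__ == "__main__":
    # Hypothetical weights agreed with stakeholders for the quality attributes.
    weights = {"scalability": 0.30, "performance": 0.25, "availability": 0.20,
               "consistency": 0.15, "security": 0.10}
    # Hypothetical candidate data stores and scores.
    scores = {
        "store_a": {"scalability": 5, "performance": 4, "availability": 4,
                    "consistency": 2, "security": 3},
        "store_b": {"scalability": 3, "performance": 3, "availability": 3,
                    "consistency": 5, "security": 4},
        "store_c": {"scalability": 2, "performance": 5, "availability": 2,
                    "consistency": 3, "security": 3},
    }
    for name, total in rank_candidates(scores, weights, shortlist_size=2):
        print(f"{name}: {total:.2f}")
```

A matrix like this only narrows the field; the shortlisted candidates still have to be validated through the focused prototyping described in the final step.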
Benefits
LEAP(4BD) provides a rigorous methodology for organizations to design enduring big data management systems that can scale and evolve to meet long-term requirements. The key benefits are:
A focus on the database and key analysis services addresses the major risk areas for an application. This keeps the method lean and produces rapid design insights that become the basis for downstream decisions.
A highly transparent and systematic analysis and evaluation method significantly reduces the burden of justification for the necessary investments to build, deploy, and operate the application.
Maximizing the potential for leveraging modern big data technologies reduces costs and ensures that an application can satisfy its quality attribute requirements.
Greatly increased confidence in architecture design and database technology selection, and hands-on experience working with the technology during prototype development, reduces development risks.
The identification of outstanding project risks that must be mitigated in design and implementation, along with detailed mitigation strategies and measures that allow for continual assessment.
Looking Ahead
We are currently piloting LEAP(4BD) with a federal agency. This project involves both scalable architecture design and focused NoSQL database technology benchmarking, as well as an assessment of features to meet the key quality attributes for scalable big data systems.
We are interested in working with organizational leaders who want to ensure appropriate technology selection and software architecture design for their big data systems. If you are interested in collaborating with us on this research, please leave us feedback in the comments section below or send an email to us at igorton@sei.cmu.edu or jklein@sei.cmu.edu.
Additional References
To read the article "Time, clocks, and the ordering of events in a distributed system" by Leslie Lamport, please visit http://research.microsoft.com/en-us/um/people/lamport/pubs/time-clocks.pdf
To read the paper "Characterizing cloud computing hardware reliability," by Kashi Venkatesh Vishwanath and Nachiappan Nagappan, which was published in the Proceedings of the First Association for Computing Machinery Symposium on Cloud Computing, please visit http://dl.acm.org/citation.cfm?doid=1807128.1807161
To read more about the challenges of ultra-large-scale systems please visit http://www.sei.cmu.edu/uls
By Timur Snoke, Member of the Technical Staff, CERT Network Situational Awareness Team
Occasionally this blog will highlight different posts from the SEI blogosphere. Today we are highlighting a post from the CERT/CC Blog by Timur Snoke, a member of the technical staff in the SEI’s CERT Division. This post describes maps that Timur has developed using Border Gateway Protocol (BGP) routing tables to show the evolution of public-facing autonomous system numbers (ASNs). These maps help analysts inspect the BGP routing tables to reveal disruptions to an organization’s infrastructure. They also help analysts glean geopolitical information for an organization, country, or a city-state, which helps them identify how and when network traffic is subverted to travel nefarious alternative paths to place communications deliberately at risk.
By Julien Delange, Senior Member of the Technical Staff, Software Solutions Division
The size and complexity of aerospace software systems have increased significantly in recent years. When looking at source lines of code (SLOC), the size of systems has doubled every four years since the mid-1990s, according to a recent SEI technical report. The 27 million SLOC that will be produced from 2010 to 2020 is expected to cost more than $10 billion. These increases in size and cost have also been accompanied by significant increases in errors and rework after a system has been deployed. Mismatched assumptions between hardware, software, and their interactions often result in system problems that are detected only after the system has been deployed, when rework is much more expensive to complete. To address this problem, the Society of Automotive Engineers (SAE) released the Architecture Analysis & Design Language (AADL), which helps software and system architects address the challenges of designing life- and safety-critical systems by providing a modeling notation with well-defined real-time and architectural semantics that employ textual and graphic representations. This blog posting, part of an ongoing series on AADL, describes the use of AADL in the aerospace industry to improve safety and reliability.
The Evolution of AADL
The SEI has been a leader in the development of AADL and has participated on the AADL standardization committee since its inception. Likewise, several projects have used AADL to design software architecture and analyze, validate, or improve various quality attributes, such as reliability, latency/performance, or security.
While AADL was initially conceived for use in avionics, this blog series has demonstrated its applicability in other domains that rely on life- and/or safety-critical systems, including the medical domain. This post focuses on the evolution of AADL from its initial foundations (covered in our previous blog posting) to its use in the System Architecture Virtual Integration (SAVI) project.
SAVI is part of the collaborative and applied research performed by an aerospace industry research cooperative known as the Aerospace Vehicle Systems Institute (AVSI). SAVI is a multi-year project separated into different tasks that aim to improve system and software development and significantly reduce development costs through the early discovery of what will eventually become integration issues. Each sub-task addresses a particular issue companies are actually struggling with. The remainder of this post shows how the design rationale of AADL fits with SAVI’s goals and describes the uses of AADL within the project to improve software safety and reliability.
A Focus on Safety
A major integration challenge in fielding aerospace software systems stems from safety requirements. Detecting and analyzing faults/errors within a component can prove demanding for software architects. These tasks are especially hard during component integration, when it is necessary to understand how the faults would propagate through the architecture without testing or simulation. This challenge served as a catalyst for the development of demanding certification standards such as DO-178C, also known as "Software Considerations in Airborne Systems and Equipment Certification."
SEI researchers, led by Peter Feiler, a senior member of the technical staff and author of AADL, made the decision to document the benefits of the architecture model and bind safety-related specifications (such as fault occurrence and propagation across the architecture) to the architecture model. As Bruce Lewis, an experienced software practitioner working for the U.S. Army Aviation & Missile Research, Development & Engineering Center (AMRDEC) in Huntsville, Alabama, and a key player in the development and evolution of AADL, reported:
The error models are placed on components, but when we integrate the components, we know what kinds of errors they can accept on their input and what kinds of errors they can propagate out. Then we can do an integration of error models to actually look at how errors might propagate to the system.
SEI researchers extended AADL to describe system faults and analyze the safety of systems. This work consisted of two parts:
enhancing the core of the technology with an updated sub-language (the Error-Model Annex Version 2, an AADL annex being standardized by the SAE) to better describe architecture safety concerns
developing new tools to process this additional information within the model, validate system safety, and help engineers produce documents for validating the system
These enhanced capabilities enable AADL users to augment their software architectures with fault description and specify error source and propagation across hardware and software components. These enhancements can also be used to improve system validation and detect faults that are potentially overlooked during the initial development process and that may lead to errors. These safety analyses can now be evaluated incrementally as the architecture is refined during the development process, again permitting early discovery of issues and reducing manual effort.
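The integration of per-component error models that Lewis describes can be pictured with a small sketch. The Python below is not AADL or Error-Model Annex syntax. The component names, the error types, and the `propagate` helper are all hypothetical, chosen only to show how declared outgoing errors might be traced along connections and checked against what each receiver says it can accept.

```python
from collections import deque

# Hypothetical architecture: each component declares the error types it can
# emit on its own (sources), the types it tolerates on input (accepts), and
# the incoming types it forwards to its outputs (passes_on).
components = {
    "sensor":     {"sources": {"NoValue", "BadValue"}, "accepts": set(), "passes_on": set()},
    "controller": {"sources": set(), "accepts": {"NoValue"}, "passes_on": {"BadValue"}},
    "actuator":   {"sources": set(), "accepts": {"NoValue"}, "passes_on": set()},
}
connections = [("sensor", "controller"), ("controller", "actuator")]


def propagate(components, connections):
    """Return (reached, unhandled).

    reached maps each component to the set of error types arriving at it;
    unhandled lists (component, error) pairs that the receiver neither
    accepts nor declares that it passes on.
    """
    reached = {name: set() for name in components}
    unhandled = []
    queue = deque((src, err) for src, spec in components.items() for err in spec["sources"])
    seen = set()
    while queue:
        src, err = queue.popleft()
        for sender, dst in connections:
            if sender != src or (dst, err) in seen:
                continue
            seen.add((dst, err))
            reached[dst].add(err)
            spec = components[dst]
            if err in spec["passes_on"]:
                queue.append((dst, err))      # error continues downstream
            elif err not in spec["accepts"]:
                unhandled.append((dst, err))  # integration mismatch
    return reached, unhandled


reached, unhandled = propagate(components, connections)
```

In this toy model the `BadValue` error raised by the sensor passes through the controller and arrives at an actuator that does not declare it, which is the kind of mismatch an integrated error model is meant to expose before testing.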
A Safety-Dedicated Sub-language for AADL
To improve system validation and detect faults, an updated annex language has been proposed by the SEI and is currently under review by the AADL Standardization Committee. The effort involved contributors from several companies and domains, and was intended to make the document more user friendly for safety analysts. SAE will publish this addition to the core AADL language in the coming months as an official standard annex. In addition, the SEI will publish technical reports to present this sub-language and demonstrate its use on real systems, such as the Wheel Brake System example. This example (originating from the SAE AIR6110 standard, which demonstrates the safety validation process for the avionics domain) shows the relevance of our approach and that it scales to current industrial practices.
Researchers from the SEI also recently enhanced the AADL reference environment, OSATE, to analyze system safety and produce documents required by safety evaluation standards, such as SAE ARP4761. For now, the toolset provides several functions that allow software engineers to extract safety-related information from an architecture model and show the impact of a fault in documents, such as
Functional Hazard Assessment provides a list of faults with their description, a reference to the design document, and a severity classification.
Fault Tree Analysis provides a hierarchical description of fault dependencies.
Failure Mode and Effects Analysis provides a list of all faults with their propagation paths across the architecture.
Reliability Block Diagram provides an analysis that evaluates the failure probability of a component according to its failure events and incoming error propagations.
These functions have been demonstrated in the context of the SAVI project. Also, as these functions can be interfaced with open-source tools that are available at no cost, they can be tested using the public release of OSATE. In that context, the SEI researchers detailed how to reproduce the demonstration using a public model of the Wheel Brake System from the SAE AIR6110 standard.
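As a rough illustration of the arithmetic behind a fault tree or reliability block diagram, the sketch below combines failure probabilities through AND and OR gates, assuming the basic events are statistically independent. It is not OSATE output, nor a description of how OSATE computes its results. The structure and the probabilities are invented for the example.

```python
from math import prod


def p_and(probabilities):
    """All inputs must fail (redundant blocks): multiply the failure probabilities."""
    return prod(probabilities)


def p_or(probabilities):
    """Any single input failing is enough (blocks in series): 1 - prod(1 - p)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Invented figures: two redundant channels, each of which fails if its pump
# or its valve fails; the system also fails if a shared power bus fails.
channel = p_or([1e-3, 2e-3])               # one channel: pump OR valve
both_channels = p_and([channel, channel])  # redundancy: both must fail
system = p_or([both_channels, 1e-5])       # redundant pair OR power bus
```

Even in this tiny example the shared power bus contributes more to the system failure probability than the redundant pair, which is the kind of insight these documents are meant to surface.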
Beyond Safety: Analyzing System Behavior
The next iteration of SAVI will focus on system behavior. Many projects from the avionics or aerospace communities reuse software and hardware components from different projects, trying to reduce new development and take advantage of existing components. When software and hardware components are integrated, however, tests show that behavior discrepancies can generate issues that were not predicted in their initial execution environment. For example, different development teams may use different assumptions regarding quality attributes (such as timing, resource dimensions, etc.), resulting in issues (late arrival of data, insufficient computing capacity, etc.) during integration. This discrepancy may lead to significant increases in testing efforts and development costs as well as a postponement in the delivery of the system.
To overcome these issues, SEI researchers are working to extend AADL by integrating component behavior descriptions into a new "behavior annex." This new addition to the AADL language would provide software architects with the ability to describe the software behavior in terms of modes (operational, failed, degraded, etc.), states (running, idle, waiting for inputs, etc.), and operations (call to system functions or subprograms, send/receive values from interfaces, etc.). This additional architecture information would also allow architects to check the consistency of integrated component behaviors and detect potential defects or requirements mismatches. Part of this work may lead to the development of analysis tools that will integrate an existing behavior specification (such as a state machine) into an architecture model and detect potential defects.
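One way to picture such a consistency check is to replay what one component sends against the state machine of the component that receives it. The Python sketch below is not behavior annex syntax. The states, the events, and the `unhandled_events` helper are hypothetical and only illustrate the idea of detecting a mismatch between integrated behaviors.

```python
# Hypothetical receiver behavior: transitions[state][event] -> next state.
receiver = {
    "idle":    {"start": "running"},
    "running": {"data": "running", "stop": "idle"},
}


def unhandled_events(transitions, initial, event_sequence):
    """Replay a sender's event sequence against the receiver's state machine.

    Returns (position, state, event) for every event the receiver has no
    transition for in its current state. The receiver is assumed to ignore
    such an event and remain in the same state.
    """
    state = initial
    problems = []
    for position, event in enumerate(event_sequence):
        next_state = transitions[state].get(event)
        if next_state is None:
            problems.append((position, state, event))
        else:
            state = next_state
    return problems


# A sender that assumes data may be sent before "start" exposes a mismatch.
mismatches = unhandled_events(receiver, "idle", ["data", "start", "data", "stop"])
```

Here the first `data` event arrives while the receiver is still idle, so the check reports it, while the remaining events are consumed normally.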
Our next post will present the actual outcome of this work and the envisioned tool support.
Additional Resources
To view the AADL Wiki, please visit https://wiki.sei.cmu.edu/aadl/index.php/Main_Page
For more information about AADL, please visit http://www.aadl.info
To read the SEI technical report, System Architecture Virtual Integration: An Industrial Case Study, please visit http://resources.sei.cmu.edu/library/asset-view.cfm?assetID=9145
Jul 27, 2015 02:18pm
By Jose Morales, Senior Member of the Technical Staff, CERT Division
In early 2012, a backdoor Trojan malware named Flame was discovered in the wild. When fully deployed, Flame proved very hard for malware researchers to analyze. In December of that year, Wired magazine reported that before Flame had been unleashed, samples of the malware had been lurking, undiscovered, in repositories for at least two years. As Wired also reported, this was not an isolated event. Every day, major anti-virus companies and research organizations are inundated with new malware samples. Although estimates vary, according to an article published in the October 2013 issue of IEEE Spectrum, approximately 150,000 new malware strains are released each day. Not enough manpower exists to manually address the sheer volume of new malware samples that arrive daily in analysts’ queues. Malware analysts instead need an approach that allows them to sort out samples in a fundamental way so they can assign priority to the most malicious of binary files. This blog post describes research I am conducting with fellow researchers at the Carnegie Mellon University (CMU) Software Engineering Institute (SEI) and CMU’s Robotics Institute. This research is aimed at developing an approach to prioritizing malware samples in an analyst’s queue (allowing them to home in on the most destructive malware first) based on the file’s execution behavior.
Existing Approaches to Prioritizing Malware Analysis
Before beginning work on developing an approach for prioritizing malware analysis, our team of researchers examined existing approaches and found very few. Most institutions in academia, government, and industry analyze malware by randomly selecting samples, ordering them alphabetically, or analyzing a binary file in response to a request for a specific file based on its MD5, SHA-1, or SHA-2 cryptographic hash value.
Our team decided to take a systematic approach to malware analysis that takes incoming samples and analyzes them using runtime analysis. At a high level, this approach collects and categorizes salient features. Using a clustering algorithm, our approach then ideally prioritizes the malware sample appropriately in an analyst’s queue based on their description of the type of malware they want to analyze upon its arrival in their repository.
A key idea in our approach involved the use of dynamic analysis to measure the maliciousness of a malware sample based on its execution behavior. The assumption is that all malware samples have certain malicious events they must perform to carry out nefarious deeds. The implementation of these deeds is captured at runtime and may prove to be useful assessment characteristics.
Salient and Inferred Features
When we initially began this research, we extracted more than two dozen features that were a mix of
actual runtime data that could be easily identified as suspicious. An example is modification of a registry key such as Windows\CurrentVersion\Run.
inferred features that are derived by analyzing the details of one or more pieces of suspicious runtime data. An example is inferring that a malware sample has set itself up to start at system reboot: analyzing the details of the Windows\CurrentVersion\Run registry key modification shows that the edit placed the malware’s own file system path in this key, ensuring that it will execute on reboot.
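The difference between an observed feature and an inferred one can be sketched in a few lines of Python. The event record layout and the `infers_persistence` helper are assumptions made for illustration and are not the format used by the runtime analysis framework described in this post. Only the Windows\CurrentVersion\Run key comes from the text above.

```python
RUN_KEY_SUFFIX = r"windows\currentversion\run"


def infers_persistence(event, sample_paths):
    """Inferred feature: a Run-key write whose value points at one of the
    sample's own files, suggesting the sample will execute on reboot."""
    if event["type"] != "registry_write":
        return False
    key = event["key"].lower().rstrip("\\")
    if not key.endswith(RUN_KEY_SUFFIX):
        return False
    value = event["value"].lower().strip('"')
    return any(value.startswith(path.lower()) for path in sample_paths)


# Hypothetical observed event: the raw registry modification on its own is
# the "actual" feature; the conclusion drawn from its value is the inferred one.
event = {
    "type": "registry_write",
    "key": r"HKCU\Software\Microsoft\Windows\CurrentVersion\Run",
    "value": r'"C:\Users\test\AppData\evil.exe" /silent',
}
is_persistent = infers_persistence(event, [r"C:\Users\test\AppData\evil.exe"])
```

The same registry write made by a benign installer would still be recorded as suspicious runtime data, but it would not yield the inferred feature unless the written value pointed back at a file belonging to the sample.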
It’s important to note that malware doesn’t typically commit all of its nefarious deeds with just one running process. In examining features, we also created malware infection trees to gain a better understanding of the processes and files created by the malware. The paper, "Building Malware Infection Trees," which I co-authored, allowed us to view the malware sample as "a directed tree structure" (see Figure 1 below) with each node representing a file or process that the malware had created and each edge representing file creation, process creation, self-replication, or dynamic code injection behavior.
Figure 1. A Malware Infection Tree for Poison Malware.
Our research focused on the following three areas of suspicious behavior. All of these behaviors were recorded for the executing malware process and any related processes in its malware infection tree. We assume that these behaviors, when performed by a member of the malware infection tree, can lead to suspicious behaviors usable in assessing the malware sample:
File systems. We looked at the typical open, write, create, move, and delete file operations. These are typical behaviors for any executable, but when performed by malware they can occur in greater numbers or target specific files as part of a broader malicious behavior. We also examined instances of self-replication, where malware copies itself into a new file or into an existing file, as well as a piece of malware deleting itself from the system. We also considered cases where a child process in the malware infection tree deleted the static file image of an ancestor process, which was deemed a possible attempt to avoid detection.
Networks. We examined domain name service (DNS) queries and reverse DNS (rDNS) queries. Specifically, we searched for high occurrences of DNS and rDNS requests, especially failed requests, which indicate that malware may be testing a set of potentially active IP addresses to connect with a command-and-control server or some other malicious server.
We also considered multiple failed Transmission Control Protocol (TCP) connection attempts, specifically cases where a URL was DNS queried multiple times and then, when an attempt was made to connect to the IP address, the attempt failed. This scenario indicates that the malware might be harvesting IP addresses through DNS requests, but the IP addresses it harvests are servers that have not been activated or have been taken down, blocked, removed, or blacklisted. A piece of malware sometimes keeps repeating this action until it connects with an IP address or times out. We also identified use of non-typical network protocols.
Processes. We primarily looked at whether the malware created a completely new process, deleted an already running process, or started a new process thread. When a malware sample deleted a process, we checked to see if the process that was being deleted belonged to a known piece of anti-malware software. If that was the case, we inferred that the malware was actually trying to disable the system’s security measures to ensure its safety and longevity.
Our analysis of processes also included dynamic code injection, where the malware process running on a system identifies already existing processes that are likely to be running on any version of the target system. Specifically, we looked at several Windows platform standard processes that start at runtime, such as svchost.exe, winlogon.exe, and taskhost.exe.
In dynamic code injection, the malware process chooses an already running process and writes malicious commands into the process’s memory. Next, the malware creates a new thread that will execute the malicious instructions that have just been written to its memory. These steps result in the modified process behaving in non-typical ways. With dynamic code injection, the malware actually delegates malicious acts to other processes so that the system will detect the other process as doing something suspicious, not the original malware that is running on the system.
We also examined whether, during our runtime analysis, the malware tried to modify the registry key Windows\CurrentVersion\Run. A modification of this registry key is obviously something that we want to know about because it can be an indicator of malicious behavior.
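The infection tree and the tree-wide behavior counts described above can be sketched with a small data structure. This is an illustrative Python model and not the implementation from the paper. The node fields, edge labels, file names, and event names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A file or process created by the malware."""
    name: str
    kind: str                                      # "process" or "file"
    events: list = field(default_factory=list)     # observed behaviors
    children: list = field(default_factory=list)   # (edge_label, Node) pairs

    def add_child(self, edge_label, child):
        """Attach a child and record which behavior created it."""
        self.children.append((edge_label, child))
        return child


def walk(node):
    """Yield every node in the tree, root first."""
    yield node
    for _, child in node.children:
        yield from walk(child)


def count_events(root, event_type):
    """Count one behavior across the whole infection tree, not just the root."""
    return sum(1 for n in walk(root) for e in n.events if e == event_type)


root = Node("sample.exe", "process", events=["dns_failed", "file_create"])
root.add_child("file_creation", Node("dropper.dll", "file"))
helper = root.add_child(
    "process_creation",
    Node("helper.exe", "process", events=["dns_failed", "dns_failed"]),
)
helper.add_child(
    "code_injection",
    Node("svchost.exe", "process", events=["tcp_connect_failed"]),
)

failed_dns = count_events(root, "dns_failed")
```

Counting over the whole tree matters because, as noted above, a sample rarely performs all of its behavior in a single process. In this example two of the three failed DNS queries come from a child process.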
A Technical Dive into Our Approach
Once we determined the features we wanted to look at, we turned our attention to creating a training set. To do so, our team compiled a set of malware samples classified as advanced persistent threats (APTs), botnets, and Trojans. We also included known malware that was ranked in the Top 5 most dangerous and most persistent in the wild in the past five years, according to Kaspersky’s securelist.org website. We submitted, executed, and analyzed each sample in our runtime analysis framework and extracted our chosen features.
Once we created our training set, we assembled 11,000 malware samples that we clustered based on the training set. The training set allowed us to gain an initial understanding of what our algorithms could tell us about how to prioritize the malware. We examined execution behavior at the user and kernel levels during a three-minute runtime analysis.
We then determined, for the whole set, if behaviors were identified mostly from user- or kernel-level collected data. Next, we created a set of characteristics from the most pervasive observed execution behaviors and repeated our experiment using a larger mixed malware sample set and clustered results based on our set of characteristics. At the end of analysis we collected the feature sets for our training sets and our test set of 11,000 samples, and we submitted them for analysis with various machine learning algorithms to determine which one is best for prioritizing malware samples based on our features.
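The post does not name the algorithms that were compared, so the following is only a stand-in that shows the general shape of feature-based prioritization. Each sample's feature vector is scored by its Euclidean distance to the centroid of the training set, and the queue is sorted so that the samples most similar to the known high-priority malware come first. The feature names and all values are invented.

```python
from math import dist


def centroid(vectors):
    """Component-wise mean of equally sized feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def prioritize(queue, training_vectors):
    """Sort sample names by closeness to the training-set centroid (closest first)."""
    center = centroid(training_vectors)
    return sorted(queue, key=lambda name: dist(queue[name], center))


# Invented features: [failed DNS queries, processes created, Run-key writes]
training = [[40, 3, 1], [60, 5, 1], [50, 4, 1]]
queue = {
    "sample_a": [2, 1, 0],
    "sample_b": [48, 4, 1],
    "sample_c": [20, 2, 1],
}
order = prioritize(queue, training)
```

A real system would need to scale the features so that a large count such as DNS queries does not dominate the distance, and would likely use a proper clustering or classification algorithm in place of a single centroid.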
Collaborations
Dr. Jeff Schneider, a researcher at the Robotics Institute in CMU’s School of Computer Science and an expert on classification and clustering, agreed to analyze our feature sets and create a classifier to prioritize malware samples. We are also working with software engineers in the Software Engineering group of the CERT Division’s Digital Intelligence and Investigations Directorate, who wrote code for us, including the feature extraction code.
Next Steps
The goal of our research was to allow analysts greater efficiency in establishing a priority queue for malware analysis. These priorities can vary based on whether the malware analyst works in the financial industry and is interested in Distributed Denial of Service (DDoS) attacks, botnets, or Trojans, or whether the analyst works in the DoD’s cyber command and is interested in APTs and protecting national assets.
If our approach works as planned, we will accelerate efforts to formalize an automated prioritization system using our established set of salient and inferred features that could be integrated into current analysis frameworks using as input a live continuous malware feed.
We welcome your feedback on our approach in the comments section below.
Additional Resources
To read the paper, "Building Malware Infection Trees," by Jose Andre Morales, Michael Main, Weiliang Luo, Shouhuai Xu, and Ravi Sandhu, please visit http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6112326
To read about other malware research initiatives at the SEI, please visit http://blog.sei.cmu.edu/archives.cfm/category/malware
By Douglas C. Schmidt, Principal Researcher
To deliver enhanced integrated warfighting capability at lower cost across the enterprise and over the lifecycle, the Department of Defense (DoD) must move away from stove-piped solutions and towards a limited number of technical reference frameworks based on reusable hardware and software components and services. There have been previous efforts in this direction, but in an era of sequestration and austerity, the DoD has reinvigorated its efforts to identify effective methods of creating more affordable acquisition choices and reducing the cycle time for initial acquisition and new technology insertion. This blog posting is part of an ongoing series on how acquisition professionals and system integrators can apply Open Systems Architecture (OSA) practices to decompose large monolithic business and technical designs into manageable, capability-oriented frameworks that can integrate innovation more rapidly and lower total ownership costs. The focus of this posting is on the evolution of DoD combat systems from ad hoc stovepipes to more modular and layered architectures.
Motivating the Need for Technical Reference Frameworks
DoD programs face a number of challenges in this era of increasing threats and constrained budgets. As nation state actors become more sophisticated, the nature of threats becomes asymmetric. It is therefore critically important that the DoD be able to respond quickly to risk with new technologies, while delivering enhanced integrated warfighting capability at lower cost. The DoD faces several challenges in achieving these goals. Chief among them is addressing the decades-long, stove-piped, ad hoc approach to developing software that results in vendor-locked legacy systems, each of which maintains its own proprietary software, computers, networks, and operating systems. A promising solution is OSA, which combines
technical practices designed to reduce the cycle time needed to acquire new systems and insert new technology into legacy systems and
business models for creating a more competitive marketplace and a more effective strategy for managing intellectual property rights in DoD acquisition programs.
The SEI is helping the DoD craft its OSA strategy and an implementation plan to deliver better capabilities to warfighters within the fiscal constraints of sequestration. A working group has been established to help the DoD move away from stove-piped software development models to Common Operating Platform Environments (COPEs) that embody OSA practices. As part of this effort, I am involved with a task area on "published open interfaces and standards" that aims to help program managers and other acquisition professionals avoid vendor lock-in, encourage competition, and spur innovation by defining a limited number of technical reference frameworks that break down traditional stove-piped solutions. These frameworks are integrated sets of competition-driven, modular components that provide reusable combat system architectures for families of related warfighting systems.
Despite substantial advances in technical reference frameworks during the past decade, widespread adoption of affordable and dependable OSA-based solutions has remained elusive. It is therefore important to look at past open-systems efforts across the DoD to understand what worked, what didn’t, and what can be done to make such efforts more successful this time. To achieve this historical perspective, I, along with fellow SEI researcher Don Firesmith and Adam Porter from the University of Maryland Department of Computer Science, have been documenting the evolution of DoD combat systems with respect to their adoption of systematic reuse and the OSA paradigm described above, as shown in the following diagram.
The ad hoc architectures in the columns on the left are highly stove-piped and coarse-grained, and they exhibit little or no sharing of the capabilities that are critical to warfighters, including communications, radars, launchers, etc. The increasingly advanced architectures from left to right are intentionally designed to share more capabilities at finer levels of granularity in DoD systems, including
Infrastructure capabilities, such as internetworking protocols, operating systems, and various layers of middleware services, such as identity managers, event disseminators, resource allocators, and deployment planners,
Common data and domain capabilities, such as trackers, interoperable data models, and mission planners involving battle management (BM), control, and interaction with sensors and weapons in C4ISR systems, and
External interfaces, such as across the global information grid (GIG) to external weapon systems, as well as information sources and users.
In practice, of course, production combat systems vary in terms of their progression along the continuum shown in the figure and descriptions above. This discussion is intended to provide a bird’s-eye view of the design space of DoD combat systems with respect to architectural evolution. The remainder of this posting describes the first four epochs in the diagram shown above. The remaining four epochs will be described in the next blog post in this series.
Ad hoc Architectures involve the separate development of each warfighter’s capability (such as BM/C4I, sensors, weapons, etc.) in a vertically stove-piped manner that lacks crisply defined module boundaries. This approach is characterized by vertical integration and tight coupling from higher-level domain-specific capabilities down to hardware and system-level infrastructure capabilities.
Ad hoc architectures are widely used in DoD legacy combat systems for various reasons. For example, the tight coupling between system components has historically been deemed essential for mission- and safety-critical DoD programs that need to extract maximum performance to meet stringent end-to-end quality attributes. The stove-piped nature of these ad hoc architectures has also often been perceived as risk prudent, since these architectures enable a single program office and system integrator to maintain tight control over every facet in the solution.
Despite their pervasive use historically, however, ad hoc architectures have become prohibitively expensive to develop and sustain over the software and system lifecycle. A key problem is the tight coupling common in ad hoc architectures, which typically locks the DoD into sole-source contracts that limit the benefits of open competition and impede innovations. These innovations include the ability to leverage commodity hardware and/or software platform advances, such as multi-core and distributed-core cloud computing environments, that would otherwise occur during periodic technical refresh insertion points.
Modular Architectures defined some crisp boundaries within their stove-pipes and began to transition away from top-down algorithmic decomposition to a more object-oriented and component-based decomposition. This approach is characterized by designs whose components are less externally coupled and more internally cohesive than the earlier generation of ad hoc architectures.
Although ad hoc architectures have not been economically viable for many years, sequestration has renewed the interest of government and defense industry leadership in modular architectures. Ironically, interest in modular architectures for DoD combat systems began several decades ago, as acquisition programs began to define module boundaries more crisply within their stove-pipes to move away from top-down, algorithmic decomposition (which yields tightly coupled point solutions) to a more object-oriented decomposition (which emphasizes modular, loosely coupled components that can be understood and tested more readily in isolation and thus reused more effectively).
In the early phases of DoD software development, developers tended to write software using function-oriented programming languages (such as FORTRAN, JOVIAL, and C) and algorithmic decomposition and structured design methods, which focus on optimizing computing performance. In the 1990s, developers began to adopt object-oriented programming languages (such as C++, Java, and Ada95) and object-oriented design methods, which focus more on optimizing developer productivity. This shift occurred, in part, due to advances in hardware and software technologies (such as faster processors and networks, larger storage capabilities, and better compilers). It also coincided with the defense cutbacks stemming from the breakup of the U.S.S.R. (the so-called "Peace Dividend") and the end of the first Gulf War, which motivated the defense industry to rethink the economics of their development models.
For example, as component providers and system integrators recognized they couldn’t pass along the costs of these complex systems to their government customers, they began modularizing their stove-pipes. They deemed the myriad dependencies and accidental complexity of their traditional ad hoc architectures too costly in terms of development and sustainment effort. A drawback of the first generation of modular architectures, however, was that they were still largely stove-piped and lacked the ability to share components across different warfighting capabilities. For instance, a software module could not be initially targeted for a radar system and then subsequently reused in a launcher system.
Modular Open Systems Architecture with Standard Key Interfaces (MOSA) stemmed from a well-defined public standard approach that was both a business and technical strategy for developing new systems or modernizing existing ones. This approach was characterized by designing systems with modular interfaces, designated key interfaces, and select open standards with the goal of providing acquisition programs a choice of vendors when a system needs to be updated.
With the advent of the modular architecture approach described above, the DoD began to reap some benefits of module reuse, including easier testing and porting to new environments. The drawback of this approach, however, was that each module was still largely proprietary. While the end result was a more efficient architecture, its modules were too tightly coupled, which increased sustainment costs and encouraged vendor lock-in.
To help overcome these limitations with earlier modularity approaches, MOSA was devised to make it easier for DoD acquisition programs to replace modules from one architecture with modules from another. The resulting architecture provided acquisition programs with a wider choice of vendors when a system underwent upgrades, since developers could create a new module with the same interface as the one being replaced. The key difference between MOSA and earlier modularity approaches was that modules were connected via standardized and openly published interfaces and integration models.
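The replaceability that MOSA aims for can be illustrated in miniature: if the integrator's code depends only on a published interface, a module from one vendor can be swapped for another without touching the rest of the system. The `Tracker` interface, both vendor classes, and `run_mission` below are hypothetical and merely stand in for the standardized key interfaces described above.

```python
from typing import Protocol


class Tracker(Protocol):
    """Hypothetical published key interface that any vendor may implement."""

    def update(self, measurement: float) -> float: ...


class VendorATracker:
    """Returns the latest measurement unchanged."""

    def update(self, measurement: float) -> float:
        return measurement


class VendorBTracker:
    """Smooths measurements with a running average."""

    def __init__(self) -> None:
        self._count = 0
        self._mean = 0.0

    def update(self, measurement: float) -> float:
        self._count += 1
        self._mean += (measurement - self._mean) / self._count
        return self._mean


def run_mission(tracker: Tracker, measurements: list) -> list:
    """Integrator code depends only on the Tracker interface."""
    return [tracker.update(m) for m in measurements]


track_a = run_mission(VendorATracker(), [1.0, 3.0])
track_b = run_mission(VendorBTracker(), [1.0, 3.0])
```

Neither vendor class inherits from `Tracker`. Conforming to the published method signature is enough, which mirrors the idea that a new module only has to present the same interface as the one it replaces.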
Layered Architectures emerged as commercial off-the-shelf (COTS) software began to mature and DoD acquisition programs began to purchase COTS products directly from vendors and use them to layer systems so that they were no longer entirely built by a single integrator, even in a modular way. This approach was characterized by a horizontal partitioning of a system’s functionality according to a (sub) system-wide property, such that each group of functionality is clearly encapsulated and can evolve independently. The specific partitioning criteria can be defined along various dimensions, such as abstraction, granularity, competitive market size, hardware encapsulation, and rate of change.
During the mid-1990s, as the MOSA approach was growing in popularity, the DoD also began to reconsider its stance on COTS hardware and software technologies, such as CPUs, storage devices, networking elements, programming languages, and operating systems. Prior to this point, the DoD had considered COTS to be incompatible in terms of safety, maturity, and dependability for mission-critical combat systems. The constraints and demands of the DoD environment had instead fostered a system in which contractors were building both vertically integrated systems and the underlying system infrastructure, such as programming languages, compilers, operating systems, networking protocols, and networking standards.
As COTS technologies began to mature, however, DoD programs began to purchase them directly from vendors and use them to layer certain portions of their systems, particularly domain-independent infrastructure capabilities layer(s). Examples include COTS products based on open standards such as TCP/IP, POSIX, CORBA, DDS, and Web Services. Consequently, these infrastructure capabilities were no longer built by integrators, even in a modular way.
One benefit of layered architectures is that, because of industry competition, DoD programs were using technologies that were much more current than those they were able to obtain through traditional, stove-piped systems. Likewise, commercial industry tends to compete and innovate more rapidly than traditional defense contractors due to leveraged funding from a range of customers, including DoD, government, and enterprise/consumer users.
Wrapping Up and Looking Ahead
Over the past several decades the advances in DoD combat system architectures presented above have had several beneficial effects. For example, modularity has helped integrators increase the flexibility of their proprietary solutions. Likewise, layering has increased the adoption of domain-independent COTS and open-standards infrastructure as the basis for many DoD combat systems. While these advances are a step in the right direction, they have not yet significantly reduced the development and sustainment costs of DoD combat systems. One reason for this limited impact on lifecycle costs is that these earlier architecture advances did not address key business model drivers, but instead focused on standardized infrastructures and codified architectures, which account for a relatively small portion of the total ownership costs of combat systems.
The next post in this series will describe the other four epochs of the architectural evolution of the DoD combat system shown in the diagram above. These epochs focus more on domain-specific architectural layers that address business and economic issues, as well as technical concerns. Subsequent posts in this series will explore a research effort to help one Navy program obtain accurate estimates of the cost savings and return on investment for both the development and lifecycle of several product lines built using a common technical reference framework.
Additional Resources
To read the SEI technical report, A Framework for Evaluating Common Operating Environments: Piloting, Lessons Learned, and Opportunities, by Cecilia Albert and Steve Rosemergy, please visit http://www.sei.cmu.edu/library/abstracts/reports/10sr025.cfm
To read the SEI technical note, Isolating Patterns of Failure in Department of Defense Acquisition, by Lisa Brownsword, Cecilia Albert, David Carney, Patrick Place, Charles (Bud) Hammons, and John Hudak, please visit http://www.sei.cmu.edu/library/abstracts/reports/13tn014.cfm
By Julien Delange, Member of the Technical Staff, Software Solutions Division
Safety-critical avionics, aerospace, medical, and automotive systems are becoming increasingly reliant on software. Malfunctions in these systems can have significant consequences, including mission failure and loss of life. These systems must therefore be designed, verified, and validated carefully to ensure that they comply with system specifications and requirements and are error free. In the automotive domain, for example, cars contain many electronic control units (ECUs) that communicate to control systems such as airbag deployment, anti-lock brakes, and power steering; today’s standard vehicle can contain up to 30 ECUs. The design of tightly coupled software components distributed across so many nodes may introduce problems, such as early or late data delivery, loss of operation, or concurrent control of the same resource. In addition, errors introduced during the software design phase, such as mismatched timing requirements and out-of-bounds values, are propagated into the implementation and may not be caught by testing efforts. If these problems escape detection during testing, they can lead to serious errors and injuries, as evidenced by recent news reports about problems with automotive firmware. Such issues are not specific to a particular domain and are very common in safety-critical systems. In fact, such problems are often found when reviewing code from legacy systems that were designed and built more than 20 years ago and are still operating, as in the avionics and aerospace domains. This blog post describes an effort at the SEI that aims to help engineers use time-proven architecture patterns (such as the publish-subscribe pattern or correct use of shared resources) and validate their correct application.
Architecture Design and Analysis: Why it Matters
Today's safety-critical systems are increasingly reliant on software. Software architecture is an important asset that affects the overall development process: a good software architecture eases system upgrade and reuse, while a bad one can lead to unexpected rework when a component is modified. This trend will continue, especially because software size continues to grow at a significant rate. The early, intentional design of software architecture is an important tool for managing this complexity. Software architecture also helps system stakeholders reason about the system in its operational environment and detect potential flaws.
Beyond these benefits, the early design and review of a software architecture can help avoid common software traps and pitfalls prior to implementation. A study by the National Institute of Standards and Technology found that 70 percent of software defects are introduced during the requirements and architecture design phases. What exacerbates the problem is the fact that 80 percent of those defects are not discovered until system integration testing or even later in the development lifecycle. Fixing these issues later has an adverse impact on the product delivery schedule and on development costs. In their paper "Software Defect Reduction Top 10 List," software engineering researchers Barry Boehm and Victor Basili wrote that "finding and fixing a software problem after delivery is often 100 times more expensive than finding and fixing it during the requirements and design phase."
A group of SEI researchers has started an effort that details strategies for avoiding software architecture mistakes by using appropriate architecture patterns (such as the ones from the NASA reports) and validating their correct application. Specifically, we are working on tools to analyze software architecture, detect pattern usage, and check that system characteristics cannot undermine the benefits of the pattern. This approach promotes use of well-known methods to improve software quality, such as decoupling functions or reducing variable scope to make the software more modular. In the long term, such methods can help designers avoid common architecture traps and pitfalls from the beginning and avoid potential rework later in the development process.
From a practical perspective, this approach makes use of the Architecture Analysis and Design Language (AADL) for specifying an architecture pattern. We implemented a new analysis function in the Open Source AADL Tool Environment (OSATE) to validate correct use of the pattern and analyze pattern consistency with the other components. In particular, such a tool can detect any characteristic from the system environment that might impact use of the pattern. For example, in the case of the publish-subscribe pattern (a component sending data periodically to a receiver), one common mistake is a mismatch between the execution frequency of the publisher and subscriber, such as when the publisher sends data faster than the subscriber can handle it. Our validation tool analyzes the application of such a pattern and checks for timing mismatch, ensuring that the subscriber has enough time and resources to receive and handle all incoming data.
Using and Validating Architecture Patterns
The publish-subscribe pattern introduced above can be illustrated by a simplified weather station with two components: a temperature sensor (publisher) that periodically sends a value (temperature) to a monitor (subscriber) that computes statistics about the value, including maximum, minimum, and average. Each component (the sensor and the monitor) is periodic: each executes at a fixed and predefined rate (for example, once per second). Figure 1 illustrates the publish-subscribe pattern. As shown in this figure, the communication uses a connection between the two components. When the sensor publishes data, the data is stored in a buffer to make it available to the monitor that subscribes to it. Because both tasks run at the same rate (once every second), no data is lost or read twice.
Figure 1. The publish-subscribe pattern without queued communication
Changing the components’ characteristics may have important side effects. For example, changing the execution rate of the sensor so that it is executed more frequently than the monitor causes data loss. The second execution of the sensor overwrites the monitor's buffer and replaces the previous, still unread value. As a consequence, some values are never processed by the monitor, and the result (minimum, maximum, and average temperature) is not accurate.
A common workaround for this issue uses communication queues that can store several values. In our current example, we change the buffer dimension of the monitor so that it can handle two pieces of data. We illustrate such an architectural change in Figure 2.
Figure 2. The publish-subscribe pattern with queued communication
In this case, the sensor is executed more often (every 500 milliseconds (ms)) than the monitor (every 1 s). No data is lost because the monitor's queue can hold two data values, and the monitor reads all of them when it is executed. A new problem may appear, however, if the buffer size or the execution period is modified.
This type of issue may not be important to your system, and checking the correct application of the pattern depends on your system requirements. If the data being exchanged is of any particular importance, however, you must check that the pattern is applied correctly in the architecture. In this example (the publish-subscribe pattern), validating the correct application of the pattern requires that:
without queued communication, the monitor is executed at least as often as the sensor
with queued communication, the components’ periods and queue size are configured consistently to avoid data loss
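The two conditions above can be sketched as a small consistency check. The following Python snippet is only an illustration and is not the OSATE analysis itself: the names `PeriodicTask`, `max_pending_values`, and `loses_data` are hypothetical, and the check assumes that both tasks are strictly periodic, start at the same instant, and that the monitor drains its whole buffer each time it executes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodicTask:
    """A periodic component (hypothetical model, not an AADL construct)."""
    name: str
    period_ms: int


def max_pending_values(publisher: PeriodicTask, subscriber: PeriodicTask) -> int:
    """Number of values the publisher can produce during one subscriber period.

    Assumption: both tasks are released together at time zero.
    """
    if publisher.period_ms <= 0 or subscriber.period_ms <= 0:
        raise ValueError("periods must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return -(-subscriber.period_ms // publisher.period_ms)


def loses_data(publisher: PeriodicTask, subscriber: PeriodicTask,
               queue_size: int = 1) -> bool:
    """True if the queue is too small to hold every value until it is read."""
    if queue_size < 1:
        raise ValueError("queue_size must be at least 1")
    return max_pending_values(publisher, subscriber) > queue_size


monitor = PeriodicTask("monitor", 1000)
fast_sensor = PeriodicTask("sensor", 500)
print(loses_data(fast_sensor, monitor, queue_size=1))  # True: unread value is overwritten
print(loses_data(fast_sensor, monitor, queue_size=2))  # False: both values fit in the queue
```

With a 1 s sensor and a 1 s monitor, the same check reports no loss for a single-slot buffer, which matches the configuration in Figure 1.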
Timing and resource-dimensioning issues are among the many problems described in a 2011 NASA report on unintended acceleration in automotive software. The report states that software analysis tools detected more than 900 buffer overruns when they were used to analyze the affected automotive software. Describing and analyzing the software architecture helps detect and avoid these types of issues during system design, before they propagate to subsequent development stages.
For that reason, it is important to not only make use of good architecture patterns, but to analyze an architecture to ensure correct pattern application and use. For our publish-subscribe example, we describe the architecture using AADL. Our validation tool checks its correctness by analyzing the components’ characteristics. The following figures show our validation framework, with the left part illustrating the validation of a correct architecture and the right showing an error, highlighting a software architecture defect (inconsistent timing properties).
Use of the architecture validation tool: validating a correct architecture (left) and detecting inconsistent use of an architecture pattern (right).
The Take Away
Recent news reports illustrate the value of architecture analysis for improving software development, reducing potential rework costs, and avoiding delivery delays. In that context, SEI researchers are promoting the use of software architecture patterns in conjunction with analysis tools that check their correct application and thus help avoid typical architecture design traps and pitfalls.
Our analysis tools look for architecture defects using validation rules, such as:
Variable Scope. Variable scope defines which entities may read or write a variable. An improperly defined variable scope limits software reuse (too many components depend on a shared global variable) or limits analysis by making it hard to trace which tasks read from or write to the variable. To avoid such defects, architects must analyze the software architecture and check whether variables are declared and used at the appropriate scope. From a technical perspective, our validation tool checks whether variables are declared with the appropriate scope according to their use (the tasks or subprograms that access them) and recommends architecture changes when appropriate. Such an approach would avoid unnecessary use of global variables, which is usually a design mistake, as evidenced by a recent report from the National Highway Traffic Safety Administration on unintended acceleration in Toyota vehicles. The same report illustrates that this is a common trend and states that some automotive software can contain more than 2,200 global variable declarations with different types.
Concurrency. Many software architectures include tasks that access shared resources (such as services and data). A common mistake is to share data among several components that read and write new values without controlling concurrent access, which can lead to consistency issues. To overcome this problem, we advise using a concurrency control mechanism (such as a semaphore or mutex) to avoid value inconsistencies and related race conditions. On the other hand, if only one task writes to the data, the concurrency mechanism may be unnecessary. Inappropriate use of multi-tasking features and locking mechanisms is the source of many software issues, as evidenced by the Flight Software Complexity Report issued by NASA.
Using the appropriate mechanism is important in the context of safety-critical systems, as they may have limited resources, and use of such mechanisms introduces potentially unnecessary overhead. Examples of rules to check correct use of shared resources include:
If more than one task writes into shared data, the data must be associated with a locking mechanism (mutex, semaphore, etc.).
If only one task writes into shared data, no locking mechanism is mandatory.
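Rules of this kind can be expressed as a simple check over a description of the shared data. The sketch below is a hypothetical illustration in Python, not the rule implementation shipped with OSATE: the `SharedData` class and the result strings are invented for this example, and the check assumes that a lock is required as soon as two or more tasks write to the same data.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class SharedData:
    """Shared data with its writer tasks (hypothetical model)."""
    name: str
    writers: Set[str] = field(default_factory=set)
    has_lock: bool = False


def check_locking(data: SharedData) -> str:
    """Classify the locking configuration of one shared data component.

    Assumption: two or more writers require a mutex or semaphore.
    """
    if len(data.writers) > 1 and not data.has_lock:
        return "missing-lock"      # concurrent writers without protection
    if len(data.writers) <= 1 and data.has_lock:
        return "unneeded-lock"     # lock adds overhead without benefit
    return "ok"


temperature = SharedData("temperature", writers={"sensor_a", "sensor_b"})
print(check_locking(temperature))  # missing-lock
```

Reporting an unneeded lock separately reflects the resource concern raised above: on a constrained system, a lock around single-writer data is overhead rather than an error.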
We are working on several validation rules for analyzing the use of global variables and refactoring the software architecture so that
software is decomposed into modules that can be reused and deployed on separate processing nodes
variable assignment and modification are restricted to a limited scope (so that a variable cannot be modified from anywhere)
data flow is clearly defined and bounded to a specific scope
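As a rough illustration of the kind of scope rule described here, the Python sketch below flags global variables that are accessed by at most one task and could therefore be declared in a narrower scope. The function name `find_overscoped_globals` and the input format (a mapping from variable name to the tasks that access it) are assumptions made for this example and do not reflect how the OSATE rules are implemented.

```python
from typing import Dict, Iterable, List


def find_overscoped_globals(global_accesses: Dict[str, Iterable[str]]) -> List[str]:
    """Return the names of globals that are accessed by at most one task.

    Such variables do not need global visibility and are candidates for
    being moved into the single task (or module) that uses them.
    """
    return sorted(
        name
        for name, tasks in global_accesses.items()
        if len(set(tasks)) <= 1
    )


accesses = {
    "temperature": ["sensor", "monitor"],  # genuinely shared
    "retry_count": ["sensor"],             # used by a single task
    "unused_flag": [],                     # never accessed
}
print(find_overscoped_globals(accesses))  # ['retry_count', 'unused_flag']
```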
An outline of this effort, and our progress in developing this approach, is available online. All the validation technology is included in OSATE, our Eclipse-based AADL modeling framework under a free license. We invite you to use and test our approach, and then send us feedback.
To improve existing patterns and add new ones, we also plan to interview safety-critical system engineers and designers so that we may adapt our work to existing industrial issues, expectations, and needs. If you are a software engineer or designer who would be interested in participating, please send an email to info@sei.cmu.edu.
Additional Resources
To read more about the approach that we are developing, please visit https://wiki.sei.cmu.edu/aadl/index.php/Good_Software_Architecture_Practices_with_AADL
To read the NASA Study on Flight Software Complexity, please visit http://www.nasa.gov/offices/oce/documents/FSWC_study.html
To read the National Highway Traffic Safety Administration Study of Unintended Acceleration in Toyota Vehicles, please visit http://www.nhtsa.gov/UA
Jul 27, 2015 02:18pm
By Robert Nord, Senior Member of the Technical Staff, Software Solutions Division
(This blog post was co-authored by Ipek Ozkaya)
As the pace of software delivery increases, organizations need guidance on how to deliver high-quality software rapidly, while simultaneously meeting demands related to time-to-market, cost, productivity, and quality. In practice, demands for adding new features or fixing defects often take priority. However, when software developers are guided solely by project management measures, such as progress on requirements and defect counts, they ignore the impact of architectural dependencies, which can impede the progress of a project if not properly managed. In previous posts on this blog, my colleague Ipek Ozkaya and I have focused on architectural technical debt, which refers to the rework and degraded quality resulting from overly hasty delivery of software capabilities to users. This blog post describes a first step towards an approach we developed that aims to use qualitative architectural measures to better inform quantitative code quality metrics.
Technical debt is an increasingly critical aspect of producing cost-effective, timely, and high-quality software products. Recently, our research has focused on going beyond debt as a metaphor to investigating which measures a software development team can apply to effectively monitor changing qualities of software. These measures can take advantage of code quality if the goal is to optimize development qualities.
Existing code measures alone, however, do not provide insight into overall architectural improvements due to the increasing complexity and context dependencies of software systems. We must investigate a range of measures to provide a multi-view architectural perspective of design time, run-time, and deployment time qualities. The goal of our research is to provide an architectural measurement framework that can assist in monitoring and improving high architectural risk areas of a system.
Informing Quantitative Metrics with Qualitative Measures
Developers can apply off-the-shelf tools (such as Lattix, SonarGraph, SonarQube, and Structure101) that rely on code metrics (such as stability, coupling, cohesion, cyclicity, and complexity) to understand architectural dependencies for change impact or rework analysis. These metrics are often helpful in improving code quality and can provide structural information about architectural dependencies and modifiability. Recent research has demonstrated, however, that such metrics fall short of providing overall architectural system guidance when used as they are. The question we ask is whether the relevance and use of such metrics can be improved to provide architectural guidance as well.
To address this question, we developed an approach to contextualize and focus the application of dependency analysis and architecture-relevant code quality and system modifiability metrics using architecture evaluations. Scenario-based architecture analysis offers a broad understanding of how a software-reliant system evolves over time and can form a basis for assessing the amount of rework that may be necessary in the foreseeable future. Using the architectural risks identified during scenario-based architecture analysis, we clarified the level of system decomposition where code quality metrics reveal relevant information.
As outlined in our January 2012 blog post, An Architecture Focused Measurement Framework for Managing Technical Debt, our research on this topic is informed by real-world examples gathered from technical debt workshops. That blog post, authored by my colleague and co-author, Ipek Ozkaya, noted that an architecture-focused analysis approach helps manage technical debt by enabling software engineers to decide the best time to re-architect, thereby reducing the technical debt.
CONNECT
Our earlier work is part of an ongoing SEI research agenda to improve the integration of architecture practices within agile software development methods. To test our approach, we evaluated CONNECT, which is an open-source software system to exchange patient information among different healthcare providers at the local and national level.
The developers of CONNECT use Scrum as their agile project management approach. They hold biweekly sprints (117 as of late 2013) and periodically release updates to the system (typically every quarter). Code development is outsourced and takes place in different offices. Periodic code sprints bring all the developers together to synchronize their work.
We chose to focus a portion of our research in the context of CONNECT since a team of SEI researchers was asked in November 2011 to complete an evaluation of the system that focused on quality attribute goals using the Architecture Tradeoff Analysis Method (ATAM). This analysis yielded a list of potential risks that the project needed to address. Among the risks cited, researchers referenced Adapter/Gateway separation:
CONNECT initially separates the handling of messages from the integration with other systems. However, these roles have become confused over time, and it is not clear how the roles should be separated.
The remainder of this post focuses on this risk theme that the ATAM identified as an area of major concern.
While the ATAM provided a list of risks the project needed to address, we considered it to be a point-in-time representation of the project. A common response is to focus on short-term fixes at the expense of underlying causes for those risks, which are typically architectural in nature. In our examination of CONNECT, we tried to understand how architectural scenarios, developed during the architectural evaluation, impacted project narrative in the next major release, which included changes implemented as a result of the ATAM.
In the case of CONNECT, we examined the JIRA issue tracker, which contained the sprint and product backlogs. We looked at the feature requests and improvements in the backlog both before and after the ATAM and saw that the evaluation influenced what the development team worked on: there was a 22 percent increase in risk-related issues created after the ATAM was conducted. The CONNECT system underwent a significant re-architecting effort to reduce the dependency between the Adapter and the Gateway.
Next, we sought to reconcile the risk themes identified in the ATAM with the dependencies extracted from the code and the automatically generated code quality measures. To analyze whether common modularity code metrics reflect the impact of changes related to the architectural risks and recommendations, we compared the baseline version of CONNECT with the next version released after the developers re-architected the Adapter/Gateway dependency to mitigate the risk. We analyzed the code at three levels of decomposition in the code hierarchy:
system-level decomposition. We analyzed the code of the entire CONNECT system, including middleware and all of its third-party dependencies. Overall, the modifiability metrics show improvements in atom count, internal dependencies, average impact, system stability, connectedness, connected strength, coupling, coupling strength, and system cyclicity.
software-level decomposition. We analyzed the code from gov.hhs including packages, interfaces, and classes for CONNECT middleware, with the exception of third-party libraries. We found that the numbers of atoms and internal dependencies decreased, indicating that the package contained fewer code artifacts. System cyclicity and connectedness increased, indicating a higher likelihood of change. Given that system stability measured 98 percent, the system appears to be highly stable regardless of the architecture change at the package level of decomposition. For this case, where the stability metric did not change, a lower connected strength in the next release suggests the design would be architecturally less sensitive to change.
package-level decomposition. At this lowest level of decomposition, we focused on the document query package gov.hhs.fha.nhinc.docquery, which reflected the architectural changes made in response to the Adapter/Gateway integration risk. At this level of decomposition, the system size is significantly reduced, as is the average impact. The lower average impact indicates that changes to the document query system will be less likely to ripple through the rest of the system. While system stability improved at this level, we noted a decline in connectedness and connected strength.
The stability metric, which provides a system’s overall sensitivity to changes, reliably reflected the system’s architecture when appropriately focused on the CONNECT system’s selected decomposition level, and excluding the rest of the system. Navigating down the hierarchy in this manner allowed us to see where the code metrics show any significant change, thereby indicating whether the architecture has improved or deteriorated during its evolution. Spotting areas of rework requires understanding the context provided by architecture and quality concerns that influence architectural evolution. When focused on the problematic elements of the system, at a suitable level of decomposition in the package structure, selected code-based metrics for assessing ripple effects reflect the improvements on the architecture where rework has been done.
When we included the entire system in our stability assessment, however, the reported system stability was close to 100 percent because the large number of elements falsely indicated a stable system. We found that applying existing metrics consistently requires the ability to choose appropriate elements of the system at a suitable level of decomposition because, as our results demonstrated, dependency analysis is quite sensitive to the size of the graph and its context.
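The sensitivity to graph size can be illustrated with a toy calculation. The Python sketch below uses a deliberately simplified stability measure, defined here as one minus the average fraction of elements affected by a change to a single element. This definition, along with the function names `impact` and `stability`, is an assumption made for illustration and is not the formula used by any of the tools named above or the measurement applied to CONNECT.

```python
from collections import deque
from typing import Dict, Set


def reverse_edges(depends_on: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each element to the elements that depend on it directly."""
    dependents: Dict[str, Set[str]] = {node: set() for node in depends_on}
    for node, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(node)
    return dependents


def impact(dependents: Dict[str, Set[str]], element: str) -> int:
    """Count elements affected by a change to `element`, including itself."""
    seen = {element}
    queue = deque([element])
    while queue:
        current = queue.popleft()
        for user in dependents.get(current, ()):
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return len(seen)


def stability(depends_on: Dict[str, Set[str]]) -> float:
    """Toy stability: 1 minus the average impact divided by the graph size."""
    dependents = reverse_edges(depends_on)
    size = len(dependents)
    if size == 0:
        raise ValueError("graph must contain at least one element")
    average_impact = sum(impact(dependents, e) for e in dependents) / size
    return 1.0 - average_impact / size


# A tightly coupled package: three classes in a dependency cycle.
package = {"a": {"b"}, "b": {"c"}, "c": {"a"}}
# The same package measured together with 97 unrelated elements.
whole_system = dict(package, **{f"lib{i}": set() for i in range(97)})

print(stability(package))       # 0.0 for the package on its own
print(stability(whole_system))  # close to 1.0 once unrelated elements are included
```

Under these assumptions, the cyclic package is maximally unstable when measured alone, yet the same code looks almost perfectly stable when unrelated elements dominate the graph, which is why the level of decomposition matters.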
Looking Ahead
Our research aims to bring the architecture analysis and developer environments closer together. Our goal is to create repeatable analysis and validation on metrics that provide architectural information and scale to systems of realistic size. On another front, we have joined forces with members of the architecture and metrics community to host a workshop on software architecture metrics. Our aim is to improve measurement techniques for architecture that yield reliable, consistent, and repeatable results by
discussing progress on architecture metrics, measurement, and analysis
gathering empirical evidence on the use and effectiveness of metrics
identifying priorities for future research
The workshop, which will be held in April 2014, will bring together a cross-section of experts in academia and industry in the areas of dependency analysis, architecture metrics, analysis and evaluation, software analytics, empirical software engineering and measurement.
Additional Resources
For more information about the First International Workshop on Software Architecture Metrics, which will be held April 7, 2014 in conjunction with the Working IEEE/IFIP Conference on Software Architecture (WICSA), or to submit a paper, please visit www.sei.cmu.edu/community/sam2014.
Jul 27, 2015 02:18pm
By Will Dormann, Senior Member of the Technical Staff, CERT Vulnerability Analysis Team
Occasionally this blog will highlight different posts from the SEI blogosphere. Today we are highlighting a recent post by Will Dormann, a senior member of the technical staff in the SEI’s CERT Division, from the CERT/CC Blog. In this post, Dormann describes how to modify the CERT Failure Observation Engine (FOE) when he encounters apps that "don’t play well" with the FOE. The FOE is a software testing tool that finds defects in applications running on the Windows platform.
Jul 27, 2015 02:18pm