Blogs
By Linda Parker Gates, Senior Member of the Technical Staff, Acquisition Support Program
The appeal of Agile or lightweight development methods has grown steadily in the software development community. Having spent a number of years investigating strategic planning approaches, I’ve recently been thinking about whether Agile principles can be—and should be—applied to strategic planning. This blog post examines the applicability of Agile principles to strategic planning.
Strategic planning is the process of defining an organization’s plans for achieving its mission. The purpose is to outline a broad approach to achieving mission-aligned goals derived through a process of analyzing the full organizational environment, taking into account the organization’s vision, goals, objectives, enablers, barriers, and values. Although descriptions and analysis of the present situation are included, a strategic plan doesn’t merely endorse the status quo; it is directional in nature and directs change of some kind. As such, strategic planning is a critical foundation for executing work. It sets the stage for division- and unit-level planning, as well as enterprise architecture, process improvement, risk management, portfolio management, and any other enterprise-wide initiatives.
Strategic planning typically follows this type of sequence:
Scope the strategic planning effort.
Build a foundation of organizational information.
Define goals and objectives in terms of the organizational need and desired outcome.
Identify potential strategies for achieving the objectives.
Develop action plans.
Identify project measures.
Execute the work.
Track progress.
My February 2011 blog post, Strategic Planning with Critical Success Factors and Future Scenarios, proposes the integration of the Critical Success Factor (CSF) Method and future scenario planning into the strategic planning process. CSFs are a group of factors that determine group or company success, including key jobs that must be done exceedingly well. Future scenarios are a tool for exploring multiple, possible "futures" and developing decisions or strategies that will serve the uncertain future well. Together, these two techniques help foster strategic thinking, which is a strong complement to strategic planning. They also provide some leverage points for making strategic planning more nimble.
The Agile Manifesto presents the following four core values:
individuals and interactions over processes and tools
working software over comprehensive documentation
customer collaboration over contract negotiation
responding to change over following a plan
The manifesto is careful to point out that a balance is required; that is, the counter values cannot be ignored. So a key question becomes: Can organizations benefit from pursuing strategic planning approaches that embody these Agile values? The rest of this blog posting addresses this question.
Individuals and Interactions over Processes and Tools
A strong process is vital to effective strategic planning. Nonetheless, placing value on interactions between leadership and staff and on face-to-face conversations certainly enhances the value of the strategic planning process. Both the CSF method and scenario planning rely heavily on individuals and interactions. These conversations themselves bring value to strategic planning.
Working Software/Tangible Results over Comprehensive Documentation
A common criticism of strategic planning is that it over-emphasizes deconstruction of the past and present while creating the illusion that we can anticipate the future. If we replace "working software" with "tangible results" (to accommodate the strategic planning domain), I contend it is productive to focus strategy efforts on short-term accomplishments over thoroughly documented commitments about the future.
Customer Collaboration over Contract Negotiation
While contract negotiation is not particularly pertinent to strategic planning, a strategic plan serves as an informal contract with the organization. The idea of involving customers is also intriguing. In my technical report, Strategic Planning with Critical Success Factors and Future Scenarios, published in November 2010, I recommend involving customers in scenario planning since they bring an external perspective that can be critical to getting quality results. There may even be a greater role for customers in the development of strategy.
Responding to Change over Following a Plan
This principle offers great potential for improving the way strategic planning is conducted and the results realized in implementation. A good strategic-planning process has always done more than just produce a plan—it supports ongoing strategic thinking, discussion, and behavior. Strategic thinking focuses on finding and developing organizational opportunities and creating dialogue about the organization’s direction—these are the foundation of an organization’s readiness to respond to change. Strategic planning is enhanced by strategic thinking, which makes planning adaptive and in sync with an evolving environment. So is the planning activity itself still needed? Yes! Effective strategic work cannot be accomplished without it. If nothing else, the divergent results of strategic thinking must be operationalized through a convergent planning activity.
So, the next question is, how do we adjust the strategic planning process to make it more agile?
The most effective adjustments to make are to apply the Agile values to steps 3 and 8 of the strategic planning process outlined above. Step 3 (Define goals and objectives) benefits from a focus on the first value (individuals and interactions) and the third value (customer collaboration). Step 8 (Track progress) is enhanced through a focus on the second value (which I am calling "tangible results") and the fourth value (responding to change).
Interestingly, the CSF and scenario planning methods provide opportunities to integrate all four Agile values into a strategic planning process as follows:
In terms of individuals and interactions, CSFs are derived through interviews with managers, a critical aspect of the technique that involves one-on-one conversations with people very closely acquainted with operational issues.
Scenario planning is a team-based exercise that relies heavily on interactions among a representative cross-section of organizational staff.
The interview-based method for developing scenarios that Kees van der Heijden describes in his 1996 book, Scenarios: The Art of Strategic Conversation, can enhance the role of individuals in setting strategy. As noted above, scenario planning provides a good opportunity for customer collaboration. Scenario planning relies heavily on monitoring for early warning signs, which are indicators that a particular future is unfolding. These indicators help planners make adjustments to strategies or their execution. Combined with shorter cycles, monitoring techniques are critical for delivering short-term, tangible results and responding to change.
It is important to understand the nature of planning. The fourth agile value emphasizes responding to change over following a plan. Without a strong plan, response to change is simply reaction, not agility. The Agile principles assert values in terms of preferences over the counter values. In the strategic planning domain, the ability to perform in accordance with Agile values requires significant strength in the counter values. Agility emerges from skill and strength in the counter values. It doesn’t replace them.
Agile strategic planning would best serve an organization that is applying agile methods. If development teams are already using lightweight methods, leadership should consider adopting agile processes to move the organization toward its goals. In general, agile strategic planning can offer value to organizations that are complex or self-organizing and that focus on adaptive, iterative delivery.
Additional Resources:
To read or download a copy of the SEI technical report, Strategic Planning with Critical Success Factors and Future Scenarios: An Integrated Strategic Planning Framework, please visit www.sei.cmu.edu/library/abstracts/reports/10tr037.cfm
To read the blog post, Strategic Planning with Critical Success Factors and Future Scenarios, please visit http://blog.sei.cmu.edu/post.cfm/strategic-planning-with-critical-success-factors-and-future-scenarios
By Bjorn Andersson, Senior Member of the Technical Staff, Research, Technology & System Solutions
Many DoD computing systems—particularly cyber-physical systems—are subject to stringent size, weight, and power requirements. The quantity of sensor readings and functionalities is also increasing, and their associated processing must fulfill real-time requirements. This situation motivates the need for computers with greater processing capacity. For example, to fulfill the requirements of nano-sized unmanned aerial vehicles (UAVs), developers must choose a computer platform that offers significant processing capacity and use its processing resources to meet its needs for autonomous surveillance missions. This blog post discusses these issues and highlights our research that addresses them.
To choose a computer platform that offers greater capacity, it is necessary to observe the major trends among chip makers. Historically, advances in semiconductor miniaturization (a.k.a., Moore's Law) periodically yielded microprocessors with significantly greater clock speeds. Unfortunately, microprocessor serial processing speed is reaching a physical limit due to excessive power consumption. As a result, semiconductor manufacturers are now producing chips without increasing the clock speed, but instead are increasing the number of processor cores on a chip, which results in multicore processors. For nearly a decade, the use of homogeneous multicore processors (which are chips with identical processing cores) gave us some headroom in terms of power consumption and allowed us to enjoy greater computing capacity. This headroom is diminishing, unfortunately, and is about to vanish, forcing semiconductor manufacturers to seek new solutions.
We are currently witnessing a shift among semiconductor manufacturers from homogeneous multicore processors with identical processor cores to heterogeneous multicore processors. The impetus for this shift is that processor cores tailored for a specific class of application behavior have the potential to offer much better power efficiency. AMD Fusion and NVIDIA Tegra 3 are examples of this shift, as is Intel Sandy Bridge, which integrates a graphics processor onto the same chip as the normal processor.
In a heterogeneous multicore environment, the execution time of a software task depends on which processor core it executes on. For example, a software task performing computer graphics rendering, simulating physics, or estimating trajectories of flying objects runs much faster on a graphics processor than on a normal processor. Conversely, some software tasks are inherently sequential and cannot benefit from the graphics processor; they execute much faster on a normal processor. For example, a software task with many branches and no inherent parallelism runs much faster on a normal processor than on a graphics processor. Ideally, each task would be assigned to the processor where it executes with the greatest speed, but unfortunately the workload is often not perfectly balanced to the types of processor cores available.
Efficient use of processing capacity in the new generation of microprocessors therefore requires that tasks are assigned to processors intelligently. In this context, "intelligently" means that the resources requested by the program are the ones possessed by the processor. Moreover, the desire for short design cycles, rapid fielding, and upgrades necessitates that task assignment be done automatically—with algorithms and associated tools.
THE TASK ASSIGNMENT PROBLEM
The problem of assigning tasks to processors can be described as follows: a task (such as computer graphics rendering, or a program determining whether the half-or-triple-plus-one process reaches one from a known starting value) is characterized by its processor utilization, but it has different utilizations on different processors. For example, a given task might have a utilization of 10 percent if assigned to a graphics processor but 70 percent if assigned to a normal processor. We are interested in assigning each task to exactly one processor such that, for each processor, the sum of the utilizations of all tasks assigned to it does not exceed 100 percent. If we can find such an assignment, then it is known that all deadlines will be met at run-time, provided that tasks have deadlines described by the implicit-deadline sporadic task model and that the Earliest-Deadline-First (EDF) scheduling algorithm is used (with a minor modification, Rate-Monotonic scheduling can be used as well).
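The feasibility condition above (every processor's total utilization at or below 100 percent) takes only a few lines to check. The following Python sketch is illustrative only, not SEI code, and the task and utilization numbers are made up:

```python
# Feasibility test for a heterogeneous task assignment: an assignment is
# feasible if no processor's total utilization exceeds 100 percent.

def assignment_feasible(util, assignment, num_processors):
    """util[i][p]: utilization (percent) of task i on processor p.
    assignment[i]: index of the processor chosen for task i."""
    totals = [0.0] * num_processors
    for task, proc in enumerate(assignment):
        totals[proc] += util[task][proc]
    return all(total <= 100.0 for total in totals)

# Two tasks, two processors (0 = graphics processor, 1 = normal processor).
util = [
    [10.0, 70.0],   # a rendering-style task: cheap on the graphics processor
    [120.0, 40.0],  # an inherently sequential task: infeasible on the GPU
]
print(assignment_feasible(util, [0, 1], 2))  # True: each task on its best fit
print(assignment_feasible(util, [1, 0], 2))  # False: the GPU is overloaded
```

Note how the mismatched assignment fails even though a feasible one exists; finding a feasible assignment automatically is the subject of the rest of the post.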
PREVIOUS APPROACHES FOR TASK ASSIGNMENT
The task assignment problem belongs to a class of problems that are computationally intractable, meaning it is highly unlikely that any algorithm exists that both always finds a good assignment and always runs fast. We must therefore settle for an algorithm that either always finds a good assignment or always runs fast. With the goal of always finding a good assignment, task assignment can be modeled as an integer linear program (ILP) as follows:
Minimize z
subject to the constraints that
for each processor p: x1,p*u1,p + x2,p*u2,p + … + xn,p*un,p <= z
for each task i: xi,1 + xi,2 + … + xi,m = 1
for each pair (i,p) of task i and processor p: xi,p is either 0 or 1

In the optimization problem above, n is the number of tasks, m is the number of processors, and ui,p is the utilization of task i if it were assigned to processor p. The decision variable xi,p is one if task i is assigned to processor p and zero otherwise.

Unfortunately, solving this integer linear program takes a long time.
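To make the formulation concrete, here is a sketch (mine, not the authors') that solves tiny instances by brute force: it enumerates all m^n possible assignments and picks the one minimizing z, the maximum per-processor utilization. The exponential enumeration is exactly why exact approaches do not scale; the utilization numbers are invented for illustration:

```python
from itertools import product

def best_assignment(util):
    """util[i][p] = utilization of task i on processor p.
    Returns (assignment, z) minimizing z = max per-processor load."""
    n, m = len(util), len(util[0])
    best_z, best = float("inf"), None
    for assignment in product(range(m), repeat=n):  # m**n candidates
        loads = [0.0] * m
        for i, p in enumerate(assignment):
            loads[p] += util[i][p]
        z = max(loads)
        if z < best_z:
            best_z, best = z, assignment
    return best, best_z

# Three tasks, two processors (0 = graphics processor, 1 = normal processor).
util = [[10.0, 70.0], [120.0, 40.0], [30.0, 25.0]]
print(best_assignment(util))  # ((0, 1, 0), 40.0): tasks 0 and 2 on the GPU
```

Even at 20 tasks and 4 processors this loop would visit about a trillion candidates, which is the intractability the paragraph above describes.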
To design an algorithm that always runs reasonably fast, several algorithms, described in a research paper by Sanjoy K. Baruah, transform the ILP into a linear program (LP) and then apply certain tricks. Although LP-based algorithms run faster than ILP-based ones, they still have to solve an optimization problem, which can be time-consuming. To design faster algorithms still, we would like to perform task assignment in a way that does not require solving an LP.
OUR APPROACH FOR TASK ASSIGNMENT
Previous work on task assignment for homogeneous multicore processors where all processor cores are identical is based on a framework called bin-packing heuristics. Such algorithms work approximately as follows:
1. Sort tasks according to some criterion.
2. for each task do
3.     for each processor do
4.         if the task has not yet been assigned and it is possible to assign the task to the processor so that the sum of utilization of tasks on the processor does not exceed 100 percent then
5.             Assign the task to the processor
6.         end if
7.     end for
8. end for
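The pseudocode above translates almost line-for-line into Python. This sketch extends it to the heterogeneous case by indexing utilization per processor; the sort criterion in step 1 (largest best-case utilization first) is just one plausible choice, since choosing the task and processor orders well is precisely the open research question discussed below:

```python
def assign_tasks(util):
    """util[i][p] = utilization of task i on processor p.
    Returns {task: processor}; tasks absent from the dict could not be placed."""
    m = len(util[0])
    loads = [0.0] * m
    assignment = {}
    # Step 1: sort tasks by some criterion (here: largest best-case utilization first).
    order = sorted(range(len(util)), key=lambda i: min(util[i]), reverse=True)
    for i in order:                             # step 2: for each task
        for p in range(m):                      # step 3: for each processor
            if loads[p] + util[i][p] <= 100.0:  # step 4: capacity check
                assignment[i] = p               # step 5: assign the task
                loads[p] += util[i][p]
                break                           # task placed; try the next one
    return assignment

# Three tasks, two processors (0 = graphics processor, 1 = normal processor).
util = [[10.0, 70.0], [120.0, 40.0], [30.0, 25.0]]
print(assign_tasks(util))  # {1: 1, 2: 0, 0: 0}
```

Unlike the brute-force search, this heuristic runs in roughly n log n + n*m time, but it can fail to place tasks that a smarter ordering would have accommodated.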
Our approach involves adapting bin-packing heuristics to heterogeneous multicore processors. We believe it is possible to modify the algorithm structure outlined above so we can also assign tasks to processors even when the utilization of a task depends on the processor to which it is assigned. One can show that the use of bin-packing can perform poorly if processors and tasks are not considered in any particular order. Specifically, for a set of tasks that could be assigned, such an approach can fail even when given processors that are "infinitely" faster. One of our main research challenges is therefore to determine how to sort tasks (step 1) and in which order we should consider processors (in step 3). We are evaluating our new algorithms in the following ways:
We plan to prove (mathematically) the performance of our new algorithms. Specifically, we are interested in proving that whenever it is possible to assign tasks to processors, our algorithm also succeeds, provided it is given processors that are x times as fast. This x is our performance metric: the lower its value, the better.
We also plan to evaluate the performance of our algorithms by applying the algorithms on randomly-generated task sets. This will demonstrate the "typical" behavior of the algorithms.
CONCLUSION
Most semiconductor manufacturers are shifting towards heterogeneous multicore processors to offer greater computing capacity while keeping power consumption sufficiently low. But using a heterogeneous multicore processor efficiently for cyber-physical systems with stringent size, weight, and power requirements requires that tasks are assigned properly. This blog post has discussed the state of the art and summarized our ongoing work in this area.
ADDITIONAL RESOURCES
To read the paper "Partitioning real-time tasks among heterogeneous multiprocessors" by Sanjoy Baruah, please visit http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1327956
To read the proceedings "Assigning Real-Time Tasks on Heterogeneous Multiprocessors with Two Unrelated Types of Processors", please visit http://www.cister.isep.ipp.pt/activities/seminars/%28S%28otidyq454nfyy255alnrdb3m%29%29/GetFile.aspx?File=%2Fspringseminars2011%2Frtss10_het2.pdf
By Mike Phillips, Principal Researcher, Acquisition Support Program
In my preceding blog post, I promised to provide more examples highlighting the importance of software sustainment in the US Department of Defense (DoD). My focus is on certain configurations of weapon systems that are no longer in production for the United States Air Force but are expected to remain a key component of our defense capability for decades to come; their software upgrade cycles must therefore refresh capabilities every 18 to 24 months. Throughout this series on efficient and effective software sustainment, I will highlight examples from each branch of the military. This second blog post describes effective sustainment engineering efforts in the Air Force, using examples from across the service’s Air Logistics Centers (ALCs).
A Brief History of Software Sustainment
From its earliest days, the military has provided facilities to maintain the functionality of its various weapon systems. The descriptive terms for these units have included "arsenals," "depots," and "logistics centers," to name a few. In the 1990s, the Air Force consolidated its depot maintenance capabilities into three centers: Warner-Robins ALC in Georgia, Oklahoma City ALC in Oklahoma, and Ogden ALC at Hill Air Force Base (AFB) in Utah. A 2012 initiative within the Air Force Materiel Command will centralize the leadership of the ALCs into a single entity, although the three sites will remain as sister units in a single "super ALC" headquartered at the Oklahoma City site.
Within this geographical framework, we can overlay the increased importance of software in our weapon systems. Lloyd Mosemann, the deputy assistant secretary of the Air Force for communications, computers, and support systems, was a visionary leader for software sustainment in the Air Force logistics arena in the 1980s. He recognized that the various ALCs would "inherit" responsibility for weapon systems as production waned and that "software sustainment" should be a well-developed capability within the organic structure of the ALCs. The first demand for an organization to achieve a maturity level came in a memorandum from Mosemann to the centers, directing them to achieve maturity level 2 against the Capability Maturity Model for Software and then level 3 a few years later (many think the "Mosemann letter" was aimed at industry, but it was not).
This blog posting highlights examples of effective sustainment of software-intensive systems across the three sites and recognizes that the successes achieved are the results of improvement efforts that extend well beyond the process domain. The workforce has grown in its technical competence, and a modern systems engineering environment has been developed. With that historical perspective, we can explore some of the examples evident today.
In the mid 1970s, the Air Force created a "block" strategy as a way to pursue modernization while maintaining a relatively stable production approach. In the past, letter identifiers after the aircraft number showed a change to a new capability, including major hardware changes. With increasing software content, a block upgrade represented major functionality changes with relatively modest hardware configuration changes. The block strategy has become a common practice across the DoD, where software updates are released as a block of changes at regular intervals to avoid too many variations being sent to the field. By applying a block strategy, the DoD assures that there is a regular, anticipated opportunity to have the freshest capabilities deployed.
The US Air Force F-16 program is an excellent example of the block strategy, with more than 4,400 F-16’s being produced and flown by 26 countries. A key program decision made early in the development of the F-16 was the deliberate strategy for evolving the F-16’s capabilities. This strategy was called the F-16 Multinational Staged Improvement Program (MSIP) and involved the continual enhancement of new aircraft and the retrofit of previous aircraft with system hardware and software capabilities. The Ogden ALC’s experience with the F-16 provides a good example of the success of the MSIP being applied in a sustainment environment.
From a sustainment perspective, the focus has been on the F-16C/D, Block 30, since the later blocks (40 and 50) have remained in production.
Continuous Improvement
Initially, the Ogden ALC used the CMM and then CMMI to guide its improvement efforts. More recently, it has joined its process improvement partners at the Naval Air Warfare Center at China Lake, Calif., in using the Team Software Process (TSP). The metrics produced by the Ogden team illustrate the following quality performance:
Major upgrades continue to enjoy higher and higher productivity due to the quality emphasis contained in the disciplined approach to software production.
Rework effort is down below 10 percent.
Efficient Upgrade Planning
Part of the effective planning highlighted above was an effort to provide reasonably sized updates on a regular schedule. For the F-16, the size has been about 500 thousand source lines of code (KSLOC), using an upgrade cycle aimed at 18 to 24 months. The upgrade team has been able to accommodate high-priority upgrades, such as a rapid introduction of the AIM-9X, by balancing sustainment and modernization loads effectively. The F-16 team has just begun to assimilate the transition of the later F-16 Blocks 40 and 50. The foundation established with the Block 30 experience should enable the ALC sustainment team to master the learning curve for the more complex and diverse fleet being transitioned.
The Airborne Early Warning & Control Systems (AWACS), which are supported by the Oklahoma City ALC, provide another example. These systems fit in a Boeing 707-based aircraft with a large saucer-shaped antenna housing radar systems that can be upgraded to improved hardware and software capabilities. Reflective of many DoD platforms, the AWACS system entered operations in 1976, and its critical mission systems are continually upgraded to extend its capabilities. The multi-source integration function that is part of the most recent AWACS upgrade used open systems and lean architecture approaches, which are advocated by the SEI and allow more rapid software updates to the fleet without requiring extensive hardware changes.
Another weapon system sustained and continuously improved upon at Oklahoma City is the venerable B-52 strategic bomber. If planned modernization efforts of this weapon system extend the life as currently proposed, the aircraft will have been in the active Air Force inventory for almost 90 years upon its retirement in 2040. The improvement to the software architecture expands its versatility with "smart weapons" and its network communications capability. In an era that demands a focus on affordability, extending the lifetime of the B-52—while enhancing its capability—demonstrates the power of software-focused modernization.
The Warner-Robins ALC has responsibility for electronic warfare and radar systems, as well as weapon systems, including the F-15, C-130 (almost as old as the B-52) and the C-5. Across this collection of software-reliant systems, the center continues to improve the quality and timeliness of its upgrades. Of 246 software releases in fiscal year 2010, the ALC delivered 99 percent at or below expected cost, and 98 percent on or ahead of schedule. Only three systems contained defects that were discovered in the field after release and required rework. As with its two sister ALCs, Warner-Robins has committed to continuous improvement of its software capability. While we at the SEI are pleased with the unit’s involvement with our CMMI models, the organization has complemented that effort with attention to Lean Six Sigma and the Air Force Smart Operations for the 21st Century (AFSO21). The results are notable.
The leadership of the software organization at Warner Robins has noted that the "old image" of software sustainment as small "bug fixes" has been replaced by recognition that with software-reliant systems, the opportunity arises to quickly develop innovative capabilities to solve the challenges facing their DoD—and allied nation—customers today.
Each of the services has a wide variety of software-reliant systems, and organic capabilities to complement the major contractors who create sturdy platforms like the B-52 that can last nearly a century. The next installment in this series on efficient and effective software sustainment will examine the Naval Air Weapons Station at China Lake.
Additional Resources
To read the SEI technical report, Sustaining Software Intensive Systems, please visit www.sei.cmu.edu/library/abstracts/reports/06tn007.cfm?DCSext.abstractsource=SearchResults
Experience from Financial Systems
By Bill Nichols, Senior Member of the Technical Staff, Software Engineering Process Management
In his book Drive, Daniel Pink writes that knowledge workers want autonomy, purpose, and mastery in their work. A big problem with any change in processes is getting the people who do the work to change how they work. Too often, people are told what to do instead of being given the information, autonomy, and authority to analyze and adopt the new methods for themselves. This posting—the first in a two-part series—describes a case study that shows how Team Software Process (TSP) principles allowed developers at a large bank to address challenges, improve their productivity, and thrive in an agile environment.
In 2009, Nedbank, one of the four largest banks in South Africa, launched a TSP pilot program in collaboration with the Johannesburg Centre for Software Engineering (JCSE) at the University of Witwatersrand and the SEI. After TSP pilot teams at Nedbank were given a toolkit—including training on how to measure, analyze, and communicate—their behaviors changed. Not only were the developers (a mixed team of elite and average programmers) able to work more autonomously, they implemented methods that resulted in higher quality work in less time.
The Nedbank TSP pilot involved two software-intensive projects: one maintenance project and one new development project. Nedbank had faced software process improvement challenges in the past. They therefore sought to address the issues of quality and productivity among their software engineering teams by implementing a methodology that would improve the teams' performance through planning, tracking their work, establishing goals, and providing teams with the tools to take responsibility for their processes and plans.
Engage the Team
The Nedbank TSP pilots began with separate training for everyone involved: senior managers, team leads, software developers, and non-developer team members. The groups learned how the process change would affect them and their relationship with others in the organization. They learned that working on a TSP project required them to behave differently in terms of reporting, data collection, expectations regarding other team members, and interactions with other team members. Developers learned the Personal Software Process with a focus on software engineering discipline, measurement, and planning. Non-developers, such as testers, business analysts, and documenters, learned to work within the new team and project structure. Throughout the courses, Nedbank team members learned what a measured software process looks like and how to measure the software process. They also learned to communicate with a focus on the data, project, and work, not on personal issues.
Launch the Project
TSP projects begin with a structured launch to plan the work, including nine meetings plus a post-mortem meeting to satisfy specific project planning needs. The Nedbank teams began by understanding project goals from a management standpoint and then identified supporting team goals and personal objectives for each participant. These goals helped frame their subsequent decisions on strategy, process, support, and work breakdown.
The project was then broken down into small pieces, and work was assigned among team members. Each member was then able to commit to the plan and his or her specific interim goals, and understand how those goals support the organization's larger goals for the project. The key to success was that the team had the autonomy to plan and manage the work and determine how it would be done. For example, early in the planning phase, the team working on the maintenance project realized that the schedules of the designers and the developers were not aligned. The designs wouldn't be ready when the developers needed them. Working together, the team approached the schedule problems in the following two ways:
The team lead went to management and negotiated priorities, using his team's progress data to support the schedule needs. The team lead was then able to convince the designers to prioritize their work so that they could supply the developers with the designs needed to proceed. The senior developers next took the load off the designers by taking on some of the design work.
In addition, the team identified modules that would not have to be changed as well as sets of programs that would be easy to update. This change in approach not only helped meet project needs, but also helped satisfy the management goal that teams gain a big picture view of the project.
Deal with Mid-Project Issues
Early in the development process, both projects fell behind schedule. Discipline and measurement became critical. By tracking their time and following their defined steps, team members were able to identify their status precisely and understand how they got there. The team lead made sure the measurements and data were discussed in the status meeting. The coach helped the team understand what the measurements and data meant and how these facts affected the work. With this data-driven feedback, the teams saw an increase in the number of task hours (direct time on project tasks) per week, as well as a sharp reduction in defect rates. With the coach, the teams periodically analyzed their data and improved their processes.
By the end of each pilot project, the quality of each team's work had improved significantly. They were able to find and remove defects earlier in the process. After the initial delivery, no further defects were found in system tests or production in the remaining three cycles. There were zero defects in deliveries two through four. This improvement required months of effort, training, management support, coaching, worker motivation and engagement, and meaningful data-based feedback.
See the Results
In the end, the TSP pilot teams at Nedbank made significant behavioral changes that not only improved the quality of the software but also improved team members' work lives by decreasing the need for evening and weekend overtime. The teams were able to make these improvements because they had project-specific measurements to guide their decisions, and they had the authority to implement those decisions. Based on the results of the pilots Nedbank decided to implement TSP throughout the organization.
To learn more about Nedbank's views on TSP, see their rollout video below:
This video requires the Adobe Flash Player Plug-in please go to the following URL to download: http://get.adobe.com/flashplayer/
Expanding and scaling any process comes with challenges. These topics will be discussed in the second post of this series: Achieving Quality and Speed with TSP Organization-Wide.
Additional Resources:
To read the SEI technical report Deploying TSP on a National Scale: An Experience Report from Pilot Projects in Mexico, please visitwww.sei.cmu.edu/library/abstracts/reports/09tr011.cfm
To read the Crosstalk article A Distributed Multi-Company Software Project by Bill Nichols, Anita Carleton, & Watts Humphrey, please visitwww.crosstalkonline.org/storage/issue-archives/2009/200905/200905-Nichols.pdf
To read the SEI book Leadership, Teamwork, and Trust: Building a Competitive Software Capability by James Over and Watts Humphrey, please visit www.sei.cmu.edu/library/abstracts/books/0321624505.cfm
To read the SEI book Coaching Development Teams by Watts Humphrey, please visitwww.sei.cmu.edu/library/abstracts/books/201731134.cfm
To read the SEI book PSP: A Self-Improvement Process for Engineers by Watts Humphrey please visitwww.sei.cmu.edu/library/abstracts/books/0321305493.cfm
SEI
.
Blog
.
<span class='date ' tip=''><i class='icon-time'></i> Jul 27, 2015 02:47pm</span>
|
By Robert Nord, Senior Member of the Technical StaffResearch, Technology, & System Solutions
New acquisition guidelines from the Department of Defense (DoD) aimed at reducing system lifecycle time and effort are encouraging the adoption of Agile methods. There is a general lack, however, of practical guidance on how to employ Agile methods effectively for DoD acquisition programs. This blog posting describes our research on providing software and systems architects with a decision making framework for reducing integration risk with Agile methods, thereby reducing the time and resources needed for related work.
The DoD chief information officer (CIO)’s office has recently released a 10 Point Plan to reform DoD information technology (IT). Point number 4 is "Enable Agile IT." The key tenets of Agile IT include
Emphasize incremental introduction of mature technology by delivering useful capability every 6 to 12 months to reduce risk through early validation to users.
Require tight integration of users, developers, testers, and certifiers throughout the project life cycle to meet Agile IT’s promise of rapid delivery in lieu of extensive up front planning.
Leverage common development, test and production platforms and enterprise products to deliver IT capabilities faster, cheaper, and more interoperable, without redundant infrastructure and documentation.
Establish a change-tolerant design environment enabled by discovery learning that promotes decisions based on facts rather than forecasts.
Program managers and acquisition executives are responding to this plan by applying industry practices, such as Agile methods. At the same time, related DoD guidelines encourage system development via a more open, modular software architectural style, such as loosely-coupled service-oriented architectures. The impact of these architectural decisions becomes more critical in an agile context because they promote or inhibit the ability to achieve agile goals, such as rapid feedback, responding to change, and team communication. Architectural decisions influence agile development properties, such as the length of an iteration, the allocation of stories to iterations, and the structure of the teams and their work assignments with respect to the architecture. What is needed, therefore, is research on eliciting and employing these properties and determining their relative importance in promoting rapid lifecycle development.
To address this need, I and other SEI technologists, Ipek Ozkaya, Stephany Bellomo, and Mary Ann Lapham, are conducting research that focuses on the implications of decisions made over the course of the software and systems lifecycle. We are examining when these decisions are made and the time when the implications surface to validate the following hypotheses:
The fundamental early decisions made during the pre-engineering and manufacturing development (pre-EMD) phase have an impact throughout the lifecycle. Acquisition programs lack the techniques needed to elicit and preanalyze the implications of which fundamental architectural decisions will enable or inhibit their ability to adopt an agile lifecycle. For example, architectural decisions that relate to decomposition and dependencies influence teaming structure, the capability to rapidly and confidently deliver features, and the ability to rapidly integrate new components among other factors.
The implications of the early decisions often surface in the final stages of the lifecycle, downstream from development. For example, unexpected rework related to correcting integration defects. When these problems are discovered further downstream from where they were injected in the lifecycle, they are more expensive to correct and this often causes cost overruns and delays project completion.
Identifying Critical Properties
A primary objective of our research is identifying critical properties of Agile software development influenced by architectural decisions that reduce lifecycle time. Identifying these properties is an important first step to give stakeholders the tools they need to make informed decisions that manipulate the settings of the properties to control for better outcomes.
Our approach involves examining the implications of possible change scenarios. One such scenario examines the impact of introducing an emerging multi-core technology to the mission processor software of the Apache helicopter, which is a US Army/Boeing program. Applications are increasingly favoring multi-core processors, a single integrated computing element with two or more independent processors, called "cores," allowing for greater processing capacity. The scenario explores questions that include
Will the multi-core technology change the architectural design (e.g., patterns, tactics, and architectural approaches)?
If so, which architectural decisions will change?
Are there engineering practices (such as Agile) that must be customized as a result of architectural decisions in support of a rapid lifecycle?
How will continuous integration be ensured?
How will team communication be impacted?
A key component of our work involves determining the critical decisions that will influence Agile practices. These decisions provide the rationale for how software and system architects design an architecture.
To expand our view, we will again collaborate with Philippe Kruchten, a professor of software engineering in the Department of Electrical and Computer Engineering of the University of British Columbia, who is active within Agile development and architecture decision making research communities. Another facet of our approach involves interviewing DoD programs and gathering data from members of the SEI Agile Collaboration Group.
Creating a Model
Rework must be considered early when reasoning about how to enable rapid lifecycle development. After we review the findings identified in the first phase of our work with collaborators, we will create an architectural decision model that allows software or systems architects to analyze the ramifications of their decisions. Based on our prior work, we anticipate that highly impactful architectural decisions will include
Decisions about interfaces and how the parts of the system are connected.
Decisions about structuring the systems to achieve quality attribute requirements. The impact of these decisions typically spans multiple areas within the system and is not localized within a single module.
We plan to use the Multiple Domain Matrix (MDM) to represent the decision model and to analyze the impact of architectural decisions on rapid lifecycle development. The MDM approach considers decision dependencies that provide visibility into how the ordering of decisions influences when development can be started and how changes propagate and may require rework of software elements. This approach will allow us to test our hypothesis that modeling architectural decisions during early stages of development (similar to pre-EMD) will reduce cycle time. Cycle time could be reduced upstream by enabling an earlier start of development, thus minimizing the time spent at the pre-EMD phase or reduced downstream by decreasing rework costs attributed to architectural decisions that affect integration.
Another component of our research will explore strategies for improving the relationship between architecture decision making and complex collaborative team interaction. A barrier that often arises is that decisions made by architects about partitioning the architecture are not aligned with the networks of agile teams at scale and for the kinds of systems relevant to the DoD. Another barrier is alignment with the teams that span the lifecycle beyond the development teams traditionally associated with agile and include system engineering, testing, validation and verification, certification and accreditation. We have observed the ramifications of this misalignment during the integration of components built by different teams, where incompatibilities lead to significant rework.
To map, analyze, and support the architectural decisions of industry collaborators, we plan to map our MDM approach into a conceptual model developed by Kevin Sullivan of the University of Virginia, who is working on creating a cyber-social conceptual model. Sullivan’s work focuses on the social networks and the value of relationships between decision makers in a system.
Challenges
A primary technical challenge that we face in this approach involves scaling. As a practice, Agile has been successful in helping to solidify the efficiency of development teams when projects involve small teams. Applying these same concepts to large-scale distributed systems— including the rest of the organization that has priorities larger than the development team—will be critical for success, but it will also present some of the greatest challenges. To address the challenge of scaling, we are looking at the influence architecture exerts on managing teams and how to provide practical guidance on what amount of architecture is needed and when. On the one hand, early or overproduction of architecture can create delay. On the other hand, not enough production of architecture can result in integration defects leading to rework. The focus of lean thinking on improving cycle time by eliminating waste, in the form of delay or unnecessary rework, in conjunction with architecture, has shown great potential for improving management of software development projects and increasing flow of value delivered to the customer.
Next Steps
We are in the process of conducting a survey of critical agile development properties and will write about the results in a future blog posting.
Additional Resources:
To read the SEI technical report, Agile Methods: Selected DoD Management and Acquisition Concerns, please visit www.sei.cmu.edu/library/abstracts/reports/11tn002.cfm
SEI
.
Blog
.
<span class='date ' tip=''><i class='icon-time'></i> Jul 27, 2015 02:47pm</span>
|
By Dean Sutherland Senior Member of the Technical StaffThe CERT Program
Many modern software systems employ shared-memory multi- threading and are built using software components, such as libraries and frameworks. Software developers must carefully control the interactions between multiple threads as they execute within those components. To manage this complexity, developers use information hiding to treat components as "black boxes" with known interfaces that explicitly specify all necessary preconditions and postconditions of the design contract, while using an appropriate level of abstraction to hide unnecessary detail. Many software component interfaces, however, lack explicit specification of thread-related preconditions. Without these specifications, developers must assume what the missing preconditions might be, but such assumptions are often incorrect. Failure to comply with the actual thread-related preconditions can yield subtle and pernicious errors (such as state corruption, deadlock, and security vulnerabilities) that are intermittent and hard to diagnose. This blog post, the first in a series, describes our ongoing research towards solving this problem for a variety of languages, including Java and C11.
Previous Work
In previous work, we introduced the concept of "thread usage policy," which is a group of often unspecified preconditions used to manage access to shared state by regulating which specific threads are permitted to execute particular code segments or to access particular data fields. The concept of thread usage policy is not language specific; similar issues arise in many programming languages, including Java, C11, C#, C++, Objective-C, and Ada. The preconditions contained in thread usage policies can be hard to identify, poorly thought out, unstated, poorly documented, incorrectly expressed, out of date, or simply hard to find. These problems inspired us to devise a means of specifying these preconditions in a form that developers would find both useful and acceptable.
We developed a simple, formal specification language for modeling thread usage policies, which we call the language of thread role analysis. We devised appropriate abstractions of the key semantic building blocks of thread usage policies (including thread identity, concrete code segments, and data fields) so that developers can build a model of the thread usage policy (hereafter referred to as simply "policy") by expressing preconditions as simple precise annotations in program code.
Current Work
My focus is on bringing thread role analysis to bear on programs written in C11, which is the current standard of the C programming language (the core analysis is also similar to the analysis for Java). To ensure relevance to practicing programmers, we integrate our analysis with widely used programming tools—in this case, the CLANG/LLVM open-source compiler suite. The C11 work is still in its early stages, so I’ll present an example from our previous work in Java.
Electric Had a Problem
The developers of the Electric VLSI design tool had re-implemented it in Java for their V8.0 release, in part to support a change to a multi-threaded architecture. Users of the new version of the tool encountered seemingly random internal errors and crashes. These crashes were rarely repeatable, however, and usually disappeared when developers tried to debug the program. Since they were experienced developers, they quickly realized that these symptoms were probably caused by concurrency errors. This diagnosis seemed especially likely, since the major new feature in the most recent release was the implementation of a multi-threaded architecture for performance. The developers needed to quickly identify the problem in their 140 KLOC program, and we were able to do that in less than 8 hours using thread role analysis. We analyzed version 8.0.1, which contained roughly 140 KLOC in 44 Java packages.
We describe the process in the form of a reverse-engineering exercise, partly because that’s how we addressed the problem and partly because it helps explain thread role analysis. Note, however, that this reverse-engineering-focused description makes thread role analysis sound relatively time-consuming and hard. In case studies, we found that developers working on their own code could easily identify policies, answer questions about intended thread usage, and locate key points for annotation.
Identifying Thread Usage Policies
The first step in thread role analysis is to identify relevant pre-existing policies. We’ll use the Electric tool from our example because the developers had a pre-existing written thread usage policy, which appears here in a slightly edited form:
There is one persistent user thread called the DatabaseThread, created in com.sun.electric.tool.Job. This thread acts as an in-order queue for Jobs, which are spawned in new threads. Jobs are mainly of two types: Examine and Change. These jobs interact with the Database, which are objects in the package hierarchy com.sun.electric.database
The Rules are
Jobs are spawned into new Threads in order of receipt.
Only one Change job can run at a time.
Many Examine jobs can run simultaneously.
Change and Examine jobs cannot run at the same time.
Examine jobs can only look at the Database.
Change jobs can look at and modify the Database.
Because only one Change job can run at a time, the Change job is run directly in the DatabaseThread. Examine jobs are spawned into their own threads, which terminate when the Examine job is done.
We thus identified one of the two key policies that needed to be expressed; the other is the policy for Java’s AWT/Swing GUI framework, used by Electric’s graphical user interface (GUI). In less than one day of effort, we expressed models of these two policies and used the findings produced by our analysis tool to diagnose a set of "seemingly random intermittent failures" experienced by the development team and their users.
Policy for the GUI
The Electric GUI uses the AWT and Swing frameworks, which share a single thread usage policy. The GUI framework implementation is not multi-threaded; rather, it executes in its own "event thread" and prescribes rules for how non-GUI threads may interact with it through the framework APIs. The salient points of the policy (according to Bowbeer) are as follows:
There is at most one "AWT thread" at a time per application.
There may be any number of separate "Compute threads."
A Compute thread is forbidden to paint or to handle AWT data structures or events. Failure to comply can lead to exceptions from within the AWT, because the AWT avoids both potential deadlock and data races by accessing its internal data structures from within a single thread, without the use of locks.
Extended computation on the AWT thread is forbidden; "brief" computation is acceptable. While the thread is computing, it cannot respond to events or repaint the display; this "freezes" the GUI until the computation finishes.
It is important to note that the AWT data structures mentioned above explicitly include any fields of user-defined classes that extend the library-defined AWT and Swing framework classes.
Thread Role Versus Thread Identity
The policies expressed in prose above allow us to highlight an important feature of thread role analysis—focusing on thread roles rather than thread identities. The GUI policy above speaks of the "AWT thread"; although it fails to mention that, in some implementations, its identity changes from time to time. We don’t care which specific thread is the AWT thread right now; we care only that it performs the role of "AWT thread." Similarly, Electric’s thread usage policy permits multiple Examine jobs, each with its own thread. We care about the "Examine" role but not about the identity of the various threads that perform it.
Expressing Thread Usage Policy
The first step is to declare any needed thread roles. The @ThreadRole annotation declares names for thread roles. Roles are opaque identifiers that have their own namespace. The Electric policy mentions two thread roles: one for examining the Database and one for changing it (n.b.: Electric’s "Database" is a large data structure that represents the circuit the user is designing; it is not a database in the sense of MySQL or other relational databases). Here are the declarations of the thread roles for Electric, which are described as Java comments:
/** **@ThreadRole DBChanger , DBExaminer **@MaxRoleCount DBChanger 1 **@IncompatibleRoles DBChanger, DBExaminer, AWT **/
Likewise, here are the declarations of the thread roles for the GUI:
/** * @ThreadRole AWT, Compute * @MaxRoleCount AWT 1 * @IncompatibleRoles AWT, Compute **/
We will use the DBChanger role for all change Jobs and the DBExaminer role for all Examine jobs. Similarly, in the GUI policy, we see two thread roles: the "AWT thread" and "Compute threads." We declare these roles using the annotation @ThreadRole AWT, Compute, thus capturing the roles described in rules 1 and 2 of the GUI policy. We use the AWT role for the AWT thread and the Compute role for all other threads.
Next, we declare global constraints on thread roles. For example, an important aspect of the GUI policy indicates that the "AWT thread" and the "Compute threads" are distinct; it is forbidden for any thread to perform both roles at once. Similarly, the Electric policy implies that the Change and Examine roles are distinct. In each case, this incompatibility property allows us to conclude that a thread performing one of these roles necessarily excludes all of the others. Incompatibility is one of the few postconditions in our language for thread role analysis.
We state this incompatibility for the GUI framework by writing @IncompatibleRoles AWT, Compute for its API. The @IncompatibleRoles annotation in the snippet above specifies the incompatibility for Electric’s thread roles, as well as for the GUI’s AWT thread. These annotations capture the third rule of the GUI policy and the analogous—but implicit—rule for Examine and Change jobs from the Electric policy. Note that if we had followed the Electric policy literally, our annotation might have incorrectly omitted the AWT thread.
Finally, both the GUI policy and the Electric policy identify thread roles that may be performed by, at most, one thread at a time. We document this with @MaxRoleCount AWT 1 for the GUI and the similar annotation in the snippet above. Because "any number" is the default, we omit @MaxRoleCount annotations for the other thread roles.
Wrap Up and Look Ahead
We’ve now reached the halfway point in the process of expressing the thread usage policies. So far, we’ve identified the relevant thread usage policies along with the thread roles needed to express those policies. We’ve written annotations to declare the thread roles and to express the few global constraints on those roles—most notably incompatibility. In the second installment in this series, we’ll finish the thread usage policy by associating thread roles and thread role constraints with code segments, and show how thread role analysis enabled us to diagnose in 8 hours a violation that took "multiple" Electric developers "weeks of time" to fix independently. In the third and final post, we’ll discuss bringing thread role analysis to C11, along with techniques to improve adoptability by reducing required programmer effort and supporting analysis of partially annotated code.
Additional Resources
To read the paper, Composable Thread Coloring (which was an earlier name for the technique we now call thread role analysis) by Dean Sutherland and Bill Scherlis, please go towww.fluid.cs.cmu.edu:8080/Fluid/fluid-publications/p233-sutherland.pdf.
SEI
.
Blog
.
<span class='date ' tip=''><i class='icon-time'></i> Jul 27, 2015 02:46pm</span>
|
By Randy Trzeciak, Senior Member of the Technical Staff The CERT Program
According to the 2011 CyberSecurity Watch Survey, approximately 21 percent of cyber crimes against organizations are committed by insiders. Of the 607 organizations participating in the survey, 46 percent stated that the damage caused by insiders was more significant than the damage caused by outsiders. Over the past 11 years, CERT Insider Threat researchers have collected incidents related to malicious activity by insiders obtained from a number of sources, including media reports, the courts, the United States Secret Service, victim organizations, and interviews with convicted felons. From these cases, four patterns of insider threat behavior have been identified: (1) information technology (IT) sabotage, (2) fraud, (3) national security/espionage, and (4) theft of intellectual property (IP). From those patterns, our researchers developed controls that combine technological tools with behavioral indicators to identify employees at risk for committing cyber crimes. These tools and indicators provide those who monitor networks a better warning of potential anomalous behavior. This blog posting—the first in a series highlighting controls developed by the CERT Insider Threat Center—explores controls developed to prevent, identify, or detect IP theft.
Motives and Behaviors
By analyzing more than 700 insider threat cases, CERT researchers have identified a series of patterns and behaviors based on the motive of the perpetrator and the impact to an organization. For example, of the documented insider threat cases that we analyzed, 84 incidents are categorized as theft of IP in which employees take information with them as they leave to go to work for a competing organization, use the information to get a job with a competitor, or start their own competing company. In approximately one third of the theft of IP cases in the CERT database, the insider took the information and gave it to a foreign organization or government.
An interesting finding emerged when the researcher analyzed these cases: the majority of the insiders (approximately 70 percent) who steal IP do so within 30 days of announcing their resignation. This window gives an organization an opportunity to detect potential malicious activity. Many organizations do not have the resources to alert and investigate everytime a document is sent off of the network, so this window may allow for focused attention during higher risk periods, thereby reducing the high volume of false positives that may be returned via continual data leakage identification. That finding is used when developing the theft-of-IP technical control outlined in this blog. Based on these observations, we constructed a model of employees who steal information. The model takes into account technological variables, social variables, and the relationships between them.
By studying the patterns in various cases, we observed how the crimes tend to evolve over time, and the trends we noticed provided the foundation for our model. After our researchers established the model, they narrowed their focus to portions of it where controls may be applied to prevent or detect information leaving the organization’s network. For example, they configured a tool alert on suspicious activity possibly indicating that a departing employee may be stealing intellectual property. An organization can then use an open source, log-aggregation tool to write rules to alert when potential suspicious activity is observed. For example, analysts can write a query in a log-aggregation tool, such as SPLUNK, to flag employees who meet these criteria:
Their system accounts were disabled or are scheduled to be disabled in the next 30 days.
They are sending email through the organization’s network.
Those emails include attachments.
Analysts can further refine the SPLUNK tool to focus on employees in that group who have resigned within the last 30 days and are sending emails with attachments from personal email accounts. (Much of the activity will probably be authorized, but the approach allows organizations investigating suspicious activity to gain a better idea of what activity warrants additional investigation.)
Our aim with this research is not to create new tools, but rather to allow organizations to configure their existing arsenal so it is more effective at preventing or detecting malicious insider activity. The controls developed by our researchers should be used in addition to existing tools that many organizations already own, including
Data loss prevention (DLP) tools. These tools allow organizations to prioritize critical assets, and observe when employees are accessing data and when that data is being sent through the network.
Digital rights management (DRM) tools. These tools allow organizations to require that critical data be validated or authenticated against data on its network. For example, if information was taken off a network, it could not be used on another network, and no one would be able to open it up and view it.
Security information and event management (SIEM) tools. These tools allow organizations to detect anomalies on networks and networked systems. One example of such an anomaly would be an employee’s login outside of normal working hours using a remote connection.
Telltale Signs
In the October 2011 SEI technical note titled Insider Threat Control: Using Centralized Logging to Detect Data Exfiltration Near Insider Termination, CERT researchers Michael Hanley and Joji Montelibano described the controls developed to prevent IP theft. They reported that the primary vehicles for data exfiltration over the network are corporate email systems or web-based personal email services. They therefore concluded that organizations should consider doing the following when trying to prevent, detect, or deter IP theft:
Monitor for misuse of web-based personal email services. This mode of exfiltration will be addressed in detail in future research.
Monitor for email to the organization’s competitors or the insider’s personal account. Corporate email accounts running on an enterprise-class service contain robust auditing and logging functionality available for use in an investigation or, in this case, a query to detect suspicious behavior.
Taking these factors into account, an organization can proceed on an implementation strategy for these conditions on a logging engine. Hanley and Montelibano defined the following implementation outline: If the mail is from the departing insider and the message was sent in the last 30 days and the recipient is not in the organization’s domain and the total bytes summed by day are more than a specified threshold then send an alert to the security operator
With the time element serving as the root of a query, any of the following could be used to verify the query:
an active directory
a lightweight directory access protocol (LDAP) directory service
partial human resources records
badge access status
After the query has narrowed the field to all mail sent within a certain timeframe (the 30-day window), the query will next identify mail traffic that has left the local domain namespace of the organization. This constraint flags email messages to recipients in a namespace that the organization has no control over. The next constraint examines the email byte count to identify exfiltrated data. Hanley and Montelibano set a reasonable per-day byte threshold of between 20 and 50 kilobytes to identify whether several attachments or large volumes of text pasted into the bodies of email messages have left an organization’s network on a given day.
Future Research
Our future research is focusing on verifying that control models are still applicable and on developing new controls to address other modes of insider crime. The next blog post in this series will examine research that developed controls to prevent, detect, or mitigate IT sabotage by insiders.
Additional Resources
To read the SEI technical note, Insider Threat Control: Using Centralized Logging to Detect Data Exfiltration Near Insider Termination, please visit www.cert.org/archive/pdf/11tn024.pdf.
To read the CERT Insider Threat blog, please visit www.cert.org/blogs/insider_threat/.
Jul 27, 2015 02:46pm
Part 2: Understanding Success Drivers
By Douglas C. Schmidt, Principal Researcher
Common operating platform environments (COPEs) are reusable software infrastructures that incorporate open standards; define portable interfaces, interoperable protocols, and data models; offer complete design disclosure; and have a modular, loosely coupled, and well-articulated software architecture that provides applications and end users with many shared capabilities. COPEs can help reduce recurring engineering costs, as well as enable developers to build better and more powerful applications atop a COPE, rather than wrestling repeatedly with tedious and error-prone infrastructure concerns. Despite technical advances during the past decade, however, building affordable and dependable COPE-based solutions for the DoD remains elusive. This blog posting—the second in a three-part series—builds upon the first posting to describe key success drivers for COPEs that proactively and intentionally exploit commonality across multiple DoD acquisition programs.
This blog is based on work by researchers at (and associated with) the SEI—including myself, Adam Porter, John Robert, and Mike McLendon—who are investigating how to create and govern COPEs successfully. We have identified the following three classes of drivers that DoD acquisition programs and system integrators must master to improve the odds of succeeding with COPEs:
Business drivers: Achieving effective governance and broad acceptance of the economic aspects of COPEs. When the DoD had the resources to acquire and sustain many redundant solutions, it was often hard to motivate the adoption of common software services and capabilities within the acquisition community. Adopting these services and capabilities was perceived as introducing program and technical risks and potentially impeding the ability of system integrators to offer more unique, custom solutions that were more expensive and perhaps perceived as more effective. Now that the status quo is no longer economically viable (which has been the case for many years and is now exacerbated in the shadow of sequestration), government and defense industry leadership has renewed interest in COPE initiatives. While this top-down support for COPEs is welcome and necessary, it is insufficient if program managers and system integrators do not fully accept the need to adopt new business models.
Because both government and industry are significantly affected by new business models, they must devise collaborative, socio-technical ecosystems where participants share the risks and rewards of COPEs. One promising approach is managed consortia, which provide a solid commercial and legal foundation for forming and coordinating COPE government and industry stakeholders. These consortia are even more effective when they yield interoperable, open standards that are implemented by multiple open- and closed-source suppliers.
Two often overlooked COPE strategy components are policy and governance, which are essential to the success of collaborative approaches. These components are often not addressed explicitly because they are not perceived as important and are "organizationally messy." DoD acquisition policy and guidance must emphasize COPE as an acquisition business and technical strategy at all levels within the acquisition and sustainment community.
The collaborative environment also demands that concepts, structures, and processes for governance be an integral component of the overall COPE strategy. DoD program offices also need to work closely with system integrators to ensure that proper contracting models are adopted to incentivize cost-effective, on-time delivery of innovative and integrated solutions. While COPE-based technical solutions may be feasible and desirable, they cannot be achieved unless the government first puts in place the proper contract models to incentivize technical and program behaviors consistent with the government's business goals. Effective contract models can also enable rapid delivery-order execution, which helps streamline technology insertion. In addition, the government must negotiate the necessary licenses and data rights, technical data on verification and validation facilities, and procedures to decrease total ownership costs over program life cycles by retaining access to key software and documentation artifacts throughout the development and sustainment phases.
Management drivers: Ensuring effective leadership and guidance of COPE initiatives. While it has become fashionable to pay lip service to the goals and benefits of COPEs, it is much harder to find program managers and senior acquisition executives who can successfully sell and defend the near-term investments in time and effort needed to achieve the long-term payoffs of COPEs. These leaders must not only recognize the strategic role of software in DoD systems, but also articulate this role in ways that resonate with congressional appropriators, authorizers, and their staffs.
Technical and acquisition leaders should also be savvy enough to avoid placing their bets on technological "silver bullets" and fads. They should likewise manage the application of modern iterative and incremental methods at scale, as opposed to traditional waterfall methods. COPEs are most effective when they are developed with strong feedback loops between developers of the reusable COPE infrastructure and developers of applications that use the COPE. Since COPE efforts rarely have the time or resources to please all customers, it is important for managers to be goal-directed—rather than exhaustive—when determining which common assets to develop and sustain. Without continual interaction with application developers, software artifacts produced by COPE developers rarely address core business problems and thus will not be reused effectively.
COPE technical managers must also know when to build and when to buy reusable software platforms and tools. Managers who cling tenaciously to particular platforms or tools, and who ignore all other options, typically trade short-term progress for long-term pain. A more effective, long-term approach involves working with open standards and establishing affiliations with industry standards groups to ensure continuity across the COPE life cycle.
Technical drivers: The foundations of COPE development. The operational and programmatic success drivers for COPEs often garner the most attention because they fundamentally depend on people, who represent the most complicated and demanding part of socio-technical ecosystems. Even if we could magically solve these vexing challenges, many technical drivers still influence the success or failure of COPEs.
To start with, developing a successful COPE requires a clear architectural vision. This vision should be codified and documented by experienced software and system architects who possess a deep understanding of the canonical patterns and architectural styles of the domain(s) associated with a COPE. Other key elements associated with achieving an architectural vision for COPEs include
developing open reference implementations for key parts of a COPE infrastructure to help DoD programs avoid getting locked into proprietary solutions
adopting effective licensing models to ensure broad adoption and commercialization of COPE components
ensuring a strong connection with R&D communities in software engineering and systems engineering to help mitigate technical risks
Having a strategy for mitigating technical risks is particularly important for new and planned systems. Although the DoD and software R&D communities have some knowledge about foundational patterns and architectures for legacy DoD systems, they are less aware of key patterns and architectural styles for emerging systems, particularly net-centric systems-of-systems. Unfortunately, there are too few designated software and system architect positions in the acquisition community and program offices to ensure that an architecture-focused vision is a driving foundational and life-cycle technical imperative.
DoD COPEs necessarily comprise a wide range of network, hardware, and software configurations; different algorithms; and different security profiles. This variation is a key driver of total ownership costs because it affects the time and effort required to assure, optimize, and manage unique system deployments and their many configurations throughout the life cycle. To manage this variation effectively, the SEI helped pioneer software product lines (SPLs), which have been applied in COPEs to manage software variation while reusing large amounts of code that implement common features within a particular domain. SPL-based COPEs help reduce software development and sustainment costs by maintaining and validating reusable components in a common repository.
Other technical drivers associated with successful COPEs include (but are by no means limited to) the following:
domain-engineering and use-case analysis methods that elicit and document COPE commonality and variability requirements and software architectures
iterative and incremental life cycle methods, processes, and toolkits that help developers better plan, measure, and improve software producibility so they have better confidence in COPE quality and cost estimates
software frameworks that codify the expertise needed to implement COPEs in the form of reusable algorithms and extensible and/or reusable component implementations
software patterns that codify expertise needed to design COPEs in the form of reusable architecture themes and styles, which can be reused even when algorithms, components implementations, or frameworks cannot
commercial-off-the-shelf component-based and service-oriented middleware that codifies expertise needed to develop COPEs in the form of portable open standard interfaces, interoperability protocols, and reusable building blocks
COPE-specific middleware components and services that provide APIs and data models via a simpler facade that shields applications from the powerful (and complex) capabilities of the underlying domain-specific middleware frameworks
higher-level languages, analysis tools, and model-driven engineering technologies that enhance the productivity of COPE application developers and support "correct-by-construction" generation of software artifacts
automated verification and validation methods, standards conformance test suites, and system execution modeling tools that leverage the powerful, commoditized computing resources in a distributed, continuous manner to improve persistent quality attributes of COPEs, ensure portability and interoperability, and assure key functional and performance attributes
Succeeding with COPEs
This blog posting just scratched the surface of the technical and non-technical issues associated with developing and sustaining COPEs for the DoD. In our experience working with many COPE initiatives over the past two decades, achieving success requires a multi-dimensional perspective to foster effective COPE ecosystems and leverage key linkages between the success drivers identified above. Organizations implementing COPE initiatives that address these drivers in a thorough and holistic manner thus have a fighting chance to succeed. Many challenges persist, however, as evidenced by the relatively few COPE success stories to date for the DoD. History shows that organizations that do not understand (or do not execute) these drivers properly will fail, often at great expense and great detriment to the warfighter.
My next blog posting describes our work at the SEI on a COPE maturity model to help military and commercial organizations assess and improve their progress in developing and adopting systematic software reuse approaches for DoD acquisition programs. We welcome your feedback in the comments section below with suggestions on how the DoD can improve the technologies and ecosystems needed to develop COPEs more effectively.
Additional Resource:
To read the SEI technical report, A Framework for Evaluating Common Operating Environments: Piloting, Lessons Learned, and Opportunities, please visit www.sei.cmu.edu/library/abstracts/reports/10sr025.cfm.
Jul 27, 2015 02:45pm
By Randy Trzeciak, Senior Member of the Technical Staff, The CERT Program
According to the 2011 CyberSecurity Watch Survey, approximately 21 percent of cyber crimes against organizations are committed by insiders. Of the 607 organizations participating in the survey, 46 percent stated that the damage caused by insiders was more significant than the damage caused by outsiders. Over the past 11 years, researchers at the CERT Insider Threat Center have documented incidents related to malicious insider activity. Their sources include media reports, the courts, the United States Secret Service, victim organizations, and interviews with convicted felons. From these cases, CERT researchers have identified four models of insider threat behavior: (1) information technology (IT) sabotage, (2) fraud, (3) national security/espionage, and (4) theft of intellectual property (IP). Using those patterns, our researchers have developed network monitoring controls that combine technological tools with behavioral indicators to warn network traffic analysts of potential malicious behavior. While these controls do not necessarily identify ongoing cyber crimes, they may identify behaviors of at-risk insiders that an organization should consider for further investigation. This blog posting, the second in a series highlighting controls developed by the CERT Insider Threat Center, explores controls developed to prevent, identify, or detect IT sabotage.
Existing technical tools can be better configured to prevent instances of IT sabotage. Many organizations deploy data loss prevention (DLP) and digital rights management (DRM) tools to try to stop theft of IP, or security information and event management (SIEM) tools to mitigate IT sabotage. These tools can detect and examine network traffic, but distinguishing anomalous from normal behavior remains hard.
Behavioral Indicators Prior to IT Sabotage
The CERT Program’s research has shown that employees who commit IT sabotage typically exhibit certain behavioral indicators prior to the crime. These usually begin with an employee’s unmet expectations of the organization, precipitated by a negative workplace event, such as being passed over for a promotion, failure to receive a raise or bonus, or demotion. Next, the employee becomes disgruntled and seeks revenge against the organization for a perceived injustice.
Some behavioral indicators that may be observable in IT sabotage cases are performance problems, conflicts with coworkers or supervisors, outbursts in the workplace, and tardiness. The situation escalates to a point where the disgruntled employee sets up an attack using technical means. If such insiders have been denied access to the organization’s network, they often find ways to regain access (such as exploiting an unknown access path) to deploy their malicious code and then leave the organization or are terminated. The impact to the organization tends to become visible only after the insider’s departure.
Using the Security Information and Event Management (SIEM) Signature
The following SIEM signature can be used to determine the identity of individuals engaging in behaviors that an organization should consider investigating further, what remote connection protocol they are using, and whether this activity is occurring outside normal working hours. The signature is based on the following key fields: username, VPN account name, hostname of the attacker, and whether the attacker is using SSH, Telnet, or RDP.
The characteristics of insider attacks include remote access to the organization's information systems outside normal working hours. Given these characteristics, we developed the following signature:
Detect <username> and/or <VPN account name> and/or <hostname> using <ssh> and/or <telnet> and/or <RDP> from <5:00 PM> to <9:00 AM>
Note: This signature should only be applied to individuals who warrant increased scrutiny. This signature should not be applied to all privileged users because it will generate inordinate false positives.
Two standards were used to create the SIEM signature: the Common Event Format (CEF) and the Common Event Expression (CEE):
The Common Event Format (CEF) is an event interoperability standard developed by ArcSight. The purpose of this standard is to improve the interoperability of infrastructure devices by instituting a common log output format for different technology vendors. It assures that an event and its semantics contain all necessary information. Using this standard and the key indicators identified during the database analysis, we developed two CEF-based SIEM signatures, for Microsoft and Snort products, to identify suspected attackers.
The Common Event Expression (CEE) architecture defines an open and practical event log standard developed by MITRE. Like CEF, the purpose of CEE is to improve the audit process and users’ ability to effectively interpret and analyze event log and audit data. It standardizes the event-log relationship by normalizing the way events are recorded, shared, and interpreted. Using the CEE format, we developed a signature based on the key indicators of insider IT sabotage. The signature identifies a suspected attacker who is using a remote connection to log onto the organization’s internal system outside normal working hours, and it also logs the time the event was recorded.
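For illustration, a CEF line consists of a pipe-delimited header (version, device vendor, device product, device version, signature ID, name, severity) followed by space-separated key=value extensions. The sketch below renders such a line; the vendor, product, and signature values are invented for the example:

```python
def cef_event(vendor, product, version, signature_id, name, severity, **ext):
    """Render an event line in ArcSight's Common Event Format (CEF).

    Header fields are pipe-delimited; extensions are key=value pairs.
    """
    header = f"CEF:0|{vendor}|{product}|{version}|{signature_id}|{name}|{severity}"
    extension = " ".join(f"{k}={v}" for k, v in ext.items())
    return f"{header}|{extension}"
```

A hypothetical after-hours login event might then be emitted as cef_event("ExampleCo", "SIEM", "1.0", "100", "After-hours remote login", "5", suser="jdoe", app="ssh"), where suser is CEF's standard source-user extension key.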
Recognizing these behaviors has allowed us to create rules for when to apply a SIEM signature to detect insiders at risk of committing IT sabotage. By applying a SIEM signature, network traffic analysts can detect changes in configuration and changes in timing of network connections and specifically look at people who log in to the network outside of normal working hours. Using nontechnical indicators with the signature also helps to minimize the number of false positives. By combining behavioral and technical aspects, the SIEM signature can be used to help organizations act proactively, not reactively, to protect themselves.
Future Research
We are not advocating that organizations "advertise" the controls in an attempt to dissuade disgruntled employees from harming the organization. Instead, we want to persuade organizations to improve the communication between human resources, managers, and co-workers to identify potentially disgruntled employees and apply additional IT controls (including the SIEM signature) to identify potentially suspicious changes to critical files.
Our future work includes enhancing the CERT insider threat database by collecting incidents, verifying that the behavioral model is still current and applicable, and customizing the model to create more controls. We will continue to use our insider threat lab to test tools, develop controls, and make better recommendations for existing or new configurations of tools to prevent, detect, or respond to malicious attacks on a network.
Additional Resources
To read the technical report Insider Threat Control: Using a SIEM Signature to Detect Potential Precursors to IT Sabotage, please visit www.cert.org/archive/pdf/SIEM-Control.pdf.
To read the technical report Using Centralized Logging to Detect Data Exfiltration Near Insider Termination, please visit www.cert.org/archive/pdf/11tn024.pdf.
To read more about the CERT Program’s Insider Threat research, please visit www.cert.org/insider_threat/.
To read about the new book The CERT Guide to Insider Threats by Dawn Cappelli, Andrew Moore, and Randy Trzeciak, please visit www.sei.cmu.edu/newsitems/insider-book.cfm.
To read the CERT blog post Insider Threat Control: Using an SIEM signature to detect potential precursors to IT sabotage, please visit www.cert.org/blogs/insider_threat/2012/01/insider_threat_control_using_a_siem_signature_to_detect_potential_precursors_to_it_sabotage.html.
Jul 27, 2015 02:45pm
By Marc Novakouski, Member of the Technical Staff, Research, Technology & System Solutions
Our modern data infrastructure has become very effective at getting the information you need, when you need it. This infrastructure has become so effective that we rely on having instant access to information in many aspects of our lives. Unfortunately, there are still situations in which the data infrastructure cannot meet our needs due to various limitations at the tactical edge, which is a term used to describe hostile environments with limited resources, from war zones in Afghanistan to disaster relief in countries like Haiti and Japan. This blog post describes our ongoing research in the Advanced Mobile Systems initiative at the SEI on edge-enabled tactical systems to address problems at the tactical edge.
At the tactical edge, the people that need the information the most—warfighters, first responders, or other emergency personnel—depend on timely and valuable information to perform their tasks, or even survive. Unfortunately, the access to the information they need can be extremely hard to achieve, for the following reasons:
information overload stemming from too much information, coupled with an inability to locate truly vital information
information obscurity due to a lack of awareness of the available information, aka "you don’t know what you don’t know"
resource scarcity manifested as insufficient bandwidth, central processing unit (CPU) power, battery power, or even attention span to get the needed information and continue to process, exploit, and disseminate it for as long as needed
The remainder of this posting describes how we are tackling the information overload and information obscurity aspects of this problem by developing context-aware mobile applications.
A Different Approach to Context-Aware Mobile Applications
Context awareness in the mobile environment is not a new field of research. Most mobile devices come preloaded with applications that use location or time to account for user context. There is certainly no shortage of similar applications available for download. We decided, therefore, to explore alternative sources of data that would not only push the limit of what could be done with user context, but also focus on the extremely challenging environment at the tactical edge.
Our "eureka" moment came when we realized that when warfighters or first responders are at the tactical edge, they are almost never operating alone. As a result, the most important contextual information to warfighters or first responders is the context of the people in the group, and how they relate to that context. This realization drove us to explore group context-aware mobile applications. These applications would, if built correctly, first consider individual user context and then relate that information to the group context, thereby helping users understand both their own state, as well as the state of the group in which they participate.
Group context-aware mobile applications clearly have value at the tactical edge. For example, warfighters are well served by having access to positions of friends and foes on the battlefield (position data being a simple case). They could also benefit from supportive applications that monitor resources, such as food, ammunition, or vital signs. With sufficient data and processing power, these applications could even use historical trends to determine dynamically if a squad is walking into a possible ambush situation.
In less deadly (yet still hostile) environments, such as tsunami disaster areas, the ability to share information about resource needs, dangerous situations, or health emergencies in a structured way can also be valuable. Such applications could tailor information to managers, construction workers, doctors, and other emergency personnel to help coordinate an effective emergency response.
An extensive literature review on context awareness yielded relatively little research on the topic of group context. Much of the prior work cites the basic context model developed by Anind Dey, but does not expand the model past the individual, choosing instead to tailor the model to a particular domain. Our research project, called Information Security to the Edge (ISE), explores the structure, applications, and implementation of a context model that includes group information. We have constructed a prototype application on the Android platform that implements the essential components needed by group context-aware mobile applications, as discussed next.
App Architecture - Logic and Data
The ISE prototype application follows the common model-view-controller (MVC) pattern, which decomposes an application into the following parts:
The model is the data. This data is the information processed by the application. For example, the words typed by the user into a word processing application are data.
The view is the user interface. In the case of a word processing application, the view would be the buttons, menus, scroll bars, and other visual effects provided by the application to help a user write a document.
The controller is the logic. In the case of a word processing application, the controller would be the rules the application uses to save, present, filter, and otherwise modify the text. The function provided by each button or menu item can also be part of the controller.
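The three parts above can be sketched in miniature; the toy classes below are purely illustrative (a word list standing in for a word processor), not part of the ISE prototype:

```python
class Model:
    """The data: just the words typed by the user."""
    def __init__(self):
        self.words = []

class View:
    """The user interface: renders the model for display."""
    def render(self, model):
        return " ".join(model.words)

class Controller:
    """The logic: rules for modifying the model in response to input."""
    def __init__(self, model):
        self.model = model
    def type_word(self, word):
        self.model.words.append(word.strip())  # e.g., a filtering rule
```

The value of the pattern is that each part can change independently: a new view can render the same model, and the controller's rules can evolve without touching either.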
Consistent with the MVC pattern, the ISE prototype has a central control mechanism (which forms the "brains" of the application) that manages data flow through the application. In practice, this means that the central controller coordinates data flow and processing through the following primary application elements:
The context engine is the central processor for all context information used by the application. As device sensors report new data and applications on external devices send data to the local application, all data is passed through the engine so that new events are detected as they occur. For example, if an external user sends GPS coordinates indicating they are within 100 feet of a warfighter, the device can alert the warfighter to their presence. Expanding on this concept, if a group task must be performed but everyone is working individually on their own tasks, the local device can monitor task status and user position and report to the leader when all group members are ready and close by, so the group task can be performed.
The sensor manager accepts data from sensors that reside upon the mobile device. A typical smart phone contains position sensors, movement sensors, and in some cases, light and proximity sensors. The application captures data from these sensors and passes it through the sensor manager. The sensor manager enables the sensors and controls their sample rate, so that the application can tailor usage to the situation and avoid overwhelming the system.
The communications manager acts as the gateway to all external communications within the system. This gateway currently includes Bluetooth and TCP/IP communications, but can be expanded to include other communication mechanisms that are available to the device. Any messages to and from users on other devices are passed through the communications manager.
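A hypothetical sketch of the context engine's proximity check described above follows; the flat-earth distance approximation, the names, and the 100-foot threshold (taken from the example in the text) are assumptions, not the prototype's actual implementation:

```python
import math

PROXIMITY_FEET = 100.0  # alert threshold from the example above

def feet_between(a, b):
    """Approximate distance between two (lat, lon) points in feet.
    A flat-earth approximation, adequate only for short distances."""
    dlat = (a[0] - b[0]) * 364_000   # ~feet per degree of latitude
    dlon = (a[1] - b[1]) * 288_200   # ~feet per degree of longitude (mid-latitudes)
    return math.hypot(dlat, dlon)

def proximity_alerts(local_position, external_positions):
    """Return users whose reported GPS position is within the threshold."""
    return [user for user, pos in external_positions.items()
            if feet_between(local_position, pos) <= PROXIMITY_FEET]
```

In the prototype's terms, the communications manager would deliver each external position report to the context engine, which would run a check like this and raise an alert through the UI.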
The sensor and communications manager architecture consolidates all sensor and communication concerns into a single location. This consolidation enabled us to build a standardized interface that simplifies integrating an arbitrary sensor (for example, a radiation sensor) or an arbitrary communication mechanism (for example, a line-of-sight radio that communicates with UAVs) with the application. We tested this feature through a collaboration with Joao Sousa of George Mason University, which resulted in an alternative communication mechanism that was integrated with the prototype in only a few weeks of effort, instead of months or years. We anticipate leveraging these standardized interfaces to collaborate with a variety of external groups and organizations as new sensor technologies and communication mechanisms become available.
App Architecture - User Interface (UI)
The ISE app, through the use of Android UI screens called Activities, reflects the view part of the MVC pattern. There are currently only three supported UIs in ISE:
User View: Allows users to look at the people with whom they are or can be connected, as well as the context data associated with each person.
Task View: Allows users to create their own tasks, receive updates about other users’ tasks, and mark their tasks complete or incomplete. We are expanding the task view to include hierarchical tasks, with main tasks and subtasks under them. Ultimately, we will develop a capability that displays complex missions in an intuitive manner.
Alerts View: As events occur, some will automatically appear in the alerts view along with a list of the considerations the context engine has identified as items of importance for users. The alerts presented will be tailored to the needs and context of individual users.
We are upgrading the ISE architecture to support any UI that subscribes to standardized updates from the data services.
Challenges
One challenge we face involves accounting for the lack of network infrastructure, in particular the limited bandwidth of the available communication channels. We are building atop communication capabilities that other organizations are field testing in Afghanistan to tailor our solution to practical field situations.
A second challenge involves providing warfighter access to backend data sources. Soldiers told us that important information is available in such sources, but they can’t readily find the relevant information. Moreover, they can’t access the database in the field. Other Advanced Mobile Systems work is investigating ways to provide access to critical data through the use of cloudlets.
A third challenge involves reducing the user’s cognitive load by limiting the amount of interaction and attention required of the user. Residents in a metropolitan area can use their smart phones without undue concern for being in a distracted state, as long as they are not engaging in tasks that demand undivided attention. A soldier in Haiti, on the other hand, must be cognizant of crumbling buildings, while a warfighter on the ground in Afghanistan might need to digest information while taking enemy fire. Our goal is to use hardware that allows the warfighter to capture and process information seamlessly, without sacrificing valuable time and resources.
We are also addressing the challenge of resource scarcity. Resources are limited at the tactical edge and warfighters are typically limited to the power and bandwidth of whatever devices they can carry. We are therefore exploring resource optimization based upon our expanded model of context. For example, if a warfighter’s assignment involves driving through a known safe area, it may not be necessary for the smartphone to activate the GPS capability. By optimizing the system to use sensors only when needed, warfighters can save battery power, CPU cycles, and communication bandwidth that can be used to support other mission-critical needs.
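Such context-driven sensor gating might be sketched as follows; the area names and sensor list are hypothetical, chosen to mirror the safe-area example above:

```python
# Areas the mission plan marks as known safe (hypothetical data).
SAFE_AREAS = {"route-green"}

def sensors_to_enable(current_area, base_sensors=("gps", "accelerometer")):
    """Disable the GPS while driving through a known safe area, saving
    battery, CPU cycles, and bandwidth for mission-critical needs."""
    if current_area in SAFE_AREAS:
        return tuple(s for s in base_sensors if s != "gps")
    return tuple(base_sensors)
```

The sensor manager described earlier, which already controls sensor enablement and sample rates, would be the natural place to apply a policy like this.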
Finally, our work will not have the desired impact if we cannot meet the challenge of relevance. Warfighters made it clear to us that if a device or application is not directly useful to their immediate task, it will be ignored. On any given day, a warfighter in Afghanistan may be asked to determine whether a particular individual is a threat, sweep a village to establish the identities of residents, deliver food to children, or check for a weapons cache. These different missions affect the type of information that interests soldiers and the type of information a software application should consider. Solving this problem requires a deep understanding of the needs of soldiers and the missions in which they engage. We are leveraging this domain knowledge so our ISE application can tailor information processing to a particular mission, thereby ensuring relevance to the current mission and the ability to change mission parameters as needed.
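Mission-tailored filtering of this kind can be sketched simply. The mission profiles and information tags below are invented for illustration; the actual ISE application's taxonomy is not public in this post.

```python
# Hypothetical mission profiles mapping a mission type to the information
# tags relevant to it. A real system would derive these from doctrine
# and soldier feedback rather than a hard-coded table.
MISSION_PROFILES = {
    "threat_assessment": {"biometrics", "known_associates", "incident_reports"},
    "identity_sweep":    {"biometrics", "census_records"},
    "aid_delivery":      {"population_density", "road_status"},
}

def relevant(items, mission):
    """Keep only information items tagged as relevant to the current mission."""
    wanted = MISSION_PROFILES.get(mission, set())
    return [item for item in items if item["tag"] in wanted]

items = [{"tag": "biometrics", "body": "..."},
         {"tag": "road_status", "body": "..."}]
print(relevant(items, "aid_delivery"))  # only the road_status item survives
```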
Looking Ahead
The ISE prototype is just one part of our strategy to address the problems of information overload, information obscurity, and resource scarcity. The Advanced Mobile Systems initiative is also engaged in the Edge-Enabled Programming project, as well as the Resource Optimization for Mobile Platforms at the Edge project. Each project attacks these three problems from a different perspective. We intend to integrate the projects after they have matured, thereby providing an end-to-end solution to warfighters and first responders at the tactical edge.
Additional Resources
For more information on the MVC pattern, consult the books Documenting Software Architectures: Views and Beyond and Pattern-Oriented Software Architecture, Volume 1.
SEI Blog | Jul 27, 2015 02:44pm
By Bill Scherlis, Chief Technology Officer (Acting), SEI
The extent of software in Department of Defense (DoD) systems has increased by more than an order of magnitude every decade. This is not just because there are more systems with more software; a similar growth pattern has been exhibited within individual, long-lived military systems. In recognition of this growing software role, the Director of Defense Research and Engineering (DDR&E, now ASD(R&E)) requested the National Research Council (NRC) to undertake a study of defense software producibility, with the purpose of identifying the principal challenges and developing recommendations regarding both improvement to practice and priorities for research. The NRC appointed a committee, which I chaired, that included many individuals well known to the SEI community, including Larry Druffel, Doug Schmidt, Robert Behler, Barry Boehm, and others. After more than three years of effort—which included an intensive review and revision process—we issued our final report, Critical Code: Software Producibility for Defense. In the year and a half since the report was published, I have been asked to brief it extensively to the DoD and the Networking and Information Technology Research and Development (NITRD) communities.
This blog posting, the first in a series, highlights several of the committee’s key findings, specifically focusing on three areas of identified improvements to practice—areas where the committee judged that improvements both are feasible and could substantially help the DoD to acquire, sustain, and assure software-reliant systems of all kinds. The "help" is in the form of reduced costs, greater productivity, improved schedules, and lower risks of program failure—and also in enabling the DoD to build systems with much greater levels of capability, flexibility, interlinking, and assurance. The next blog postings will cover some of the lessons learned since the report came out.
Practice Improvement 1: Process and Measurement
Success in developing software-dominated systems requires organizational processes that enable managers and developers to set achievable goals, analyze data, and guide decisions—and to succeed in these processes despite rapid change in operating context and in the technical and infrastructural environment. Advances related to process and measurement help facilitate broader and more effective use of incremental and iterative development methods, which have relatively short process feedback loops. These iterative approaches can better accommodate change and uncertainty. As a consequence, these approaches are commonplace in commercial and enterprise development. But for the DoD, advances in incremental and iterative development methods must account for the typical "arms-length" relationships, common in acquisition programs, that exist between contractor development teams and government stakeholders.
Incremental development practices enable continuous identification and mitigation of engineering risks during a system’s development process. Engineering risks pertain to the consequences of particular choices made within an engineering process—the risks are high when the outcomes of immediate project commitments are consequential, hard to predict, and apparent only well after the commitments are made. Engineering risks may relate to many different kinds of engineering decisions—most notably architecture, quality attributes, functional characteristics, and infrastructure choices.
When managed properly, incremental practices can enable successful innovative engineering without increasing the overall programmatic risk related to completing engineering projects, such as managing stakeholder expectations and triaging priorities for cost, schedule, capability, quality, and other attributes. Incremental practices help identify and mitigate engineering risks earlier in system lifecycles than traditional waterfall approaches—the feedback is sooner, and so the costs and consequences are lower. These practices are enabled through the use of diverse techniques, such as modeling, simulation, prototyping, and other means for early validation—coupled with extensions to earned-value models that measure and acknowledge the accumulating body of evidence in support of program feasibility. Incremental methods include iterative approaches (such as Agile), staged acquisition, evidence-based systems engineering, and other methods that explicitly acknowledge engineering risks and their mitigation.
The committee found that incremental and iterative methods are of fundamental significance to innovative, software-reliant engineering in the DoD, and they can be managed more effectively through improvements in practices and supporting tools. The committee recommended a diverse set of improvements related to advanced incremental development practice, supporting tools, and earned-value models.
Practice Improvement 2: Architecture
In software-reliant DoD systems, architecture represents the earliest and often most important design decisions—those that are the hardest to change and the most critical to get right. Architecture is the principal way we address requirements related to quality attributes such as performance, security, adaptability, and the like. Architectural design also embodies expectations regarding the various dimensions of variability and change for a system. When architecture design is successful, system quality is more predictable, and change is more likely to be accommodated through smaller increments of effort rather than through wholesale restructuring of systems.
Advances related to architecture practice thus contribute to our ability to build systems with demanding requirements related to quality attributes, interlinking, and planned-for flexibility.
Software architecture techniques and tools model the structures of a system, which comprise software components, the externally visible properties of those components, and the relationships among the components. Architecture thus has both structural and semantic aspects—it is not just about how components interconnect. Good architecture entails a minimum of engineering commitment that yields a maximum of business value. Architecture design is thus an engineering activity that is separate, for example, from standards-related policy setting and the certification of commercial ecosystems and components.
For complex innovative DoD systems, architecture definition embodies planning for flexibility—defining and encapsulating areas (such as common operating platform environments and cyber-physical systems) where innovation, change, and competition are anticipated. Architecture definition strongly influences diverse quality attributes, ranging from availability and performance to security and isolation. It also embodies planning for the interlinking of systems to form systems-of-systems and ultra-large-scale systems and for product line development enabling encapsulation of individual innovative elements of a system.
For many innovative DoD systems it is essential to consider architecture and quality attributes before making too many specific commitments to functionality. This may seem backwards compared to the usual model of putting functional requirements first. But the engineering reality is that architecture includes the earliest and typically the most important design decisions: those that are the most costly to change later. Early architectural commitment (and validation) can therefore often yield better project outcomes with less programmatic risk.
The committee found that in highly complex DoD systems with emphasis on quality attributes, architecture decisions may dominate functional capability choices in overall significance. The committee also noted that architecture practice in many areas of industry is sufficiently mature for the DoD to adopt. The committee recommended that the DoD more aggressively assert architectural leadership, with an early focus on architecture being essential for systems with innovative functional or demanding quality requirements.
Practice Improvement 3: Assurance and Security
A significant—and growing—challenge for DoD systems is software assurance, which encompasses diverse reliability, security, robustness, safety, and other quality-related and functional attributes. The weights given these various attributes are often determined by modeling hazards associated with operational context, including potential threats and the penalties of system failure. Software assurance is very expensive—the process of achieving assurance judgments, regardless of sector, is generally recognized to account for approximately half the total development cost for major projects. Advances related to assurance and security would therefore facilitate greater mission assurance for systems at greater degrees of scale and complexity.
Advances in assurance and security are particularly important to the rich supply chains and architectural ecosystems that are increasingly commonplace in modern software engineering. The DoD's growing reliance on software has brought increased functional capability in all kinds of systems, growth in the interconnectedness of those systems, and greater potential for their rapid adaptation. With this growth has come a dependence of DoD software-reliant systems on increasingly complex, diverse, and geographically distributed supply chains. These supply chains include not only custom components developed for specific mission purposes, but also commercial and open-source ecosystems and components, such as the widely used infrastructures for web services, cloud computing environments, mobile devices, and graphical user interaction. This places emphasis on composition and on localizing points of trust within a system.
In addition to managing overall costs, the DoD faces many challenges for assurance relating to technology, practices, and incentives, including:
The arms-length relationship between contractor development teams and government stakeholders complicates the creation and sharing of information necessary to make assurance judgments. This type of relationship can lead to approaches that focus excessively on post hoc acceptance evaluation, rather than on the emerging practice of building evidence in support of an overall assurance case.
Modern systems draw on components from diverse sources, implying that supply-chain and configuration-related attacks must be contemplated, with "attack surfaces" existing within an overall application, and not just at its perimeter. The consequence of this trend is that evaluative and preventive approaches should ideally be integrated throughout a complex supply chain. A particular challenge is managing black box components in a system (this issue is addressed in the full report).
The growing role of DoD software in warfighting, protection of national assets, and the safeguarding of human lives creates a diminishing tolerance for faulty assurance judgments. The Defense Science Board notes that there are profound risks associated with the increasing reliance on modern software-reliant systems: "this growing dependency is a source of weakness exacerbated by the mounting size, complexity, and interconnectedness of its software programs."
Losing the lead in the ability to evaluate software and prevent attacks can confer advantage to adversaries with respect to both offense and defense. It can also force the DoD to restrict functionality or performance to a level such that assurance judgments can be achieved more readily.
The Defense Science Board also found "it is an essential requirement that the United States maintain advanced capability for ‘test and evaluation’ of IT products. Reputation-based or trust-based credentialing of software (‘provenance’) needs to be augmented by direct, artifact-focused means to support acceptance evaluation." Achieving this capability is a significant challenge due to the rapid advance of software technology generally, as well as the increasing pace by which potential adversaries are advancing their capabilities. This challenge—coupled with the observations above regarding software innovation—provides an important part of the rationale for the committee’s recommendation that the DoD actively and directly address its software producibility needs.
The committee found that assurance is facilitated by advances in diverse aspects of software engineering practice and technology, including modeling, analysis, tools and environments, traceability and configuration management, programming languages, and process support. The committee also found that simultaneous creation of assurance-related evidence with ongoing development has high potential to improve the overall assurance of systems. The committee recommended enhancing incentives for software assurance practices and production of assurance-related evidence throughout the software lifecycle and through the software supply chain for both contractor and in-house developments.
Looking Ahead
The next blog posting in this series will focus on lessons learned in the many interactions subsequent to the publication of the NRC Critical Code report. I will also discuss what these lessons signify for developing software strategy for the DoD, in general, and the SEI, in particular.
Additional Resources
This posting is an excerpted, edited copy of an article that Bill Scherlis wrote for The Next Wave, "Critical Code: Software Producibility for Defense," which was published in Volume 19, No. 1 (2011). To request copies of the journal, please send an email to tnw@tycho.ncsc.mil.
To download a PDF of the report Critical Code: Software Producibility for Defense, go to www.nap.edu/catalog.php?record_id=12979.
To download a PDF of the Report of the Defense Science Board Task Force on Defense Software (2000), go to http://oai.dtic.mil/oai/oai?verb=getRecord&metadataPrefix=html&identifier=ADA385923
SEI Blog | Jul 27, 2015 02:42pm
By Dave Zubrow, Chief Scientist, Software Engineering Process Management Program
By law, major defense acquisition programs are now required to prepare cost estimates earlier in the acquisition lifecycle, including pre-Milestone A, well before concrete technical information is available on the program being developed. Estimates are therefore often based on a desired capability—or even on an abstract concept—rather than on a concrete technical plan for achieving the desired capability. Hence, the role and modeling of assumptions become more challenging. This blog posting outlines a multi-year project on Quantifying Uncertainty in Early Lifecycle Cost Estimation (QUELCE) conducted by the SEI Software Engineering Measurement and Analysis (SEMA) team. QUELCE is a method for improving pre-Milestone A software cost estimates through research designed to improve judgment regarding uncertainty in key assumptions (which we term program change drivers), the relationships among the program change drivers, and their impact on cost.
Our Approach
According to a February 2011 presentation by Gary Bliss, director of Program Assessment and Root Cause Analysis, to the DoD Cost Analysis Symposium, unrealistic cost or schedule estimates are a frequent causal factor for programs breaching a performance criterion. Steve Miller, director of the Advanced Systems Cost Analysis Division of OSD Cost Analysis and Program Evaluation, noted during his DoDCAS 2012 presentation that "Measuring the range of possible cost outcomes for each option is essential … Our sense is not that the cost estimates were poorly developed [but] rather key input assumptions didn’t pan out." For instance, an estimate might assume
It is possible to mature technology A from technology readiness level 4 to level 7 in three years.
The program will not experience any obsolescence of parts within the next five years.
Foreign military sales will support lower production costs.
An interdependent program will complete its development and deployment in time for this program to use the products.
We can reuse 70 percent of the code in the missile tracking system.
QUELCE addresses the challenge of getting the assumptions "right" by characterizing them as uncertain events rather than certain eventualities. As we’ve noted previously, modeling uncertainty on the input side of the cost model is a hallmark of the QUELCE method. By better representing uncertainty, and therefore risk, in the assumptions and explicitly modeling them, DoD decision makers, such as Milestone Decision Authorities (MDAs) and Service Acquisition Executives (SAEs), can make more informed choices about funding programs and portfolio management. QUELCE is designed to ensure that DoD acquisition programs will be funded at levels consistent with the magnitude of risk to achieving program success, fewer and less severe program cost overruns will occur due to poor estimates, and there will be less rework reconciling program and OSD cost estimates.
QUELCE relies on Bayesian Belief Network (BBN) modeling to quantify uncertainties among program change drivers as inputs to cost models. QUELCE then uses Monte Carlo simulation to generate a distribution (as opposed to a single point) for the cost estimate. In addition, QUELCE includes a DoD domain-specific method for improving expert judgment regarding the nature of uncertainty in program change drivers, their interrelationships, and eventual impact on program cost drivers. QUELCE is distinguished from other approaches to cost estimation by its ability to
allow subjective inputs based solely on expert judgment, such as the identification of program change drivers and the probabilities of state changes for those drivers, as well as empirically grounded ones based on historical data, such as estimated system size and likely growth in that estimate
visually depict influential relationships, scenarios, and outputs to aid team-based development and to support explicit description and documentation of the assumptions underlying an estimate
use scenarios as a means to identify program change drivers, as well as the impacts of alternative acquisition strategies, and
employ dependency matrix transformation techniques to limit the combinatorial effect of multiple interacting program change drivers for more tractable modeling and analysis
The QUELCE method consists of the following steps in order:
Identify program change drivers through an expert workshop and brainstorming.
Identify states of program change drivers.
Identify cause-and-effect relationships between program change drivers, represented as a dependency matrix.
Reduce the dependency matrix to a feasible number of inter-driver relationships for modeling, using matrix transformation techniques.
Construct a BBN using the reduced dependency matrix.
Populate BBN nodes with conditional probabilities.
Define scenarios representing nominal and alternative program execution futures by altering one or more program change driver probabilities.
Select a cost estimation tool and/or cost estimating relationships (CERs) for generating the cost estimate.
Obtain program estimates of size and/or other cost inputs that will not be computed by the BBN.
For each selected scenario, map BBN outputs to the input parameters for the cost estimation model and run a Monte Carlo simulation.
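To make the last step above concrete, here is a heavily simplified sketch of propagating sampled program change driver states through a toy cost-estimating relationship via Monte Carlo simulation. The drivers, probabilities, and cost multipliers are invented for illustration; QUELCE itself conditions driver probabilities on one another through a BBN and feeds a real cost model, neither of which this sketch reproduces.

```python
import random

# Invented example drivers: each has states with probabilities and a
# multiplicative effect on cost. A real QUELCE model conditions these
# probabilities on one another via a Bayesian belief network.
DRIVERS = {
    "technology_maturity": [("on_track", 0.6, 1.0), ("slips", 0.4, 1.3)],
    "reuse_achieved":      [("70_percent", 0.5, 1.0), ("40_percent", 0.5, 1.2)],
}

BASE_COST = 100.0  # nominal estimate, illustrative units

def sample_multiplier():
    """Sample one possible future: pick a state for each driver."""
    m = 1.0
    for states in DRIVERS.values():
        r, cum = random.random(), 0.0
        for _name, p, mult in states:
            cum += p
            if r <= cum:
                m *= mult
                break
    return m

def simulate(n=10_000):
    """Monte Carlo over driver states yields a cost *distribution*."""
    return sorted(BASE_COST * sample_multiplier() for _ in range(n))

costs = simulate()
median = costs[len(costs) // 2]
p90 = costs[int(len(costs) * 0.9)]
print(f"median ~ {median:.0f}, 90th percentile ~ {p90:.0f}")
```

The point of the exercise is the shape of the output: a decision maker sees a range of cost outcomes and their likelihoods rather than a single point estimate.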
Improving the Reliability of the Expert Opinion
Early cost estimates rely heavily on subject matter expert (SME) judgment, and improving the reliability of these judgments represents another focus of our research. Expert judgment can be idiosyncratic, and our aim is to try to make it more reliable. QUELCE draws upon the work of Dr. Douglas Hubbard, whose book How to Measure Anything describes a technique known as "calibrating your judgment" that we are adapting for our DoD cost estimation analysis.
For example, if you state you are 90 percent confident, you should be correct in your answers 90 percent of the time. If you state you are 80 percent confident, you would be correct 8 times out of 10. Performing in agreement with your statement of confidence is termed "being calibrated."
Hubbard’s technique operates by giving participants a series of questionnaires. The participants are asked to provide an upper and lower bound for the answer to each question such that they believe they will be correct 90 percent of the time. Hence, a participant should get 9 out of 10 answers right. If they answer all 10 correctly, they are being too conservative and providing too wide a range. If they get fewer than 9 correct, they are overconfident and providing too narrow a range. Hubbard’s approach provides feedback so that participants learn to be consistently correct 90 percent of the time. Through this cycle of testing and feedback, they learn to calibrate their judgment.
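The scoring at the heart of this feedback loop is simple to sketch. The questions, stated intervals, and true values below are made up for illustration; they are not from Hubbard's actual questionnaires.

```python
# Scoring a calibration exercise: each participant states a 90% confidence
# interval per question; we measure how often the truth falls inside.

def calibration_score(intervals, truths):
    """Fraction of true values falling inside the stated intervals."""
    hits = sum(lo <= t <= hi for (lo, hi), t in zip(intervals, truths))
    return hits / len(truths)

# A participant's stated 90% intervals for ten quantity questions
# (hypothetical data):
stated = [(1000, 5000), (10, 50), (1900, 1950), (100, 300), (2, 8),
          (50, 150), (0, 40), (300, 700), (5, 25), (1_000_000, 9_000_000)]
true_values = [3200, 70, 1942, 180, 5, 90, 55, 480, 12, 4_500_000]

score = calibration_score(stated, true_values)
print(f"hit rate: {score:.0%}")  # 80% here: overconfident relative to 90%
```

A participant scoring well below 90 percent is told to widen their intervals on the next round; one scoring 100 percent is told to narrow them.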
Applying that same approach to DoD cost estimation analysis means that when two calibrated judgments are applied to the same cost estimate, there is a more precise idea of what those judgments mean. Hubbard, who taught a class at the SEI, demonstrated that most people start off highly overconfident in their knowledge and judgment.
We plan to test Hubbard’s approach of calibrating judgment with questions specific to software estimating at several universities, including Carnegie Mellon University and the University of Arizona. To develop the materials for these experiments, we are mining information from open-source repositories, such as Ohloh.net. Our objective is to increase the consistency and repeatability of expert judgment as it is used in software cost estimation.
Addressing Challenges
A key challenge that our team faces in conducting our research is validating the QUELCE method. It can literally take years for a program to reach a milestone against which we can compare its actual costs to the estimate produced by QUELCE. We are addressing this challenge by validating pieces of the method through experiments, workshops, and retrospectives. We are currently conducting a retrospective on an active program that provided us access to its historical records. Key to this latter activity is the participation of team members from the SEI Acquisition Support Program (ASP). The ASP members are playing the role of program experts as we work our way through the retrospective.
Another challenge that our work on QUELCE must address is that insufficient access to DoD information and data repositories may significantly jeopardize our ability to conduct sufficient empirical analysis for the program change driver repository. To address this, we have been working with our sponsor and others in the Office of the Secretary of Defense to gain access to historical program data stored in a variety of repositories housed throughout the DoD. We plan to use this data to develop reference points and other information that will be used by QUELCE implementers as a decision aid when developing the BBN for their program. This data would also be included in the program change driver repository.
Developing a Repository
We are creating a program change driver repository that will be used as a support tool when applying the QUELCE method. The repository is envisioned as a source of program change drivers—what events occurred during the life of a program that directly or indirectly impacted its cost—along with their probability of occurrence. The repository will also include information that will be used as part of the method for improving the reliability of expert judgment such as reference points based on the history of Mandatory Procedures for Major Defense Acquisition Programs.
Developing the repository is a major task planned for FY13. We also plan to conduct additional pilots of the method including use of the repository and support tools. From those pilots, we will develop guidance for the use of the repository and make it available on a trial basis within the DoD. After the repository is adequately populated and developed, we intend it to become an operational resource for DoD cost estimating.
Transitioning to the Public
During the coming year, our SEMA team will work to
create guidance and procedures on how to mine program change relationships and related cost information from DoD acquisition artifacts for growth of the program change driver repository
collaborate with Air Force Cost Analysis Agency to include results from analyzing Software Resources Data Report data in the program change driver repository
assemble a catalog of calibrated mapping of BBN outputs to cost estimation models and make it available to the DoD cost community
continue discussions with Defense Acquisition University (DAU), Service Cost Centers, and the DoD cost community about research and collaboration opportunities (for example, discussions at the DoD Cost Analysis symposium)
Additional Resources
To read the SEI technical report Quantifying Uncertainty in Early Lifecycle Cost Estimation (QUELCE), please visit www.sei.cmu.edu/library/abstracts/reports/11tr026.cfm.
For more information about Milestone A, please see the Integrated Defense Life Cycle Chart for a picture and references in the "Article Library."
SEI Blog | Jul 27, 2015 02:41pm
By David Svoboda, Software Security Engineer, CERT Secure Coding Initiative
As security specialists, we are often asked to audit software and provide expertise on secure coding practices. Our research and efforts have produced several coding standards specifically dealing with security in popular programming languages, such as C, Java, and C++. This posting describes our work on the CERT Perl Secure Coding Standard, which provides a core of well-documented and enforceable coding rules and recommendations for Perl, a popular scripting language.
Perl is a relatively young language, only slightly older than Java. Perl became popular early in its lifetime because it was the first general-purpose scripting language available on many Unix platforms. Perl enjoyed a second burst of popularity as the web became prominent because it was especially well suited to writing Common Gateway Interface (CGI) scripts.
In recent years, Perl's popularity has been cemented by CPAN, a public repository of free software libraries written in Perl. Any computer with Perl installed provides straightforward mechanisms to install and use any software library from CPAN. This feature enables programmers to use libraries provided by the community easily and quickly. Several new features in Perl began life as CPAN modules before being integrated into the language. As a result of its popularity, many important software systems are written in Perl, such as the Request Tracker (RT) database, an open-source project for managing tickets or bugs for a help desk, maintained by Best Practical Solutions. Many websites, such as amazon.com, also rely on Perl code on their servers.
The CERT Perl Secure Coding standard is still young and growing. The C and Java standards have more than 200 rules in about 20 sections each. The Perl standard currently has slightly more than 30 rules in the following eight sections:
Input Validation and Data Sanitization - issues dealing with data provided by an attacker, such as XML injection and cross-site scripting (XSS).
Declarations and Initialization - issues dealing with securely declaring variables and functions, including package versus lexical variables, name clashes, and the dangers of uninitialized data.
Expressions - issues dealing with Perl’s expression syntax, including list versus scalar contexts, when to use the $_ variable, and when to use the various types of comparison operators.
Integers - issues dealing with numbers, such as how to specify octal numbers.
Strings - issues dealing with strings and regular expressions (regexes), including the danger of providing a string literal to a subroutine that expects a regex.
Object-Oriented Programming (OOP) - issues dealing with OOP, such as recognizing the convention of private variables.
File Input and Output - issues dealing with how to safely work with files, including safely working with Perl’s filehandles.
Miscellaneous - issues that don’t fall into other sections, such as handling dead code and unused variables.
Addressing Security Vulnerabilities in Perl
The Perl community has always prioritized practicality over theoretical elegance, and so Perl has always been considered an easy language to write code in, although Perl code is often considered ugly due to the tendency of some Perl developers to create "write only" programs. Perl was not designed as a secure programming language. However, problems relating to security in Perl programs have been discussed in security circles, and appear in databases such as the CERT vulnerability database. Moreover, companies that request software audits are just as likely to want Perl software audited as they are to request audits for C, C++, or Java. While the Perl community is interested in improving the language, the focus on security has historically tended to take a back seat to other priorities, such as new features and improved performance.
Our work on the CERT Perl Secure Coding Standard therefore centers on addressing issues in the Perl language and libraries that deal specifically with security. The standard covers issues such as XML injection, integer security, and proper input and output, as outlined above. By making the standard publicly accessible, we invite the Perl community to help us improve it.
The standard leverages several sources to provide relevant material on security. For example, it takes advantage of the US-CERT vulnerability database, which contains entries on several vulnerabilities that address the Perl language or applications written in Perl. It also leverages experience gained from the Source Code Analysis Lab (SCALe), which has been used to perform security audits on several pieces of Perl code, including the previously mentioned Request Tracker (RT) tool created by Best Practical Solutions. Other analysis tools, such as Perl::Critic, provide an automated audit of a Perl program by examining a codebase and producing a list of diagnostics. These diagnostics can range from insecure coding practices and bugs to stylistic issues. The SCALe project uses these tools to harvest the diagnostics that address security issues, while discarding diagnostics not relevant to security.
The CERT Perl standard can leverage the other CERT standards for security issues that are not bound to any particular language. For instance, many issues about securely opening files on a Unix machine are language-independent. As a result, these portions of the CERT standards apply to any software that runs on Unix systems, regardless of the language in which it is written.
While Perl has many of the same security issues that plague C and Java, several issues are unique to Perl. For example, Perl's
open()
function can take two arguments, with the latter argument being either a file name or a shell command. The
open()
function either opens the file or executes the command. If the argument begins or ends with a | (pipe) character, it is interpreted as a command to execute. Consequently, if an attacker can specify a filename to Perl's
open()
function—and that filename begins or ends with |—the attacker can cause Perl to execute the command for which the file is named. This issue is discussed further in rule
IDS31-PL
in the CERT Perl Secure Coding Standard.
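The difference between the two-argument and three-argument forms of open() can be sketched in a few lines. The filename below is hypothetical and chosen only to illustrate the pipe behavior; the unsafe call is left commented out:

```perl
use strict;
use warnings;

# Hypothetical attacker-supplied "filename"; the trailing pipe would make
# two-argument open() run the command instead of opening a file.
my $filename = 'echo pwned |';

# UNSAFE (two-argument form): Perl sees the trailing '|' and executes
# the command, printing "pwned" to the filehandle.
# open(my $fh, $filename) or die "open failed: $!";

# Safer (three-argument form): the mode is explicit, so the pipe character
# is just part of a literal filename, which here does not exist.
if (open(my $fh, '<', $filename)) {
    print while <$fh>;
    close $fh;
} else {
    print "refused: no file named '$filename'\n";
}
```

Because the three-argument form never interprets its filename argument as a command, it is the form recommended for any name that might be influenced by untrusted input.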
Perl has some technology that appears similar to other languages but presents unique problems when examined more closely. For example, C, Java, and Perl all share the concept of an
array, which is a continuous vector of items that can be accessed via an index. In C and Java, arrays are fixed-size, which means they are created to hold a specific number of elements and their size remains fixed until they are destroyed. Trying to refer to an element greater than the size of the array is illegal. For example, asking for the 11th element in a 10-element array in Java will cause an exception to be thrown, which usually causes the program to crash.
In contrast, Perl's arrays can grow over their lifetime. Assigning a value to the 11th element of a 10-element Perl array causes the array to grow in memory such that the array contains 11 elements, so the request becomes valid. This quality makes Perl an especially agreeable language to work with because it never reports that an array is too small. If you were to assign a value to the 1,000,000,000th element of an array, however, Perl would attempt to grow the array enough to accommodate the request and might exhaust memory.
Exhausting system memory, whether deliberate or unintentional, can lead to security vulnerabilities because a system that has run out of memory will refuse further allocation requests from any program. At the same time, many programs fail to check whether their memory requests succeeded. A machine with no free memory is therefore likely to have running programs crash, either unintentionally or by design, with some sort of "out of memory" error. Consequently, the CERT Perl Secure Coding standard contains rule
IDS32-PL, which forbids allowing untrusted users to provide an array index, lest they cause Perl to exhaust memory with an excessively large value.
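The distinction between merely reading past the end of an array and assigning past it can be seen in a short sketch (the variable names are illustrative):

```perl
use strict;
use warnings;

my @array = (1 .. 10);          # ten elements, indices 0 through 9

my $x = $array[10];             # reading past the end yields undef...
print scalar(@array), "\n";     # ...and the array still has 10 elements

$array[10] = 'eleventh';        # assigning past the end grows the array
print scalar(@array), "\n";     # now 11 elements

# This growth is why an untrusted index is dangerous (see IDS32-PL):
# $array[1_000_000_000] = 1;    # would try to allocate ~a billion slots
```

Only the assignment changes the array's size, which is why an attacker-controlled index in an assignment context can drive memory exhaustion.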
What’s Ahead for the CERT Perl Secure Coding Standard
We are adding several rules each week, and we expect the Perl secure coding standard to grow to roughly the same size as the C or Java standards, since it is comparable in scope. We welcome your assistance in helping us complete the standard.
Editor's Note: In response to feedback from our readers, this post has been edited. The post originally stated "Asking for the 11th element of a 10-element Perl array causes the array to grow in memory such that the array contains 11 elements, so the request becomes valid." As our readers pointed out in the comments section, "Simple asking is not enough." The post now states that "Assigning a value to the 11th element of a 10-element Perl array causes the array to grow in memory such that the array contains 11 elements, so the request becomes valid."
Additional Resources
The CERT Perl Secure Coding Standard may be viewed at
https://www.securecoding.cert.org/confluence/display/perl/CERT+Perl+Secure+Coding+Standard
SEI Blog | Jul 27, 2015 02:40pm
By Douglas C. Schmidt, Principal Researcher
While agile methods have become popular in commercial software development organizations, the engineering disciplines needed to apply agility to mission-critical software-reliant systems are not as well defined or practiced. To help bridge this gap, the SEI recently hosted the Agile Research Forum, which brought together researchers and practitioners from around the world to discuss when and how to best apply agile methods in the mission-critical environments found in government and many industries. This blog posting, the first in a multi-part series, highlights key ideas and issues associated with applying agile methods to address the challenges of complexity, exacting regulations, and schedule pressures that were presented during the forum.
Carleton’s Forum Introduction
When introducing the forum, Anita Carleton, director of the SEI’s Software Engineering Process Management Program, summarized how agile methods can provide customer value sooner and enable organizations to respond to change more quickly. Carleton started by highlighting the four key tenets presented in the Agile Manifesto, which forms the foundation for many agile methods, such as Scrum and Kanban:
people over processes and tools
working software over comprehensive documentation
customer collaboration over contract negotiations
responding to change over following a plan
Carleton explained that agility means the ability to move quickly and easily, to think and reason quickly, and to possess intellectual acuity. "In a business context this is the definition of agile that matters," Carleton said. Agility in business means having an organization that moves, thinks, and responds quickly to change, not only in the short term but over the lifetime of the system, product, or relationship. Agility is the ability to provide customer value sooner, to align development tempos with operational tempos, and to turn fast response into a competitive business advantage.
While agile methods have become popular with many information technology (IT) professionals as a means to replace perceived top-down bureaucracy with greater self-discipline and team discipline, Carleton noted in her presentation that the SEI emphasizes measurable performance value for DoD programs and contractors who apply agile practices. Agility is not a capability you achieve by accident: just as agility in sports requires teamwork, strategy, training, management, and discipline, so does agility in a software development organization.
Carleton explained that the SEI’s research and transition efforts have concentrated in recent years on enhancing lightweight approaches and processes to address what’s needed and required by our sponsors, partners, customers, and other stakeholders. Meeting these needs involves scaling up agile methods to the mission-critical software-reliant systems common in the Department of Defense (DoD), as well as mission-critical programs in other domains, such as finance, energy, telecommunications, space exploration, and aviation.
Carleton outlined the following areas where SEI work is focusing on applying agile methods at-scale:
defining and evaluating practical guidance for DoD project managers, systems engineers, and contracting officers who are considering the adoption of agile methods
defining metrics that can be used to measure and appraise the performance and success of programs that apply agile methods
working with DoD acquisition programs to pilot and roll-out agile methods in combat system development environments
developing an architecture-focused measurement framework for managing technical debt in agile projects
formulating a decision-making framework for reducing integration risk with agile methods
applying agile principles in strategic planning processes
Takai’s Keynote Presentation
In her keynote presentation during the forum, Teresa M. Takai, chief information officer (CIO) for the DoD, discussed how agile methods have been introduced into the DoD software acquisition and development environment. As the DoD CIO, Takai provides IT support for 2 million individuals across the globe: 1 million on the civilian side and 1 million on the military side. Providing IT support for the DoD means delivering reliable computing and communications capability for warfighters, regardless of their location. IT is an essential part of what the DoD does to enable service men and women to perform their duties.
Challenges that Motivate DoD Agility
The DoD faces many challenges—including budget pressures and rapid technology insertion—that motivate the need for agile methods, Takai said during her keynote. These challenges have resulted in two realities for IT development:
The DoD must be a good custodian of technology dollars, Takai commented during the forum, pointing out that the DoD spends at least $38 billion a year on IT.
The DoD also needs to speed up IT delivery. Men and women hired by the DoD now expect to do their jobs using their smart phones, the same way that they do in their private lives, Takai explained. Long development cycles don’t fit with user expectations. In the DoD, it can take up to 81 months to acquire or develop a new technology. "Typically, slow acquisition time has been blamed on the acquisition process, but that’s not always fair," Takai said. The challenge for the DoD is that the acquisition process encompasses front-end requirements gathering, recruiting industry partners, and testing and implementation. The mandate for the DoD to change is enormous.
Agile practices are not just a methodology; they involve a cultural change in the way business is conducted. Takai suggested to the audience that cultural change is the hardest part of adopting agile methods in the DoD. Nearly 33 percent of DoD IT programs are canceled during development because, as programs move through a process taking 81 months, they realize they can’t deliver the capability they had intended to deliver. Over 60 percent of DoD IT programs are late and/or over budget. Larger IT projects carry a much greater risk of running over budget and under-delivering.
Cultural and Process Changes Need to Support DoD Agility
As part of the DoD IT acquisition reform effort, Takai explained that DoD leaders are examining how to take agile best practices, continue to educate the IT technology workforce on the meaning of these best practices, and ensure that all involved understand the overall DoD culture to ensure IT developers can apply agile methods effectively. This cultural change involves enhancements to established DoD policies and practices. The DoD is part of a larger government-wide effort that, under the US CIO, published a 25-point plan, a portion of which prescribed transitioning to agile methods, as elaborated in Takai’s 10-point plan for IT modernization in the DoD.
Some segments within the DoD have already begun transitioning to agile methods, Takai said. Based on these experiences, the DoD has been establishing a framework and a handbook that program managers can use to guide their implementations of agile methods within their organizations. One of the DoD’s strategies has been to share best practices. An important best practice has involved establishing a governance process that promotes the successful adoption of agile IT methods.
DoD organizations are accustomed to developing DoD specific solutions, such as radio or weapons systems, that involve rigorous processes. Unfortunately, with this approach it’s hard for program managers to decompose software systems into smaller, more deliverable chunks necessary for agile development. "The DoD can no longer simply write and sign off on requirements and then turn it over to acquisition to manage delivery," Takai said. The DoD must find a better way to involve the user through the development process. For the DoD that’s not always been the norm, so making that move involves cultural changes.
Takai said that as the DoD considers large-scale IT development projects (some in the billion-dollar range), leaders need to learn how to divide them into chunks effectively. This decomposition process should start by specifically examining the ongoing requirements process, involving the user, and ensuring a much stronger governance process. The DoD must also examine how to manage agile development from a risk-mitigation standpoint.
One challenge of traditional waterfall methods is that the focus is often on avoiding risk. Risk avoidance is not what IT is about anymore, Takai said. Implementing agile methods will enable the DoD to mitigate risk and make the changes needed after a small increment of delivery, and then build on that concept to reach the next stages of delivery. That approach is a tough concept for the DoD, which has historically focused on ensuring that requirements and processes remain iron clad, with minimal risk involved. Such an approach eliminates the ability to bring in innovative technologies, as well as the ability to implement industry best practices. That’s not the way to move forward, she said, especially in an era of austerity in the DoD budget.
What's Next
Our next posts in this series will summarize presentations by four SEI researchers, including myself, who examined aspects of applying agile methods at-scale in mission-critical development environments at the Agile Research Forum:
Mary Ann Lapham highlighted the importance of collaboration with end users, as well as among cross-functional teams, to apply agile approaches to DoD acquisition programs successfully at-scale. She noted that effective agile DoD teams are flexible, experienced, and able to work fluidly between disciplines.
Ipek Ozkaya discussed the use of strategic, intentional decisions to incur architectural technical debt. The technical debt metaphor describes the tradeoff between taking shortcuts in software development to speed up product delivery and slower—but less risky—software development.
James Over noted that lack of teamwork can critically impede agility. He advocated, among other principles, the building of self-managed teams, planning and measuring project process, designing before building, and making quality the top priority for achieving agility at-scale.
Finally, I wrapped up the forum with a discussion on the importance of applying agile methods to crucial common operating platform environments (COPEs) at the DoD. I explained how agile methods can encourage more effective collaboration between users, developers, testers, and certifiers to help the DoD successfully build integrated, interoperable software systems.
In addition to providing you weekly updates of the latest research from our technologists, the SEI blog has also become a catalyst for sparking thoughtful discussions on the latest challenges facing commercial and DoD organizations. We therefore look forward to hearing your thoughts on applying agile at-scale in the comments section below.
Additional Resources
The slides and recordings from the SEI Agile Research Forum can be accessed at www.sei.cmu.edu/go/agile-research-forum/.
SEI Blog | Jul 27, 2015 02:39pm
By Douglas C. Schmidt, Principal Researcher
While agile methods have become popular in commercial software development organizations, the engineering disciplines needed to apply agility to mission-critical, software-reliant systems are not as well defined or practiced. To help bridge this gap, the SEI recently hosted the Agile Research Forum, which brought together researchers and practitioners from around the world to discuss when and how to best apply agile methods in mission-critical environments found in government and many industries. This blog posting, the second installment in a multi-part series, summarizes a presentation made during the forum by Mary Ann Lapham, a senior researcher in the SEI’s Acquisition Support Program, who highlighted the importance of collaboration with end users, as well as among cross-functional teams, to facilitate the adoption of agile approaches into DoD acquisition programs.
Lapham’s Talk on Agile Methods: Tools, Techniques, and Practices for the DoD Community
The broad—and rapidly expanding—threats the DoD must address necessitates an ability to develop software faster, Lapham told the audience. "In today’s environment people want results faster. They want to use information technology (IT) applications and infrastructure sooner; there is a real need," Lapham said. In the commercial and DoD domains, the prevailing question is, "How can IT be delivered faster?" The answer, Lapham told the audience, is an iterative approach that lends itself to agile methods.
Lapham noted that the SEI is focused on reducing the DoD information technology development cycle—which can currently take as long as 81 months—to short, incremental approaches that yield results more quickly. One complicating factor is that DoD acquisition programs (like other highly regulated commercial environments) have a prescribed vision of how IT systems are developed, Lapham explained. She referenced the DoD 5000 Series Acquisition Lifecycle, which has traditionally employed a waterfall approach that focuses largely on a sequential process of requirements analysis followed by design, implementation, and testing.
Lapham said that she and other SEI researchers are working on developing an approach that will allow the DoD to develop applications in a shorter period, ideally 18 to 24 months. One aspect of Lapham’s research focuses on helping the DoD transition from a traditional method to more iterative and incremental development methods, while still operating within the regulatory boundaries of the overarching DoD 5000 Series Acquisition Lifecycle model.
Implementing Agile Effectively in DoD Environments
Lapham said the SEI’s research in this field began in 2009 when a DoD client asked about using agile methods. "We reviewed the 5000 series to see if something in it would preclude us from using agile, and there wasn’t. From there, we’ve gone on to investigate different parts of the acquisition process so we can help program offices and contractors understand how to implement agile effectively in DoD environments."
Lapham said her team applied agile methods to study agile methods. Researchers started by interviewing practitioners who were experts with traditional waterfall methods about how those methods fit into the DoD acquisition lifecycle. Next, Lapham and her team identified the gaps that would occur if agile principles were applied by the DoD in the traditional acquisition lifecycle. After identifying the gaps, Lapham’s team consulted with DoD stakeholders to ensure they had identified the appropriate gaps. The researchers then characterized those gaps and built a model to form a complete overview of the lifecycle with agile principles.
The research conducted by Lapham’s team yielded a compendium of topics that addressed the barriers of adopting agile methods in the DoD. "We have a list of 30 topics, including such questions as: How do I do agile contracting? How do I do agile requirements management? How do I do agile cost estimation, testing, and system engineering?" Lapham explained. Next, the researchers consulted with DoD acquisition stakeholders to ensure that the topics they are addressing are relevant. The team is in the midst of piloting their agile approach with practitioners. "We’re applying a lot of the agile methods that have been used successfully in the commercial arena," Lapham said, adding that their research accounted for the fact that certain agile terms in the commercial world differ from those in a DoD environment. The published results—which will be released starting later this year—will be a set of validated tools, techniques, and practices.
Comparing and contrasting traditional and agile approaches to software development
Lapham noted the research results thus far have yielded the following findings about traditional versus agile development:
Characteristics of traditional approaches
an arms-length relationship between developers and acquirers
hierarchical, command-and-control-based teams
leader as keeper of the vision and primary source of authority to act
conventional, representational documents used by the program management office to oversee the progress of developers
a software development lifecycle model with separate teams, particularly for development and testing; some independent program teams involve multiple functions
Characteristics of agile approaches
collaborative relationships between developers, acquirers, and end users
strong team relationships, with collocated teams, or effective communication mechanisms with distributed teams
facilitated leadership, with the leader as champion and team advocate
"just enough" documentation to maintain a product and continue to use it and evolve it (documentation is highly dependent on product context)
cross-functional team relationships that include all roles throughout the lifecycle, where every member of the team performs their own function but does so together with, and reinforced by, the rest of the team
Lapham also described how SEI researchers are compiling a compendium of cultural issues that organizations need to consider when implementing agile, as described below.
Organizational Structure. Many DoD practitioners are content with traditional hierarchical structures where one person is in charge. Traditional DoD organizational structures are hard to change due to their command-and-control-based integrated-product teams that have formal responsibilities and roles. They meet on a prescribed schedule, usually once a month. Often, those teams work through certain issues as part of their charter.
Agile organizations, in contrast, are characterized by flexible and adaptive structures. Teams are cross-functional and small. An agile organization might have multiple teams working together in different locations, but still maintain constant communication. Teams will be self-organized, but that doesn’t mean they lack discipline, Lapham said. Instead, agile projects require developers with rigor across a core set of processes.
Leadership. In traditional DoD software development approaches, the leader is the keeper of the vision and the primary source of authority to act. In an agile DoD organization, in contrast, the goal is facilitative leadership, the leader is an advocate and champion for the team. This approach is a different style of leadership that requires a paradigm shift in management styles in DoD organizations.
Reward systems. A traditional DoD organization focuses on the individual and rewarding individuals for high performance. In an agile DoD environment, the team is the focus of the rewards system. Lapham commented that team members typically behave based on the activities for which they are incentivized. If developers are rewarded for being the hero, therefore, that may not create an environment conducive to team building.
Staffing model. A traditional DoD organization uses a lifecycle model with separate teams, particularly for development and testing. Different roles are active at defined points in the lifecycle and are not substantively involved, except at those defined times. An agile DoD environment, in contrast, employs cross-functional teams, including all roles across the lifecycle of the project. The teams contain an agile mentor or coach who explicitly attends to the team’s process and ensures that they work together cooperatively.
Communication and decision making. In organizations that employ a traditional approach to software development, top-down communication structures dominate. Likewise, external regulations drive the focus of the work while indirect communications, such as documented activities and processes, dominate over face-to-face dialogue. Program management office oversight tools focus on demonstrating compliance. In an Agile DoD environment, in contrast, teams usually hold 15-minute daily standup meetings in which three main topics are discussed:
What am I going to do today?
What did I do yesterday?
What problems did I have? (the goal is not to solve problems in this short meeting, but the agile coach determines whose responsibility it is to solve those problems at the end of the meeting)
In an agile environment, teams hold frequent retrospectives to improve practices, while information radiators are used to communicate critical project information and avoid surprises. Information radiators are entities (sometimes automated tools, sometimes just stickies on a board) that provide status and ensure an open and transparent flow of information. Documents serve to feed conversation among team members. Agile organizations produce just enough documentation to meet DoD acquisition regulations; how much is required depends heavily on product context.
What's Ahead
The first posting in this series summarized discussions by Anita Carleton, director of the SEI’s Software Engineering Process Management program, and Teri Takai, chief information officer for the DoD. Carleton provided an overview of the forum and discussed areas where SEI work is focused on applying agile methods at-scale. Takai then discussed how agile methods have been introduced into the DoD software acquisition and development environment.
Our next posts in this series will summarize presentations by three SEI researchers, including myself, who examined aspects of applying agile methods at-scale in mission-critical development environments at the Agile Research Forum:
Ipek Ozkaya discussed the use of strategic, intentional decisions to incur architectural technical debt. The technical debt metaphor describes the tradeoff between taking shortcuts in software development to speed up product delivery and slower—but less risky—software development.
James Over noted that lack of teamwork can critically impede agility. He advocated, among other principles, the building of self-managed teams, planning and measuring project process, designing before building, and making quality the top priority for achieving agility at-scale.
Finally, I wrapped up the forum with a discussion on the importance of applying agile methods to crucial common operating platform environments (COPEs) at the DoD. I explained how agile methods can encourage more effective collaboration between users, developers, testers, and certifiers to help the DoD successfully build integrated, interoperable software systems.
We look forward to hearing your thoughts on applying agile at-scale in the comments section below.
Additional Resources
The slides and recordings from the SEI Agile Research Forum can be accessed at www.sei.cmu.edu/go/agile-research-forum/.
To read Lapham’s SEI blog posting on Using Agile Effectively in DoD Environments, please visit http://blog.sei.cmu.edu/post.cfm/using-agile-effectively-in-dod-environments.
To read the SEI technical note, Considerations for Using Agile in DoD Acquisition, please visit www.sei.cmu.edu/library/abstracts/reports/10tn002.cfm.
To read the SEI technical report, Agile Methods: Selected DoD Management and Acquisition Concerns, please visit www.sei.cmu.edu/library/abstracts/reports/11tn002.cfm.
To read the SEI technical report, A Closer Look at 804: A Summary of Considerations for DoD Program Managers, please visit www.sei.cmu.edu/library/abstracts/reports/11sr015.cfm.
To read an article in Crosstalk by Lapham, DoD Agile Adoption: Necessary Considerations, Concerns, and Changes, please visit www.crosstalkonline.org/storage/issue-archives/2012/201201/201201-Lapham.pdf.
SEI Blog | Jul 27, 2015 02:39pm
By Douglas C. Schmidt, Principal Researcher
While agile methods have become popular in commercial software development organizations, the engineering disciplines needed to apply agility to mission-critical, software-reliant systems are not as well defined or practiced. To help bridge this gap, the SEI recently hosted the Agile Research Forum. The event brought together researchers and practitioners from around the world to discuss when and how to best apply agile methods in mission-critical environments found in government and many industries. This blog posting, the third installment in a multi-part series highlighting research presented during the forum, summarizes a presentation made during the forum by Ipek Ozkaya, a senior researcher in the SEI’s Research, Technology & System Solutions program, who discussed the use of agile architecture practices to manage strategic, intentional technical debt.
Ipek’s Talk on Strategic Management of Technical Debt
In her opening comments to the audience, Ozkaya noted that two decades ago Ward Cunningham coined the "technical debt" metaphor, which refers to the degraded quality resulting from overly hasty delivery of software capabilities to users. Cunningham stated that shipping code quickly is like going into debt. A little debt speeds up development, and can be beneficial as long as the debt is paid back promptly with a rewrite that reduces complexity and streamlines future enhancements. A delicate balance is needed between the desire to release new software capabilities rapidly to satisfy users and the desire to practice sound software engineering that reduces subsequent rework.
Increasingly, the software engineering community and those adopting agile techniques are interested in understanding how to quantify technical debt and manage debt pay-back strategies. Ozkaya observed that organizations are often driven to agile techniques after observing increasing technical debt in their software-reliant systems. Ironically, adopting agile practices at scale without considering their long-term implications can also easily lead to technical debt. Ozkaya’s talk emphasized the need to explicitly acknowledge the tradeoffs between taking shortcuts in software development to accelerate product delivery versus applying slower—but less risky—software development methods.
Ozkaya questioned whether it is possible to avoid technical debt altogether, especially given the increasing scale and complexity of software-reliant systems, coupled with trends in the DoD and other government agencies to sustain systems that are expected to operate for decades. Another factor impacting technical debt is workforce diversity and turnover, which often yields distributed teams that must be managed remotely. Given these factors, Ozkaya said, it is inevitable that technical debt will accumulate since environments, systems, and technologies will change, so technical debt has become an ongoing software engineering practice that must be understood and managed effectively.
Recognizing the Financial Implications of Technical Debt
Technical debt has financial implications, just like monetary debt. Developers can choose to pay interest on their technical debt in the form of additional time and effort required to understand and modify poorly structured code. Conversely, developers can pay down the debt by refactoring poorly designed code to reduce future effort. Ozkaya suggested that understanding the financial model implied by the "debt" metaphor can help establish the structural aspect of debt. These financial implications suggest the following questions that agile development teams must consider:
What is the "interest rate" that an organization signs up for when incurring technical debt?
Can this interest rate be controlled?
What is the period of the loan?
What are we borrowing? Time? Or other opportunities that we need to bring to bear when managing the timeline of the loan?
How do we create a realistic repayment strategy?
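The financial questions above can be made concrete with a toy model. The sketch below (in Python; the function name and all figures are hypothetical illustrations, not from Ozkaya's talk) compares the cumulative cost of carrying debt, paying "interest" on every iteration, against paying down the principal with a one-time refactoring:

```python
# Illustrative sketch of the "debt" metaphor: compare the cumulative cost of
# paying interest (extra effort on every change to poorly structured code)
# against paying down the principal (a one-time refactoring).
# All figures are hypothetical.

def cumulative_cost(iterations, interest_per_iteration, principal=0.0, payoff_at=None):
    """Total extra effort (in person-days) over a number of iterations.

    If payoff_at is given, the principal is paid in that iteration and
    interest stops accruing afterward.
    """
    total = 0.0
    for i in range(iterations):
        if payoff_at is not None and i == payoff_at:
            total += principal  # one-time refactoring cost pays off the loan
        if payoff_at is None or i < payoff_at:
            total += interest_per_iteration  # ongoing cost of working around the debt
    return total

# Carrying the debt for 12 iterations at 3 person-days of interest each:
carry = cumulative_cost(12, interest_per_iteration=3.0)

# Paying a 10 person-day refactoring cost in iteration 4 instead:
payoff = cumulative_cost(12, interest_per_iteration=3.0, principal=10.0, payoff_at=4)
```

Under these made-up numbers, carrying the debt costs 36 person-days over 12 iterations, while paying off the principal in iteration 4 costs 22, illustrating why the "period of the loan" matters to the repayment strategy.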
Identifying What Constitutes Technical Debt
Much of the existing literature on technical debt focuses on code-level issues, such as reducing the time needed to modify software functions, add new features, or fix bugs. Ozkaya said it’s also important for organizations to consider how to best describe technical debt from architecture- and system-level perspectives.
The SEI focuses on managing debt as an agile software architecture strategy. Specifically, SEI researchers are investigating the cost implications of architectural changes. Often, when a particular symptom in a system is described as technical debt, it’s not just that the code is bad; problems have also accumulated from architectural changes made throughout the system’s development.
To establish a common understanding of the term "technical debt," Ozkaya referenced the taxonomy of technical debt created by Steve McConnell. To date, much work has focused on McConnell’s "Type 1" debt, which is unintentional and non-strategic. This type of technical debt often results from poor design decisions and poor coding.
Ozkaya specifically drew attention to the second type of debt described by McConnell: intentional and strategic, optimized for the present and the future. This "Type 2" debt can occur in an agile software development lifecycle when trying to accelerate development from a perspective that requires optimizing for short-term goals, such as shipping a product with known shortcomings to gain or protect market share. What is crucial when incurring Type 2 debt is to have a process for revisiting and reworking these short-term shortcuts to ensure system longevity.
Ozkaya also highlighted Jim Highsmith’s prescription for managing technical debt. Highsmith focuses on understanding and monitoring the accumulating cost of change as a result of technical debt. As years or iterations go by and new functions are added or new technologies upgraded, the cost of change can start to increase dramatically.
Lastly, Ozkaya referenced Philippe Kruchten’s perspective on technical debt, which emphasizes a value perspective on system development. Value can include features that have immediate benefit to stakeholders. Value can also be negative, such as defects that must be resolved. Most of the time, however, the value that goes unrecognized lies in invisible aspects of the software: often architectural features that enhance the system when done well, but incur technical debt when done poorly.
Tracking and Analyzing Debt
Ozkaya presented three strategies for managing technical debt:
Do nothing. When using this approach, it’s important to understand the implications (both technical and economic) of "doing nothing."
Replace the whole system. In some cases, this approach might have high cost and risk associated with it; in others, it might be precisely what is needed.
Incremental refactoring (commitment to invest). An explicit focus on architectural agility becomes an instrument in this approach.
In large-scale software-reliant systems, the number of years spent before a system is launched is often detrimental to success since gaps in requirements and performance are not detected until very late in the lifecycle, when they are expensive to remedy. In such instances, using technical debt as a strategy and dividing the system delivery into chunks might be advantageous.
Ozkaya told the audience that eliciting and quantifying the impact of technical debt with pay-back strategies is not yet a repeatable engineering practice. Factors to consider in quantifying debt include tracking defects, changing velocity (what actually got done during an agile iteration versus what was planned), and the cost of rework. These indicators could be mapped into the cost of development, which yields a greater understanding of the value of paying back technical debt versus not paying it back.
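As a sketch of the kind of bookkeeping Ozkaya described, one could record planned versus actual velocity, rework effort, and defects per iteration, and map them into a rough carrying cost. This Python fragment is illustrative only; the rates and weights are assumptions, not a repeatable engineering practice:

```python
# Track the debt indicators named in the text (defects, velocity change,
# rework) per iteration and map them into a rough cost of carrying the debt.
# The hourly rate and hours-per-defect figures are hypothetical.

from dataclasses import dataclass

@dataclass
class Iteration:
    planned_points: int     # what was planned for the iteration
    completed_points: int   # what actually got done (velocity)
    rework_hours: float     # effort spent reworking existing code
    defects_found: int

def debt_carrying_cost(iterations, hourly_rate=100.0, hours_per_defect=4.0):
    """Return (estimated cost of carrying the debt, total velocity shortfall)."""
    cost = 0.0
    shortfall = 0
    for it in iterations:
        shortfall += max(0, it.planned_points - it.completed_points)
        cost += it.rework_hours * hourly_rate
        cost += it.defects_found * hours_per_defect * hourly_rate
    return cost, shortfall

history = [
    Iteration(planned_points=20, completed_points=18, rework_hours=6, defects_found=2),
    Iteration(planned_points=20, completed_points=15, rework_hours=10, defects_found=5),
]
cost, shortfall = debt_carrying_cost(history)
```

Mapping such indicators into development cost, as Ozkaya suggested, gives a basis for comparing the value of paying the debt back versus not paying it back.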
Ozkaya’s work focuses on quantifying technical debt, which can include code, though Ozkaya is most interested in quantifying debt early in the lifecycle, when code analysis alone may not provide enough direction. Specifically, Ozkaya’s research focuses on understanding the right set of architectural models that can be used seamlessly within agile software development methods to provide feedback to development teams and help them understand the impact of rework. Ozkaya stressed that this rework might not be planned for, but could resurface as a change of requirements or technology.
Technical Debt Tools and Analysis
Among those organizations interested in managing technical debt, there is an increasing focus on tools for conducting structural analysis. Trends show increasing sophistication, support for structural analysis in addition to code analysis, and the first steps toward analyzing the financial impact of technical debt by relating structural analysis to cost and effort for rework. Several architecture-related capabilities also exist, including
architecture visualization techniques, such as dependency structure matrix, conceptual architecture, architectural layers, and dependence growth
architecture quality analysis metrics, such as component dependencies, cyclicity, architectural rules compliance, and architectural debt
architecture compliance checking, such as defining design rules and ensuring that they are not broken, for example, disallowing communication between certain components
architecture sandboxing, such as providing features to enable easier discovery of the current architecture of the software, which may include navigating the file structure and moving components around easily
Deciding to Pay Down Debt
Ozkaya said the main motivation of structured technical debt analysis—and the emerging analysis tools—is to help organizations develop strategies for systematically paying down their debt. These strategies involve eliciting business indicators in a system that could be defined for a particular application domain and determining how those indicators are managed. Example indicators include
an increasing number of defects. While this indicator may seem obvious, in some systems with many stakeholders it can escalate into an inability to deliver the system.
a slowing rate of velocity. At the first sign of slowing velocity (the rate at which planned requirements for the iteration are being fulfilled), teams can analyze the implications (for example, during a sprint retrospective) and create an alternative strategy.
changing business and technology context. Often, organizations don’t realize that a change in business and technology could result in technical debt in a system that has been perfectly fine heretofore
a future business opportunity. The need to embrace new business opportunities could motivate the need to rework the system
time to market. Time to market is another key indicator to consider when deciding to pay down technical debt or not
For large-scale software-reliant systems, adding technical debt to the backlog and continuously monitoring that debt should be common practice. Development teams can determine strategies for addressing and monitoring technical debt that are appropriate for the organization. For example, strategies could involve amortizing the debt by 10 percent each iteration or conducting a dedicated iteration that focuses on paying back the debt.
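The amortization strategy can be sketched as a simple capacity split at iteration planning time. Everything below (item names, efforts, and the greedy selection) is a hypothetical illustration, not a prescribed method:

```python
# Sketch of the "amortize by 10 percent" strategy: reserve a fixed fraction
# of each iteration's capacity for debt items on the backlog, then fill the
# rest with feature work. Numbers and item names are made up.

def plan_iteration(feature_backlog, debt_backlog, capacity, debt_fraction=0.10):
    """Split capacity between features and debt pay-down, debt first."""
    debt_budget = capacity * debt_fraction
    selected_debt, spent = [], 0.0
    for item, effort in debt_backlog:
        if spent + effort <= debt_budget:
            selected_debt.append(item)
            spent += effort
    remaining = capacity - spent
    selected_features, used = [], 0.0
    for item, effort in feature_backlog:
        if used + effort <= remaining:
            selected_features.append(item)
            used += effort
    return selected_features, selected_debt

features = [("story-a", 8), ("story-b", 13), ("story-c", 5)]
debt = [("refactor parser", 3), ("untangle module deps", 6)]
plan = plan_iteration(features, debt, capacity=30)
```

Keeping debt items on the same backlog as features, as the text recommends, is what makes this kind of explicit trade-off visible at planning time.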
Ozkaya also pointed to a recent Crosstalk article she co-authored that highlights architectural tactics involved in strategically managing technical debt. The principles of both agile software development and software architecture improve the visibility of project status and offer better tactics for risk management. These principles help software teams develop higher-quality features on time and on budget. The article described three tactics: aligning feature-based development and system decomposition, creating an architectural runway, and using matrix teams and architecture. Harmonious use of these tactics is critical, especially in large-scale DoD systems that must be in service for several decades, are created by multiple contractor teams, and have changing scope due to evolving technology and emerging needs.
Future Areas of Research
In describing future areas of SEI research on the strategic management of architectural technical debt, Ozkaya pointed out that this topic succinctly communicates key issues observed in large-scale, long-term projects, including
solving optimization problems. In some cases, focusing on optimizing for the short term puts the long term into economic and technical jeopardy when debt is unmanaged
creating appropriate design shortcuts. Design shortcuts can give the perception of success, but teams cannot focus on the present alone; they must consider future iterations and plan accordingly
modeling software development decisions. Decisions concerning architecture should be continuously analyzed and actively managed since they incur cost, value, and debt. SEI research is focusing on opportunities for developing and quantifying effective payback strategies
In conclusion, Ozkaya recommended several immediate steps that organizations should take to manage technical debt:
Make technical debt visible, even if it’s just acknowledging that there is a problem
Differentiate strategic, structural technical debt from unintended debt incurred as a result of factors like low code quality, bad engineering, or practices that have not been followed
Bridge the gap between the business and technical sides
Associate technical debt with risk and track it explicitly
What's Ahead
The first posting in this series on the SEI Agile Research Forum summarized discussions by Anita Carleton, director of the SEI’s Software Engineering Process Management program, and Teri Takai, chief information officer for the DoD. Carleton provided an overview of the forum and discussed areas where SEI work is focused on applying agile methods at-scale. Takai then discussed how agile methods have been introduced into the DoD software acquisition and development environment. The second posting summarized discussions by Mary Ann Lapham, a senior researcher in the SEI’s Acquisition Support Program, who highlighted the importance of collaboration with end users, as well as among cross-functional teams, to facilitate the adoption of agile approaches into DoD acquisition programs.
Our next posts in this series will summarize discussions by two SEI researchers, including myself, who examined the following aspects of applying agile methods at-scale in mission-critical development environments at the Agile Research Forum:
James Over noted that lack of teamwork can critically impede agility. He advocated, among other principles, the building of self-managed teams, planning and measuring project process, designing before building, and making quality the top priority for achieving agility at-scale.
Finally, I wrapped up the forum with a discussion on the importance of applying agile methods to crucial common operating platform environments (COPEs) at the DoD. I explained how agile methods can encourage more effective collaboration between users, developers, testers, and certifiers to help the DoD successfully build integrated, interoperable software systems.
We look forward to hearing your thoughts on applying agile at-scale in the comments section below.
Additional Resources
To view the Crosstalk article, Architectural Tactics to Support Rapid and Agile Stability, please visit www.crosstalkonline.org/storage/issue-archives/2012/201205/201205-Bachmann.pdf.
To visit the International Workshop on Managing Technical Debt website, please visit www.sei.cmu.edu/community/td2012/.
To view the Hard Choices Board Game website, please visit www.sei.cmu.edu/architecture/tools/hardchoices/.
To view blog posts about technical debt by Ozkaya and other SEI researchers, please visit http://blog.sei.cmu.edu/archives.cfm/category/technical-debt.
Jul 27, 2015
By Douglas C. Schmidt, Principal Researcher
While agile methods have become popular in commercial software development organizations, the engineering disciplines needed to apply agility to mission-critical, software-reliant systems are not as well defined or practiced. To help bridge this gap, the SEI recently hosted the Agile Research Forum. The event brought together researchers and practitioners from around the world to discuss when and how to best apply agile methods in mission-critical environments found in government and many industries. This blog posting, the fourth installment in a multi-part series highlighting research presented during the forum, summarizes a talk by James Over, manager of the Team Software Process (TSP) initiative, who advocated the building of self-managed teams, planning and measuring project process, designing before building, and making quality the top priority, among other principles associated with applying agile methods at-scale.
Over’s Talk on Balancing Agility and Discipline
In his opening comments to the audience, Over shared his views on agility and discipline and stressed the importance of finding a balance between the two. Over said that his presentation and research on combining agility and discipline is based on his work with software teams and software projects, although it is applicable to other fields.
One of the reasons that agile methods are so popular today is their ability to respond to change. As evidence of this popularity, Over pointed out that about 40 percent of software developers use one or more agile methods, compared with 13 percent who use only traditional methods, according to a 2010 Dr. Dobb's survey on trends among global developers.
One reason that agile methods have become so popular is that the pace of change is accelerating. Organizations are seeking solutions that will allow them to become more responsive to change. Agile methods provide some key parts of that capability, Over said.

Balance is key for organizations seeking to implement agile methods. While organizations work to improve agility, they must do so in a disciplined way. Discipline is particularly important for organizations like DoD acquisition programs and other federal agencies developing large, mission-critical software-reliant systems at scale.
It’s important for organizations to understand what is meant by agility. While several definitions exist, Over cited one he likes: agility is responding rapidly and efficiently to change, with consistency. An agile business, he told the audience, should be able to respond quickly to change, make decisions quickly, and respond quickly to customers’ needs every single time, not just occasionally.
What Does Agile Look Like?
Many software organizations claim to be agile because they follow agile principles, but they often lack an understanding of what it means to be agile. To achieve a greater understanding of agile methods, Over recommended that organizations measure their agility and evaluate their success along the following factors:
Response time. Organizations should assess how quickly they respond to a customer’s needs. What sort of experiences are users having with the software they are producing in terms of response time?
Efficiency. Organizations should consider their ability to deliver software. Can they produce projects with the desired balance between cost and quality? Are processes performing efficiently? Can organizations respond to change quickly and efficiently or do costs balloon out of control when changes are made to the content or requirements of the system?
Consistency. Does every customer have the same experience? When customers interact with an organization, are they always seeing the same response time, the same types of efficiency, and the same types of behavior on each application?
Impediments to Agility in Software
In 2010, the authors of the Agile Manifesto reunited to hold a retrospective on Agile. Examining the state of the practice in the last decade, the authors identified 10 impediments to achieving agility. From that list, Over identified the following five impediments that he deemed critical for organizations to overcome when they seek to make agile methods work in practice:
Lack of a ready backlog, which is the list of features that developers prioritize to build the software, is a serious concern. The manifesto authors found that 80 percent of teams had a ready backlog, but within that backlog, only 10 percent of items were ready for implementation. As a consequence, delays would occur in projects because the backlog was not prepared.
Lack of being "done" at the end of a sprint means that as the project is nearing the end of the sprint and declaring the sprint completed, some work items are being deferred or skipped, which often causes delays. The most common cause is a poorly implemented practice or a deferred practice. For example, skipping unit testing can cause delays later in the project, as well as increase cost by at least a factor of 2.
Lack of teamwork surfaces when the team fails to come together and operate as a team and instead focuses on individual needs. In this type of setting, developers focus on their own user features or stories and don’t pay attention to what the team is doing. They attend their daily standups but fail to report problems that might affect the rest of the project. When the team gets to the end of a sprint and discovers that there is still a lot of work to do, sprint failure and project delay result.
Lack of good design often stems from organizational structures that affect the kinds of designs software teams produce. For example, an inflexible hierarchical structure often results in inflexible hierarchical designs, which in turn can yield code that is brittle and hard to use, excessively expensive, and prone to high failure rates.
Tolerating defects is an unfortunate reality for many teams as they near the end of a sprint: out of time, they defer defects, along with a set of unit tests or other issues, to later sprints. When issues are deferred (which is a form of technical debt, as discussed by Ipek Ozkaya in her Agile Research Forum presentation), the result is increased cost and higher failure rates later in the lifecycle because the defects aren’t addressed as they are discovered.
Avoiding the Impediments by Balancing Agility and Discipline
Over next identified work that he and other SEI researchers have conducted to remedy the impediments to agility described above. Over’s work with software teams has focused on identifying a set of principles that teams should adopt to improve agility and response time. Over highlighted the following five principles that help address impediments identified by the Agile community in their retrospective:
Build high-performance teams. Software is knowledge work. Build self-managed teams that make their own plans, negotiate their commitments, and know the project status precisely.
Plan every project. Never make a commitment without a plan. Use historical data. If the plan doesn’t fit the work, fix the plan. Change is not free.
Use a measured process to track progress. To measure the work, define its measures. To measure the process, define its steps. If the process doesn’t fit the work, fix the process. Tracking progress via a measured process need not require a complicated solution given the appropriate set of tools and a codified method for applying the tools effectively.
Design before you implement. Design is critical since it informs the software implementation. Good designs produce less code and simplify the downstream evolution of the code. Design what you know and explore the rest. When you know enough, finish the design and then implement it. To be clear, this principle isn’t espousing "big upfront design" (BUFD). It is BUFD without the BUF, or just D. Produce enough design to build the implementation. If the design is still too challenging, explore the problem first via prototyping to reduce design risk. Applying this principle effectively depends on various factors, including the size and complexity of the application, as well as familiarity with the problem, domain, methods, and tools.
Make quality software the top priority. Defects are inevitable. Poor quality wastes resources. The sooner a developer fixes a problem, the better. As Jeff Sutherland stated in his retrospective, tolerating defects, ignoring pair programming or code review practices, and inadequate unit testing will substantially reduce team velocity. Continuous integration tools and automated testing are also important, but testing alone is insufficient: poor-quality software always costs more to produce because finding and fixing defects is expensive, and the longer you wait to fix them, the greater the cost of repair.
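Over's "measured process" principle (define the work's measures and the process's steps) can be illustrated with a minimal earned-value-style tracker. The step names and hours below are invented for the example and are not from TSP itself:

```python
# Toy illustration of a measured process: define the process's steps with
# planned effort, log actual effort, and derive progress from the plan
# rather than from gut feel. Step names and numbers are hypothetical.

steps = {
    "design": {"planned": 10.0, "actual": 12.0, "done": True},
    "code":   {"planned": 20.0, "actual": 9.0,  "done": False},
    "test":   {"planned": 15.0, "actual": 0.0,  "done": False},
}

def earned_value(steps):
    """Percent of total planned effort represented by completed steps."""
    total_planned = sum(s["planned"] for s in steps.values())
    earned = sum(s["planned"] for s in steps.values() if s["done"])
    return 100.0 * earned / total_planned

progress = earned_value(steps)  # only "design" (10 of 45 planned hours) is done
```

Because progress is computed against the plan, a team knows its status precisely at any point, which is what enables the principle "if the plan doesn't fit the work, fix the plan."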
In summary, Over told the audience that in working on 20 different projects in 13 organizations to implement these disciplined agile principles, SEI researchers found that organizations delivered more functionality than originally planned or finished ahead of schedule. Among the other benefits, Over stated, was that projects realized test costs of less than 7 percent of the total cost with an average cost of quality of only 17 percent. Also, the projects delivered software with an average of only six defects in 100,000 lines of new and modified code.
Finally, discipline is the key to agility, Over explained, adding that agility can only be achieved when everyone in an organization acts professionally and uses a disciplined, measured approach to their work.
What's Ahead
The first posting in this series summarized discussions by Anita Carleton, director of the SEI’s Software Engineering Process Management Program, and Teri Takai, chief information officer for the DoD. Carleton provided an overview of the forum and discussed areas where SEI work is focused on applying Agile methods at-scale. Takai then discussed how Agile methods have been introduced into the DoD software acquisition and development environment. The second posting summarized discussions by Mary Ann Lapham, a senior researcher in the SEI’s Acquisition Support Program, who highlighted the importance of collaboration with end users, as well as among cross-functional teams, to facilitate the adoption of Agile approaches into DoD acquisition programs. The third posting highlighted the forum presentation by Ipek Ozkaya, a senior researcher in the SEI’s Research, Technology, and System Solutions Program, who discussed the use of strategic, intentional decisions to incur architectural technical debt. The technical debt metaphor describes the tradeoff between taking shortcuts in software development to speed up product delivery and slower—but less risky—software development.
In the next—and final—posting in this series I will summarize my presentation at the forum on the importance of applying agile methods to common operating platform environments (COPEs) that have become increasingly important for the DoD. I will explain how these methods can encourage more effective collaboration between users, developers, testers, and certifiers to help the DoD successfully build more integrated, interoperable, and affordable software systems.
We look forward to hearing your thoughts on applying agile at-scale in the comments section below.
Additional Resources
To learn more about the Team Software Process (TSP), please visit www.sei.cmu.edu/tsp.
The slides and recordings from the SEI Agile Research Forum can be accessed at www.sei.cmu.edu/go/agile-research-forum.
By Douglas C. Schmidt, Principal Researcher
While agile methods have become popular in commercial software development organizations, the engineering disciplines needed to apply agility to mission-critical, software-reliant systems are not as well defined or practiced. To help bridge this gap, the SEI recently hosted the Agile Research Forum. The event brought together researchers and practitioners from around the world to discuss when and how to best apply agile methods in mission-critical environments found in government and many industries. This blog posting, the fifth and final installment in a multi-part series highlighting research presented during the forum, summarizes a presentation I gave on the importance of applying agile methods to common operating platform environments (COPEs) that have become increasingly important for the Department of Defense (DoD).
The first half of my presentation motivated the need for COPEs that help collapse today’s stove-piped, software-reliant DoD system solutions to decrease costs, spur innovation, and increase acquisition and operational performance. Since this material has appeared in my first SEI blog posting on COPEs, I’ll skip it in this posting and focus on the second half of my presentation, which discussed how applying agile methods can encourage more effective collaboration between users, developers, testers, and certifiers of COPEs to help the DoD build integrated, interoperable, and affordable software-reliant systems more successfully.
What’s Taking So Long to Achieve the Promise of COPEs?
Decades of public and private research and development investments—coupled with globalization and ubiquitous connectivity—have enabled information technology to become a commodity, where common off-the-shelf (COTS) hardware and software artifacts get faster and cheaper at a remarkably predictable pace. During the past two decades, we've benefited from the commoditization of hardware and networking elements. More recently, the maturation and widespread adoption of object-oriented programming languages, operating environments, and middleware is helping to commoditize many software components and architectural layers. Despite these technology advances, however, developing affordable and dependable COPE-based solutions remains elusive for the DoD. There are a number of reasons for this, including
When DoD acquisition programs have tried to apply COPEs, they’ve tended to spend a great deal of time building common software infrastructure, which consists largely of various layers of middleware. This leads to the serialized phasing problem: building the infrastructure may take years (often hundreds or thousands of person-years), during which application developers sit idle, not knowing precisely what the characteristics of the infrastructure will be. As a result, by the time the infrastructure has matured enough to support application development, the teams often discover it provides inappropriate quality-of-service (QoS) or functionality. If this discovery occurs late in the lifecycle (e.g., during system integration, which is all too common in large DoD acquisition programs), it’s extremely costly to remedy.
A related problem faced by acquisition programs is the length of time it takes to get projects under contract via traditional contracting models. While this glacial contracting pace is not unique to COPEs, it greatly exacerbates the serialized phasing problem outlined above. In particular, if contract delays become excessive, infrastructure developers won’t have sufficient time to build and test the common software infrastructure thoroughly. If the common software infrastructure is not built and tested properly, application developers will end up wrestling with many defects and bottlenecks they are ill-equipped to handle. Problems stemming from serialized phasing that are caused by contract delays will result in even further delays in application delivery.
Classic DoD waterfall models assume that requirements can be largely defined early in the lifecycle. When some of these requirements change—as they certainly will as COPEs are reapplied in different contexts—it becomes quite expensive for the government to request the change orders needed to make the modifications. Without more streamlined and flexible lifecycle and contracting models, the inevitable changes to COPEs will be prohibitively expensive, thereby obviating the goal of cost savings achieved by sharing common software.
There’s a long-term trend in the DoD toward adopting COTS technologies for hardware and software. Although many COTS products are well-suited for mainstream commercial applications, they’re often not as well-suited for certain types of DoD systems, especially mission-critical weapons systems. The challenge for COPE developers is to ensure that they don’t base their common software infrastructure on COTS products that work well in contexts where an 85 percent solution is sufficient (where dropping an occasional call or manually refreshing a hanging web page is acceptable) but fall short in DoD weapons systems with more stringent QoS requirements. In these mission-critical environments—where the right answer delivered too late becomes the wrong answer—many COTS products are too big, too slow, or too unreliable to serve as the basis of mission-critical COPEs. If DoD common software infrastructure is naively built atop a house of sand, applications will be hard-pressed to provide the end-to-end QoS that’s expected of them in hostile settings.
Another challenge facing developers of COPEs is over-adherence to ossified standards and reference architectures that made sense a decade or so ago but have failed to keep pace with technology advances. Unlike mainstream commercial systems—where technologies are often refreshed every couple of years—a longer-term perspective is needed to develop COPEs for DoD weapons systems. Likewise, it’s essential to integrate new technologies into COPEs to keep pace with advances in hardware, software, platforms, and requirements. We also need software architectures that are flexible and resilient, rather than architectures that simply adhere to standards that become obsolete. The crucial issue here is not new standards, but the capability of devising new architectures and new standards that can themselves be refreshed and reapplied with minimal impact over long periods of time.
How Agility Helps Achieve the Promise of COPEs
At the heart of the problems described above is the lack of a holistic approach that balances key business, managerial, and technical drivers at scale. The last part of my presentation discussed key success drivers for COPE initiatives that proactively and intentionally exploit commonality across multiple DoD acquisition programs and outlined ways in which agility helps make those drivers more effective. In my experience working at the SEI, DARPA, and Vanderbilt University’s Institute for Software-Integrated Systems over the past several decades, successful COPE efforts require a balance of the following drivers:
Business drivers, which focus on achieving effective governance and broad acceptance of the economic aspects of COPEs. Examples include managed industry/government consortia, agile contracting models, and effective data rights and licensing models.
Management drivers, which focus on ensuring effective leadership and guidance of COPE initiatives. Examples include mastery of agile lifecycle methods, strong science and technology connections to reduce technical risk, and of course a solid understanding of the critical role of software for the DoD.
Technical drivers, which focus on the foundations of COPE development. Examples include systematic reuse expertise, agile architecture expertise, and automated conformance and regression test suites.
As outlined above, agility can be applied to help each of these drivers. For example, agility can be applied to expedite contracting, which is an important business driver. Getting task, delivery, and change orders in place quickly via agile contracting models (such as Indefinite-Delivery, Indefinite-Quantity (IDIQ) contract vehicles) helps to mitigate serialized phasing problems. Likewise, agile contracting methods can help ensure that the people who are building the systems and the people who are managing the acquisition triage and align their priorities so that the contract supports the needs and considerations of key stakeholders.
Agility can also be applied to manage COPE lifecycles more effectively, which is a key management driver. Common software infrastructure development efforts work best when there are close, interactive feedback loops between the people building the applications and the systems and software engineers building the infrastructure. Without these feedback loops, it’s easy to develop many reusable artifacts that aren’t useful and won’t be applied systematically. An agile approach is thus essential to
enable close cooperation between users, developers, testers, and certifiers throughout the lifecycle to ensure the necessary COPE capabilities are delivered rapidly and robustly
avoid integration "surprises" where things tend to break or underperform at unexpected times late in the lifecycle, when it’s more expensive to fix problems
We’ve observed in recent years that rolling out incremental deliveries of a COPE capability every 4 to 8 months helps application developers establish a battle rhythm of knowing when to upgrade and when to leverage what’s in the common software infrastructure. This tight spiral avoids long and fragile serialized phasing lifecycles, a problem that agile methods manage well.
Finally, agility can be applied to ensure architectural flexibility, which is a crucial technical driver. Long-lived, software-reliant DoD systems need software architectures that are change tolerant. Inevitably standards will evolve; hardware will get better, faster, and cheaper; and software programming languages, operating systems, and middleware will all evolve over time. It is therefore essential to devise ways of "future proofing" COPE architectures and using technologies, techniques, and methods (such as the technical debt work that Ipek Ozkaya discussed at the Agile Research Forum) when making decisions about when to reengineer and when to refactor. These decisions should be based on empirical data and evidence, rather than relying on forecasts or legacy commitments that become obsolete and ossified over time.
Equally important is the ability to select COTS technologies for use in COPEs that have appropriate QoS capabilities for the ways in which they are applied to particular missions. Some COTS- and standards-based products work well in mission-critical contexts, whereas others don’t. The choice of which ones to use should be driven as much as possible through trade studies, empirical analysis, and various types of quantitative analysis, as opposed to the latest techno-fad.
The following summary shows how agility helps resolve the COPE challenges discussed above:
Challenge: Serialized phasing of COPE infrastructure and application development postpones identifying design flaws that degrade system QoS until late in the lifecycle, i.e., during system integration.
How agility helps: Enables close cooperation of users, developers, testers, and certifiers throughout the lifecycle to rapidly deliver COPE capabilities and avoid integration "surprises" without needing extensive upfront planning and serialized phasing. Emphasizes incremental rollout of COPEs by delivering useful capability every 4 to 8 months to reduce risk via early validation by application developers and users.
Challenge: Glacial contracting processes don’t support timely delivery of COPE capabilities to meet mission needs.
How agility helps: Engages users and testers in developing COPE contract scope, evaluation criteria, incentives, and terms/conditions to ensure contracting supports all needs and considerations.
Challenge: Contracting models that assume COPE requirements can be defined fully up front are expensive when inevitable changes occur.
How agility helps: Expedites execution of COPE work packages via multiple-award Indefinite-Delivery, Indefinite-Quantity (IDIQ) contract vehicles, issuing Task/Delivery Orders for each release.
Challenge: QoS suffers when COPE initiatives attempt to use COTS products that are not suited for mission-critical DoD combat systems.
How agility helps: Leverages common development, test, and production platforms, and QoS-enabled standards-based COTS, to deliver COPE capabilities faster, cheaper, and more interoperably, without redundant ad hoc infrastructure.
Challenge: Rigid adherence to ossified standards and reference architectures impedes COPE technology refresh and limits application capabilities.
How agility helps: Establishes a change-tolerant architecture enabled by discovery learning that promotes decisions based on empirical data and evidence, rather than forecasts or legacy commitments.
The Road Ahead for COPE Agility
One reason for the spotty track record of success with COPE-based systems is that the DoD hasn’t taken a holistic view of the way these types of systems are built. Existing brittle and proprietary stovepiped approaches to acquisition systems do not address the cost efficiency and cyber security challenges the DoD is wrestling with, nor do they help to deploy software and new technologies to the field rapidly. Moreover, managing the production of these systems via waterfall processes is simply not an effective way forward, especially in the shadow of sequestration.
Developing COPEs is achievable and valuable, but it’s not easy. Agility in business, management, and technical dimensions is essential, but it’s also no panacea. Additional research and engineering investment is needed to devise the appropriate methods, tools, and techniques that will enable agility at-scale, which is a key theme that we’ve emphasized throughout the SEI Agile Research Forum.
Finally, we need your help. Achieving success with COPEs for the DoD isn’t something that can be done by any one group, institute, or government program. The SEI needs to help bring together researchers, developers, and managers from academia, industry, and government to conduct the appropriate work to help reduce risk and ensure the success of current and planned COPE initiatives. We also need to work closely with academia and industry to train the workforce and identify key requirements. Likewise, we need to work with government to ensure effective oversight and acquisition dynamics to reduce the cycle time needed to acquire new systems, insert new technology into legacy systems, and sustain software-reliant systems at a lower cost over their lifecycles and across the DoD enterprise. The stakes are high, and now is the time to make a difference!
Additional Resources
To learn more about the SEI’s work on common operating platform environments, please visit http://blog.sei.cmu.edu/archives.cfm/category/common-operating-platform-environments-copes
To learn more about the SEI’s work on agile methods at-scale, please visit the SEI Agile Research Forum webinar site at http://www.sei.cmu.edu/go/agile-research-forum/.
SEI Blog | Jul 27, 2015
By Robert Stoddard, Researcher
Software Engineering Measurement and Analysis Program
As part of our research related to early acquisition lifecycle cost estimation for the Department of Defense (DoD), my colleagues in the SEI’s Software Engineering Measurement & Analysis initiative and I began envisioning a potential solution that would rely heavily on expert judgment of future possible program execution scenarios. Prior to our work on cost estimation, many parametric cost models required domain expert input, but, in our opinion, they did not address alternative scenarios of execution that might occur from Milestone A onward. Our approach, known as Quantifying Uncertainty in Early Lifecycle Cost Estimation (QUELCE), asks domain experts to provide judgment not only on uncertain cost factors for a nominal program execution scenario, but also for the drivers of cost factors across a set of anticipated scenarios. This blog post describes our efforts to improve the accuracy and reliability of expert judgment within this expanded role of early lifecycle cost estimation.
Our work in cost estimation began two years ago, building upon a review of existing cost estimation and expert judgment research. As an example, we identified an industry consultant, Douglas Hubbard, whose book, How to Measure Anything, presents an approach known as "calibrating your judgment" (my colleague, Dave Zubrow, describes Hubbard’s technique in a recent blog post). Hubbard’s focus on calibrating expert judgment using "trivial pursuit" exercises led to our team’s decision to pursue research into the use of domain-specific reference points to further improve the accuracy and reliability of expert judgment within cost estimation.
Our research on early lifecycle cost estimation for the DoD consists of two tasks:
Development of the QUELCE method, in which the probability of changes occurring in program execution is separated from the final assessment of the effects of such changes on the program cost estimate
Critical thinking and designed experiments that would contribute to current research on expert judgment
We hypothesized that a domain-specific approach to calibration training and development of reference points would be necessary to reduce unwanted variation in judgments rendered by experts participating in the QUELCE method. We decided to take a two-pronged approach to improving expert opinion. The first part of the approach involved data mining of DoD program execution experience. The second part involved interviewing DoD experts about cost estimation and DoD program cost experience. One of our goals is to create an online repository of domain reference points that embodies the historical DoD program cost experience.
The repository will include a searchable database of reference points that helps domain experts exercise better judgment during cost estimation. Domain experts will be able to query the reference points by keyword. Search results will show the key reference points in relation to the domain and technology challenge. The domain expert(s) will then review those reference points before formulating judgment for the current cost estimation exercise. At this point in the project, we are mining reference points from DoD and other open-source data examples.
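To make the keyword-query idea concrete, here is a minimal sketch of such a repository lookup. The class, fields, program names, and summaries are all invented for illustration; they are not the actual SEI repository design or real program data.

```python
# Hypothetical sketch of a reference-point repository with keyword search.
# All names, fields, and entries are illustrative placeholders.
from dataclasses import dataclass, field

@dataclass
class ReferencePoint:
    program: str               # historical program name (invented)
    domain: str                # e.g., "communications", "avionics"
    summary: str               # why cost/schedule deviated
    keywords: set = field(default_factory=set)

def search(repo, query_terms):
    """Return reference points whose keywords overlap the query terms."""
    terms = {t.lower() for t in query_terms}
    return [rp for rp in repo if terms & {k.lower() for k in rp.keywords}]

repo = [
    ReferencePoint("Program X", "communications",
                   "Satellite payload integration slipped 14 months",
                   {"satellite", "integration"}),
    ReferencePoint("Program Y", "avionics",
                   "Subcontractor replacement drove 20% cost growth",
                   {"contractor", "avionics"}),
]

hits = search(repo, ["satellite", "communications"])
print([rp.program for rp in hits])  # -> ['Program X']
```

A production repository would of course use full-text search rather than exact keyword overlap; the point is only that an expert can retrieve domain-relevant history before rendering judgment.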
My colleague, James McCurley, has investigated DoD repositories for raw information that outlined why acquisition programs experience cost and schedule overruns. Our team compiled domain reference points from McCurley’s data that identify selected changes associated with cost and schedule overruns. We are categorizing these changes into a set of common change drivers that are rooted in the various sources we have accessed.
One of the first sources we accessed was the U.S. Navy’s Probability of Program Success (PoPS) program. The PoPS criteria came from studies of program performance used by the Navy to implement a step-by-step approval process for a program to continue, independent of—but aligned to—the DoD acquisition process. PoPS identified a number of categories of reasons for cost and schedule overruns in government programs.
PoPS was always seen as just one of many sources of programmatic factors that might provide information useful to QUELCE. The PoPS criteria are biased toward programmatic change issues (such as sponsorship, contractor performance, and program office performance) that are of primary concern to DoD sponsors and Program Executive Offices. As we expected when we started this project, we are finding the need to supplement PoPS with more technical change issues, such as those related to system engineering and integration factors.
Many technical change drivers may be seen in the Capability Based Assessment (CBA) activity performed by programs in preparation for the Milestone A decision. The CBA includes the Functional Area Analysis (FAA), Functional Needs Analysis (FNA), and Functional Solution Analysis (FSA). Another early source is the Analysis of Alternatives (AoA). These and other early documents often include information that identifies technical and programmatic uncertainties not captured in the cost estimation process, but which can be incorporated as program change drivers in the QUELCE method. Consequently, many technical change drivers are rooted in artifacts that proposed programs must draft prior to Milestone A.
Examples of change drivers we’ve identified from various sources include:
Interoperability - a program is affected by changes from a dependent program
Contractor Performance - a subcontractor must be replaced
Obsolescence - a part is made obsolete before a program is operational
Technical Performance - either a technology is not ready for use or a technology fails to achieve key performance goals
Scope - the source of many changes, including new users, additional delivery targets, and extra platforms, all of which fall outside the realm of "code growth"
Funding - funding may be increased or decreased in DoD programs, often with little warning
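As a hypothetical illustration, change drivers like those above and their pairwise influences can be captured in a simple matrix that experts later populate. The driver names come from the list above, but the ratings, the 0-3 scale usage, and the helper function are invented for this sketch, not part of the QUELCE specification.

```python
# Sketch of a change driver influence matrix (ratings are illustrative).
# matrix[a][b] holds an expert's 0-3 rating that driver `a` going off-nominal
# pushes driver `b` off-nominal (0 = no influence, 3 = strong influence).
drivers = ["Interoperability", "Contractor Performance", "Funding"]

matrix = {a: {b: 0 for b in drivers if b != a} for a in drivers}
matrix["Funding"]["Contractor Performance"] = 3  # invented: funding cuts stress contractors
matrix["Interoperability"]["Funding"] = 1        # invented: weak influence

def strongest_influences(matrix, threshold=2):
    """List (cause, effect) pairs rated at or above the threshold."""
    return [(a, b) for a, row in matrix.items()
            for b, rating in row.items() if rating >= threshold]

print(strongest_influences(matrix))  # -> [('Funding', 'Contractor Performance')]
```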
Applying Our Expert Opinion Approach
In the coming year, our goal is to develop a database with information that supports experts implementing the QUELCE method. We will publish our approach to improving expert judgment and increasing and structuring their involvement in cost estimation using procedures similar to Team Software Process (TSP) scripts. Our goal is to ensure that domain experts have more active involvement with the cost estimation activity. The problem today is that domain expert results are often loosely coupled with the cost estimates. In contrast, QUELCE will facilitate domain experts systematically discussing change drivers and then mapping the change drivers explicitly to the cost driver inputs of traditional cost estimation models and cost estimating relationships (CERs).
In our approach, the domain expert will be prompted at different points throughout QUELCE to access the reference point database. These activities will consist of just-in-time virtual training for calibration within a given domain. For example, a domain expert may be participating in QUELCE to develop a cost estimate for a new communication system that involves satellite technology. If the domain expert has not recently completed virtual calibration training for that domain, he or she may receive a refresher course consisting of a two- to four-hour online exercise.
Our approach to improving expert opinion will help domain experts during the following three different points in the QUELCE method that depend significantly on expert judgment:
Identifying pertinent change drivers. After completing the training, domain experts will be asked to participate in a workshop exercise that anticipates which change drivers will most likely be relevant to a particular program. In the workshop, domain experts will query for communication programs or specific technology names related to particular programs. In the example involving the new communication system above, the results should yield information related to historical communication programs or technologies and domain reference points, explaining why certain aspects went over budget or schedule.
Populating the change driver cause-and-effect matrix. The second judgment point involves a change driver cause-and-effect matrix. The domain expert will evaluate each change driver and rate the probability, on a scale of 0 to 3, that the change driver will cause any other change drivers on the list to switch from a nominal to an off-nominal condition, thereby signaling the danger of cost and schedule overruns. This exercise requires judgment about the relationships between change drivers. The domain expert will get information from querying our repository before rendering this type of judgment. For example, reference points might include historical information about a change driver going off nominal and subsequently causing three other change drivers to go off-nominal. The reference points therefore give the domain expert a basis to understand the relationships between change drivers and help make them more accurate.
Establishing probabilities for the Bayesian Belief Network (BBN). The BBN models the change drivers as nodes in a quantitative network, including probabilities that state changes in one node will create a state change in another node. Every change driver has a parent table that presents all the possible scenarios resulting from different combinations of its parent change driver states. For example, in our BBN, we have change driver A and change driver B, and both have an influence on change driver C. If change driver A has nominal and off-nominal states and change driver B has nominal and off-nominal states, there are four different combinations of parent change driver states, i.e., scenarios that may affect change driver C.
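The parent-scenario table in the last judgment point can be sketched in a few lines. This follows the A/B/C example above; the state names and the table layout are a minimal illustration, and the probabilities themselves would be elicited from experts, so none appear here.

```python
# Sketch of the parent-scenario table for one BBN node, per the example above:
# parents A and B each take nominal/off-nominal states, yielding four
# scenarios for child C. Probability values are left to expert elicitation.
from itertools import product

STATES = ("nominal", "off-nominal")
parents = ["A", "B"]

# One table row per combination of parent states; the value is the
# to-be-elicited P(C = off-nominal | parent states).
cpt = {combo: None for combo in product(STATES, repeat=len(parents))}

print(len(cpt))  # -> 4
for combo in cpt:
    print(dict(zip(parents, combo)))
```

With three two-state parents the table would grow to eight rows, which is why QUELCE keeps the number of parent drivers per node small.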
While our work to date has focused on calibrating expert judgment in the DoD cost estimation of program development, our approach could be applied to many situations beyond cost estimation. We envision this approach being used in domains such as portfolio management and strategic planning.
Summary
Our research into the QUELCE method for pre-Milestone A cost estimation represents a significant advance by enabling the modeling of uncertain program execution scenarios that are dramatically different from the traditional cost factor inputs of cost estimation models currently employed later in the DoD acquisition lifecycle. By synergizing the latest advancements in proven methods such as scenario planning workshops, cause-effect matrices, BBNs, and Monte Carlo simulation, we have created a novel and practical method for early DoD acquisition lifecycle cost estimation. If you’re interested in helping us succeed in these efforts, please let us know by leaving a comment below.
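To illustrate the Monte Carlo simulation step mentioned above, here is a toy cost roll-up: sample uncertain cost-factor distributions and report percentiles of the total. The distributions, cost elements, and dollar figures are invented for illustration and do not come from QUELCE or any DoD program.

```python
# Illustrative Monte Carlo cost roll-up: sample uncertain cost elements
# and summarize the resulting distribution. All numbers are invented.
import random

random.seed(1)  # reproducible illustration

def sample_total_cost():
    # triangular(low, high, mode) samples for three cost elements ($M)
    dev  = random.triangular(80, 150, 100)
    test = random.triangular(20, 60, 30)
    mgmt = random.triangular(10, 25, 15)
    return dev + test + mgmt

samples = sorted(sample_total_cost() for _ in range(10_000))
p50 = samples[len(samples) // 2]
p80 = samples[int(len(samples) * 0.8)]
print(f"median ≈ {p50:.0f} $M, 80th percentile ≈ {p80:.0f} $M")
```

Reporting a distribution rather than a point estimate is what lets decision makers see, for example, how much contingency is needed to be 80 percent confident of staying within budget.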
Additional Resources
To read the SEI technical report, Quantifying Uncertainty in Early Lifecycle Cost Estimation (QUELCE) please visit www.sei.cmu.edu/library/abstracts/reports/11tr026.cfm.
For more information about Milestone A, please see the Integrated Defense Life Cycle Chart for a picture and references in the "Article Library."
Second in a Two-Part Series
By Lisa Brownsword
Acquisition Support Program
Major acquisition programs increasingly rely on software to provide substantial portions of system capabilities. All too often, however, software is not considered when the early, most constraining program decisions are made. SEI researchers have identified misalignments between software architecture and system acquisition strategies that lead to program restarts, cancellations, and failures to meet important missions or business goals. This blog posting—the second installment in a two-part series—builds on the discussions in part one by introducing several patterns of misalignment—known as anti-patterns—that we’ve identified in our research and discussing how these anti-patterns are helping us create a new method for aligning software architecture and system acquisition strategies to reduce project failure.
Identifying Anti-Patterns
We used an interview-based approach to discover and document patterns of alignment among four key aspects: business and mission goals, architecture, quality attributes, and acquisition strategy. We then analyzed the interview data looking for evidence of alignment or misalignment. Since most of our data comes from troubled programs, we have primarily discovered evidence of misalignment—known as anti-patterns—to date. Our initial set of anti-patterns includes
undocumented business goals - the lack of well-documented business goals expressed as they apply to an acquisition program
unresolved conflicting goals - the lack of analysis and reconciliation of known goals
failure to adapt - failure of an acquisition program to modify the architecture and the acquisition strategy in response to changing goals, priorities, or technology
turbulent acquisition environment - requested changes are so frequent and contradictory that an acquisition program cannot realistically accommodate them
poor consideration of software - critical decisions made early in an acquisition program’s lifecycle have strong negative implications on the system’s software
inappropriate acquisition strategies - the acquisition strategy fails to consider important software attributes
overlooking quality attributes - a failure to define and use software quality attributes in the definition of the software architecture or acquisition strategy
Let’s explore one of these anti-patterns to show how it might be used. The first anti-pattern—undocumented business goals—reflects a lack of precise, well-defined, and well-documented business goals for a DoD acquisition program. In the programs we examined, we found that these goals were seldom explicitly expressed; when they were, they tended to be vague (e.g., "replace legacy system"), to reflect high-level program constraints (e.g., "maximize competition"), or to restate policy regulations (e.g., "implement an open architecture").
When this anti-pattern is present, we found that mission requirements dominate the definition of the software architecture, often leading to an architecture contrary to the achievement of the unstated business goal. For instance, an architect might reasonably design a monolithic architecture that was an excellent fit for the mission goals for performance, but which could be strongly at odds with an implicit—but unspecified—business goal to avoid vendor lock.
The lack of explicit business goals has a more direct impact on the acquisition strategy. In one program in our study, a key element for the program was to build a new system with significant new capabilities. The acquisition strategy specified a slow, deliberate pace to ensure that the new capability was defined correctly. A competing goal was to replace several "end-of-life" systems. Not stated in this goal was the urgent need to replace these failing systems as quickly as possible. When the operators and maintainers of the legacy systems became aware of the intended acquisition strategy, they forced a major change in program focus. The sequence of acquisition activities required alteration, which caused a significant delay in meeting either goal.
Creating a Method for Identifying Misalignments
Discovering and documenting anti-patterns is only the beginning of our work in addressing the problems of misalignment. The second phase of our project involves creating a method that helps acquisition programs avoid the anti-patterns we’ve discovered and provides options that could help programs better align their acquisition strategy and software architecture to satisfy stakeholder mission and business goals. We will then validate the utility of this method through the help of projects and programs outside the SEI.
Software-reliant systems are inherently social and technical endeavors. A key facet of our method is therefore its ability to bring disparate actors together—often for the first time—to identify and discuss issues of mutual concern and make hard, informed choices based on rational information. We plan to adapt and tailor existing methods where possible. We are currently exploring methods in the following areas:
identifying salient stakeholders - There are many requirements elicitation and analysis methods. Unfortunately, most methods assume that it is possible to know which stakeholders will most affect or be most affected by the program. We are therefore considering Controlled Requirements Expression (CORE), which is a method that assists developers in identifying stakeholders related to a given acquisition to help develop a more complete list of stakeholders.
defining business and mission goals - The Pedigreed Attribute eLicitation Method (PALM) discussed in part one of this blog series remains a central element of our new method. PALM enables organizations to systematically identify the high-priority mission and business goals from system stakeholders. The architectural implications of those goals are then captured by determining the quality attribute requirements they imply. We are extending PALM to investigate the acquisition strategy implications of business or mission goals.
analyzing quality attributes - Quality Attribute Workshops (QAW) are a widely accepted method for developing definitions of the quality attributes that form the basis for deriving the software architecture. We are using the same approach to derive attributes that should drive the acquisition strategy.
trading off architecture and acquisition strategy options - analysis methods, such as Architecture Tradeoff Analysis Method (ATAM) or Cost Benefit Analysis Method (CBAM), are used to ensure consistency of software and system quality attributes. We are analyzing these methods to explore consistency between the acquisition strategy and its driving quality attributes.
Looking Ahead
Government acquisitions are more likely to succeed if a program can align its acquisition strategy and software architecture with each other and with respect to satisfying stakeholder mission and business goals. Our research has shown evidence of misalignments in the form of anti-patterns. Discovering the patterns a program should avoid is a key step toward our objective to develop a method that systematically supports business and mission goals by aligning the acquisition strategy and software architecture.
We welcome opportunities to validate and expand the anti-patterns or pilot our emerging method. Please leave us feedback or questions about our research in the comments section below and we will follow up with you.
Additional Resources
For more information about the Pedigreed Attribute eLicitation Method (PALM), please visit www.sei.cmu.edu/architecture/tools/establish/palm.cfm
To read A Method for Controlled Requirement Specification, please visit http://ss.hnu.cn/oylb/tsp/CORE-mullery.pdf
To read the SEI technical report, Quality Attribute Workshops, please visit http://www.sei.cmu.edu/library/abstracts/reports/03tr016.cfm
For more information about the book, Evaluating Software Architectures: Methods and Case Studies, please visit www.sei.cmu.edu/library/abstracts/books/020170482X.cfm
To read the SEI technical report, Making Architecture Design Decisions: An Economic Approach, please visit www.sei.cmu.edu/library/abstracts/reports/02tr035.cfm
Final Installment in a Three-Part Series
By Bill Nichols, Senior Member of the Technical Staff
Software Engineering Process Management
This post is the third and final installment in a three-part series that explains how Nedbank, one of the largest banks in South Africa, is rolling out the SEI’s Team Software Process (TSP) throughout its IT organization. In the first post of this series, I examined how Nedbank addressed issues of quality and productivity among its software engineering teams using TSP at the individual and team level. In the second post, I discussed how the SEI worked with Nedbank to address challenges with expanding and scaling the use of TSP at an organizational level. In this post, I first explore challenges common to many organizations seeking to improve performance and become more agile and conclude by demonstrating how SEI researchers addressed these challenges in the TSP rollout at Nedbank.
In a 10-year retrospective on agile methods, Jeff Sutherland, co-creator of the Scrum agile method, listed some of the major challenges and impediments for organizations that adopt agile methods, including
demanding technical excellence
promoting individual change
leading organizational change
organizing knowledge
improving education
We’ve encountered and addressed many of these challenges in our work with Nedbank, which is one of several large companies to successfully pilot TSP and undertake an organizational rollout. For a TSP project (or any project using a new process) to succeed, management and other stakeholders must often change their behavior to support the process; they all must use the process in a way appropriate for their roles. Their behavior must support the empowerment of the TSP project team. Conversely, if the stakeholders do not behave appropriately, they can undermine the empowerment of the team and its ability to complete the project successfully.
Demanding Technical Excellence
In our experience, the key to success is to train management on how to ask for technical excellence and to know it when they see it. Technical people often know the details far better than their managers. Managers must be provided training and guidance on how to set the right goals for these empowered teams and to track the right measures. For Nedbank, the most important outcome of technical excellence is low rates of fielded severity 1 defects, that is, defects that stop the system. With TSP, this quality goal is not a matter just for quality assurance (QA), but for the development teams to manage.
In an organization with self-managed teams based on TSP principles and practices, the teams and individuals manage the technical work. The teams define their process, estimate their work, track progress, and collect data on all defects. Status reports do not include detailed data, but summaries. Managers who oversee technical excellence need to hold teams accountable, but must avoid micromanaging the details of the technical work. This trust, while sometimes hard for management to grant, is essential to team empowerment and commitment.
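As a rough illustration of summarizing rather than exposing detailed data, consider rolling task-level plan/actual records up into a one-line status. This is not TSP tooling; the task names, hours, and the earned-value-style summary are invented for the sketch.

```python
# Illustrative sketch (not TSP tooling): roll detailed task data up into the
# kind of summary a self-managed team might report to management.
tasks = [
    # (name, planned_hours, actual_hours, done) -- invented data
    ("Design review",  6.0,  7.5, True),
    ("Module A code", 20.0, 18.0, True),
    ("Module B code", 25.0, 12.0, False),
    ("Unit test",     10.0,  0.0, False),
]

planned = sum(p for _, p, _, _ in tasks)
actual  = sum(a for _, _, a, _ in tasks)
earned  = sum(p for _, p, _, done in tasks if done)  # planned value of finished tasks

print(f"Plan: {planned:.0f} h, spent: {actual:.0f} h, "
      f"earned value: {100 * earned / planned:.0f}% complete")
```

Management sees the summary line; the task detail stays with the team, which is the trust boundary the paragraph above describes.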
At Nedbank, senior management reviews a laundry list of visible outcomes for the project including adherence to committed delivery date, cost estimation accuracy, defects in QA and production, and accuracy of status reports. The teams are thus held accountable for project outcomes, managing their work, and keeping management informed. This accountability eliminates the fear of the data being misused, empowers the teams to make decisions, and aligns incentives and goals. To expand TSP use beyond early adopters, the organization must make expectations explicit and non-threatening. Project scorecards must focus on publicly visible outcomes rather than on detailed process data. TSP includes training for management to help with this transition. This training is just one aspect of managing organizational change.
Promoting Individual Change and Leading Organizational Change
Attempts to change organizational behavior often fail. At the SEI, we address this difficulty by providing change-management training to TSP coaches. This training is effective at the individual and team level, but more is needed on an organization-wide scale. To address this gap, we coach the executive team on change management and the logistics of rollout and sustainment. The organization needs to build these new ways of working into its processes for
training
project planning
project evaluation
personnel evaluations
Nedbank is fortunate to have an executive who understands the need for change and the difficulties surrounding it, and has provided resources to ease the transition.
Organizing Knowledge and Improving Education
Education must be tailored to the target audience. TSP provides specific training for coaches, instructors, developers, non-developer team members, team leads, and executive management. This training, which also addresses another challenge referenced by Sutherland, helped all involved at Nedbank understand the principles behind the change and their role in making new organizational practices a success. Through the Center of Excellence (COE) presented in my second blog posting, Nedbank ensures that training is performed and that everyone involved is prepared to work on a TSP project.
The function of organizing knowledge remains a work in progress as the COE collects project data and experiences. Perhaps the greatest barrier to collection of TSP data at the organizational level is the lack of enterprise tools. We are teaming with two large partners to develop the next generation of enterprise-level TSP data tools, but these tools are still under development. Agile development organizations must also secure sponsorship, build capability, identify and train change agents, develop organizational support for the initiative, track progress, and highlight success to senior management. They must not associate long-term success with any single project, but rather with demonstrated and sustained improvement over many projects and many people. Nedbank, which is at the leading edge of TSP implementation, started down this path with a rollout strategy that included funding and time for
training executives, project leads, coaches, and team members
setting expectations about how projects will be planned and conducted
gathering, reporting, and analyzing specific project data
The SEI worked with the COE to help Nedbank pilot TSP and train instructors and coaches. TSP coaches continue to help Nedbank plan and implement its organizational rollout.
Results: An Organization Changed
Nedbank now begins all new software projects with TSP. This effort is led by the TSP COE, which estimates needs and supports teams with coaching and training. In addition to providing operational support, the Nedbank COE collects data to track organizational progress and provides relevant planning data for its teams. Relevant data includes ranges of project schedule and cost variances, scope growth, component size, and effort estimation accuracy, normalized defect levels in QA and user-acceptance test, schedule and cost ratios of development, testing, maintenance, cost of specific activities such as peer inspection, and cost of rework. Nedbank uses this data at the organization level to
Assess the overall cost efficiency of work. The benefits in schedule predictability, quality, and overall cost containment are made explicit and real.
Plan projects. The use of historic data of comparable projects means the planning parameters are no longer just guesses. We have data from similar projects at Nedbank to suggest how results might be affected by the duration of front-end or back-end projects, the duration in testing, the calendar time that resources must be committed to support testing, and changes in the number of staff.
Improve performance. Technical excellence and quality are economic decisions. Improving cost, quality, or schedule performance requires change, (e.g., taking time to plan, training for inspections, and measuring quality early). It’s important to know what changes will occur and what those changes will cost.
Only by understanding the current process can the organization set realistic goals. By combining realistic goals and a data-driven knowledge of the process, empowered teams can make specific changes to the way they do work and evaluate the results. Simple outcome measures include high-severity defects in production, schedule overruns, and cost per change. To improve performance, Nedbank measured how much effort and time were required for testing and how much effort was devoted to defect fixes. With reliable data consistently defined in projects across the organization, Nedbank is building performance benchmarks and can now begin to make data-based cost-benefit analyses of process changes.
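The benchmark-building step described above can be sketched in a few lines. The sketch below is illustrative only (field names, sample values, and the specific measures are invented, not Nedbank data): it derives simple outcome measures such as effort estimation accuracy and schedule variance from per-project records, then collapses them into the kind of organizational ranges a planning team could use.

```python
# Illustrative sketch of organizational benchmark measures derived from
# per-project records. All field names and values are hypothetical.

def estimation_accuracy(estimated, actual):
    """Ratio of actual to estimated effort; 1.0 means a perfect estimate."""
    return actual / estimated

def schedule_variance_pct(planned_days, actual_days):
    """Schedule overrun as a percentage; positive means the project ran late."""
    return 100.0 * (actual_days - planned_days) / planned_days

projects = [
    {"name": "A", "est_effort": 400, "act_effort": 480,
     "planned_days": 90, "actual_days": 99},
    {"name": "B", "est_effort": 250, "act_effort": 240,
     "planned_days": 60, "actual_days": 57},
]

accuracies = [estimation_accuracy(p["est_effort"], p["act_effort"]) for p in projects]
variances = [schedule_variance_pct(p["planned_days"], p["actual_days"]) for p in projects]

# The range of historical values becomes a planning benchmark for new projects.
benchmark = {
    "accuracy_range": (min(accuracies), max(accuracies)),
    "variance_range": (min(variances), max(variances)),
}
print(benchmark)
```

With consistent definitions across projects, ranges like these replace guesswork in planning new work.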
When Nedbank introduced detailed project planning and tracking, the use of TSP made it possible to see how these changes altered time spent in specific activities and the associated results on schedule performance, cost containment, and quality at delivery. In the context of a defined process, a few simple measures—direct effort, defects, schedule, and size—transformed their ability to see and manage their software releases.
Looking Ahead
We are continuing to work with Nedbank (and other organizations) to share our TSP research. Nedbank’s COE currently focuses on growing capacity during the rollout. The process is arduous and will take several years to complete, which can lead to frustration. Shortcuts, such as incomplete training, lack of coaching support, or stakeholders who have not yet fully bought into the new approach, will degrade results and likely create resistance. When the rollout is complete, the emphasis will shift to sustainment. As shown in the COE’s pilot video, Nedbank is already seeing benefits from these changes.
If you’re interested in learning more about TSP, please consider attending the upcoming TSP Symposium in St. Petersburg, Florida, USA.
Additional Resources
For more information about TSP, please visit www.sei.cmu.edu/tsp
For more information about the 2012 TSP Symposium, please visit www.sei.cmu.edu/tspsymposium/2012/
To read the SEI technical report Deploying TSP on a National Scale: An Experience Report from Pilot Projects in Mexico, please visit www.sei.cmu.edu/library/abstracts/reports/09tr011.cfm
To read the Crosstalk article A Distributed Multi-Company Software Project by Bill Nichols, Anita Carleton, & Watts Humphrey, please visit www.crosstalkonline.org/storage/issue-archives/2009/200905/200905-Nichols.pdf
To read the SEI book Leadership, Teamwork, and Trust: Building a Competitive Software Capability by James Over and Watts Humphrey, please visit www.sei.cmu.edu/library/abstracts/books/0321624505.cfm
To read the SEI book Coaching Development Teams by Watts Humphrey, please visit www.sei.cmu.edu/library/abstracts/books/201731134.cfm
To read the SEI book PSP: A Self-Improvement Process for Engineers by Watts Humphrey, please visit www.sei.cmu.edu/library/abstracts/books/0321305493.cfm
By Donald FiresmithSenior Member of the Technical StaffAcquisition Support Program
Engineering the architecture for a large and complex system is a difficult and lengthy undertaking. System architects must perform many tasks and use many techniques if they are to create a sufficient set of architectural models and related documents that are complete, consistent, correct, unambiguous, verifiable, usable, and useful to the architecture’s many stakeholders. This blog posting, the second in a two-part series, takes a deeper dive into the Method Framework for Engineering System Architectures (MFESA), which is a situational process engineering framework for developing system-specific methods to engineer system architectures.
In our previous blog entry, we introduced MFESA and its four components:
the MFESA ontology defining the foundational concepts underlying system architecture engineering
the MFESA metamodel defining the base superclasses of method components
the MFESA repository of reusable method components
the MFESA metamethod for creating project-specific methods using method components from the MFESA repository
We also briefly discussed the applicability of MFESA and how it simultaneously provides the benefits of standardization and flexibility. In this blog posting, we will take a closer look at the four components comprising MFESA.
The MFESA Ontology
To create a complete and well-defined method for performing system architecture engineering, it is first necessary to understand the terminology (technical jargon) and concepts underlying this area. This understanding goes beyond a mere glossary of terms to encompass an information model that also describes how these concepts relate to each other. The MFESA ontology defines the domain of system architecture engineering and is the foundation on which the rest of MFESA is built.
Figure 1 below summarizes many of the most important contents of the MFESA ontology, an information model of the foundational concepts underlying system architecture engineering. Starting at the center of the diagram and moving to the left, we see that system architecture comprises a combination of
architectural structures that can be static or dynamic as well as logical or physical
architectural decisions that include the use of architectural styles, architectural patterns, and architectural mechanisms
Starting at the center and moving to the right, we see that system architecture can be documented in many ways including in the form of
architectural descriptions, some of which are various types of architectural documents as well as various types of architectural models that model the different kinds of architectural structures
executable representations, such as architectural prototypes, architectural simulations, and executable architectures
At the top of the diagram, we see that the architectural concerns of stakeholders are architectural drivers, including architecturally significant requirements that drive the engineering of the system architecture. Architectural concerns are also often quality focus areas (i.e., quality characteristics such as availability, performance, portability, reliability, robustness, safety, security, and usability), and architectural support for these qualities can be organized in the form of architectural quality cases (a.k.a. assurance cases) that provide arguments and evidence that the architecture adequately supports the architecturally significant requirements.
The MFESA Metamodel
The second MFESA component is a process metamodel that is restricted to system architecture engineering. Figure 2 shows the MFESA view of the concepts: process, method, and process metamodel.
The system architecture engineering processes that are performed on different projects are at the lowest level of Figure 2. Each process consists of components such as
actual work products (e.g., architectural models and architecture documents)
actual work units (e.g., instances of architecture engineering tasks and techniques) that are used to produce the work products
actual workers (e.g., specific architecture teams and architects) who perform the work units to produce the work products.
The middle layer consists of system architecture engineering process models that are the as-intended methods for engineering system architectures. These methods contain reusable components that describe their instances: the process components. MFESA thus recognizes that there is both a theoretical and practical difference between the methods documented in standards, procedures, and guidelines (middle layer) and the work people actually perform on their projects (bottom layer).
At the top level of this three-level structure is the MFESA system architecture engineering process metamodel, which models the process model. This metamodel consists of metamethod components, which are the abstract subclasses that are specialized to produce the method components. For example, the MFESA metamethod component task is subclassed to produce 10 specific tasks of system architecture engineering that, in turn, are instantiated as actual task executions on the project.
Figure 3 shows the four metamethod components within the MFESA process metamodel. They are the abstract classes of process components that are subclassed to create the MFESA method components.
The MFESA repository contains an extensive set of reusable method components derived via subclassing from the MFESA metamethod components. Figure 4 depicts the first nine method components (abstract classes of process components). MFESA thus recognizes three types of architecture workers who perform three types of architectural work units to produce three types of architecture work products. The complete class hierarchy of method components is considerably larger and includes the concrete method components that are instantiated to produce the actual process components seen on real projects.
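The three-level structure described above, abstract metamethod components subclassed into reusable method components, which are in turn instantiated as actual process components on a project, maps naturally onto a class hierarchy. The following is a minimal sketch with invented class names; it illustrates only the subclassing-then-instantiation relationship, not MFESA's actual component definitions:

```python
# Illustrative class-hierarchy sketch of MFESA's three levels.
# All names are invented for illustration.
from abc import ABC

# Top level: metamethod components (abstract superclasses).
class WorkUnit(ABC): ...
class WorkProduct(ABC): ...
class Worker(ABC): ...

# Middle level: reusable method components (subclasses in the repository).
class Task(WorkUnit):
    def __init__(self, name):
        self.name = name

class ArchitectureDocument(WorkProduct):
    def __init__(self, title):
        self.title = title

class Architect(Worker):
    def __init__(self, person):
        self.person = person

# Bottom level: actual process components instantiated on a real project.
identify_drivers = Task("Identify the architectural drivers")
sad = ArchitectureDocument("System Architecture Document")
lead = Architect("J. Smith")

print(type(identify_drivers).__name__, isinstance(identify_drivers, WorkUnit))
```

The metamodel's value is exactly this separation: the abstract level stays fixed while the middle (repository) level grows, and the bottom level varies project by project.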
Example reusable method components include subtypes of:
architectural work products, such as architectural representations (which we saw in the MFESA ontology) that include both architectural documents (e.g., system architecture document, software architecture document, and architecture vision document, various types of architectural whitepapers and reports such as how the architecture handles concurrency or fault tolerance) and architectural models (e.g., class diagrams, sequence diagrams, statecharts, and data flow diagrams)
architecture workers, such as the system architect, software architect, and architecture team, and various types of architecture modeling and documentation tools
architectural work units, such as the many types of architecture tasks (e.g., identify the architectural drivers and maintain the architecture and its representations) and techniques (e.g., brainstorming and architectural patterns)
The MFESA Metamethod
As mentioned in the previous blog entry, MFESA is not a method for engineering system architectures but rather a framework for creating methods for engineering system architectures. Figure 5 shows the MFESA metamethod for creating these methods. Each box in the figure is a step in the method.
The first metamethod step determines the project’s needs regarding system architecture engineering methods. The second step determines the number of such methods that are needed, which for most projects is one. The third step determines whether
a previously constructed method can be tailored to fit the specific needs of the project, in which case a method is selected and then tailored, or
whether a new method needs to be constructed from the reusable method components, in which case the relevant method components are selected, tailored, and integrated to form the new method
In both cases, the resulting method(s) must be documented, typically in plans, standards, procedures, guidelines, templates, and user manuals. The documented method(s) must also be verified as complete, consistent, correct, and usable. Finally, the verified method(s) must be approved and published.
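The metamethod steps above can be sketched as a decision flow. Everything in this snippet is invented for illustration (MFESA prescribes a method-engineering process, not code): it shows only the tailor-an-existing-method versus construct-from-components branch, followed by the document/verify/approve tail, which is elided to a status change here:

```python
# Illustrative decision flow for the MFESA metamethod. All names invented.

class Method:
    def __init__(self, name, domains):
        self.name, self.domains, self.status = name, set(domains), "draft"

    def fits(self, needs):
        """A previously constructed method fits if it covers the project's domain."""
        return needs["domain"] in self.domains

    def tailor(self, needs):
        self.name += f" (tailored for {needs['project']})"
        return self

def construct_from_components(needs, repository):
    """Select, tailor, and integrate relevant reusable method components."""
    parts = [c for c in repository if needs["domain"] in c["domains"]]
    return Method(f"new method from {len(parts)} component(s)", [needs["domain"]])

def create_method(needs, existing_methods, repository):
    for m in existing_methods:          # step 3a: select and tailor an existing method
        if m.fits(needs):
            method = m.tailor(needs)
            break
    else:                               # step 3b: build a new one from the repository
        method = construct_from_components(needs, repository)
    method.status = "approved"          # document, verify, approve (elided)
    return method

needs = {"project": "GroundStation", "domain": "avionics"}
repo = [{"name": "identify drivers", "domains": {"avionics", "naval"}}]
m = create_method(needs, [Method("baseline", ["enterprise IT"])], repo)
print(m.name, m.status)
```

Here the baseline method does not fit the project's domain, so the flow falls through to constructing a new method from the one relevant repository component.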
Wrapping Up
Systems, organizations, and contractual relationships between acquirers and developers are diverse and multi-dimensional. No single architecture engineering method is therefore sufficiently complete and tailorable to be appropriate for all situations. MFESA is not a method for engineering system architectures but rather a method framework that can be used by system architects and process engineers to produce appropriate system architecture engineering methods using situational method engineering. To accomplish this, MFESA consists of four interrelated components:
an ontology that defines the foundational concepts of system architecture engineering and the relationships between them
a metamodel that defines the base classes of reusable method components from which all other method components are subclassed
a repository of reusable method components (architectural work products to be produced, work units to be performed, and architectural workers) from which situation-specific system architecture engineering methods can be constructed
a metamethod for selecting appropriate method components, tailoring them, and integrating them to produce the appropriate method
Even if you are content using a single organizational system architecture engineering method, you may find it worthwhile to peruse the MFESA repository to see if your method is missing any important elements. Consultants, trainers, and educators may also find MFESA a good foundation on which to build training courses and classes on system architecture engineering. Finally, researchers in system architecture engineering methods and situational method engineering may find MFESA useful as a repository of reusable method components.
Additional Resources
MFESA is primarily documented in the book, The Method Framework for Engineering System Architectures, published in 2009 by CRC Press. The book was co-authored by a six-member team from the SEI, MITRE, and the U.S. Air Force. The book was recently added to the Intel Corporation’s Recommended Reading List.
To see a tutorial on MFESA presented at the 2011 IEEE International Systems Conference (ISC) in Montreal, Quebec, Canada please go to
http://donald.firesmith.net/home/publications/publicationsbyyear/2011/2011-MFESA_ISC.pdf
To see a tutorial on MFESA presented at the 21st System and Software Technology Conference (SSTC) in Salt Lake City, Utah, please go to
http://donald.firesmith.net/home/publications/publicationsbyyear/2009/MFESA-SSTC.pdf
By Andrew P. MooreSenior Member of the Technical StaffThe CERT Program
Since 2001, researchers at the CERT Insider Threat Center have documented malicious insider activity by examining media reports and court transcripts and conducting interviews with the United States Secret Service, victims’ organizations, and convicted felons. Among the more than 700 insider threat cases that we’ve documented, our analysis has identified more than 100 categories of weaknesses in systems, processes, people or technologies that allowed insider threats to occur. One aspect of our research has focused on identifying enterprise architecture patterns that protect organization systems from malicious insider threat. Enterprise architecture patterns are organization patterns that involve the full scope of enterprise architecture concerns, including people, processes, technology, and facilities. Our goal with this pattern work is to equip organizations with the tools necessary to institute controls that will reduce the incidence of insider compromise. This blog post is the second in a series that describes our research to create and validate an insider threat mitigation pattern language that focuses on helping organizations balance the cost of security controls with the risk of insider compromise.
Our Approach
The aim of our pattern work is to develop insider threat mitigation strategies that are scientifically and operationally valid. To create those strategies, we employ mixed-methods research, which combines both qualitative and quantitative approaches. Among the various types of insider crimes—IT sabotage, theft of intellectual property (IP), national security/espionage, and fraud—our work has initially focused on IP theft, which includes theft of an organization’s proprietary information.
We have already established a mitigation pattern of IP theft that is based on the types of crime we’ve observed in our case database. This pattern is oriented around the observation that many IP thieves steal information close to announcing their resignation. This behavior gives organizations a window of opportunity for identifying and responding to insider IP theft activity.
Since it’s costly and time-consuming for organizations to monitor departing employees 100 percent of the time, we directed our resources at the timeframe in which IP theft is most likely to occur. Our pattern focused on this question:
I am establishing a program that looks for evidence of insider theft of my organization’s IP. Review and analysis of employee activities is costly. How can I improve the efficiency of resources I direct at IP theft detection?
To help answer this question, our research team decided to focus on the distribution of durations between the following two dates across our sample of insider threat cases:
the date of the last confirmed theft of IP event prior to an insider’s termination and
the date of the insider’s termination
Past qualitative analyses of our insider threat data have suggested that the approach of a termination day accelerates the insider’s decision-making process in a nonlinear manner. Our primary hypothesis is therefore the following:
Primary Hypothesis: The distribution of the times between an insider IP thief’s last confirmed theft of IP before termination and the date of the insider’s termination follows a nonlinear distribution.
Preliminary Analysis
To determine whether our data on insider theft of IP crimes supports this hypothesis, we collaborated with Dave Zubrow, acting chief scientist with the SEI’s Software Engineering Process Management Program and lead of the Software Engineering Measurement & Analysis Initiative. To test the hypothesis, we used Crystal Ball software to evaluate the best fit distribution for our data on 30 IP theft cases from the CERT database. The geometric distribution (with p=0.02) was the best fit to our data when compared with other candidate distributions.
We also ran a Monte Carlo simulation that generated 1,000 resampled data sets from the best fit distribution. From that data set, we graphed the cumulative probability function. We found that about 70 percent of insider IP theft cases can be caught by reviewing for significant theft events by the insider during the last 60 days of employment. Perhaps more importantly, the graphed function provides a tool to help organizations adjust their review window in an informed way, based on their particular risk aversion for IP theft and the cost of insider activity review within the organization.
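The arithmetic behind the 60-day window can be checked directly. Assuming the reported geometric best fit with p = 0.02 per day of the duration between the last theft event and termination, the cumulative probability that the last theft falls within a review window of n days is 1 - (1 - p)^n. The snippet below reproduces the roughly 70 percent figure and shows how a more risk-averse organization might size its window (the 90 percent target is an invented example, not from the study):

```python
# Sanity check of the reported geometric fit (p = 0.02 per day assumed).
import math

p = 0.02                 # best-fit geometric parameter reported above
review_window_days = 60  # days of pre-termination activity reviewed

# Probability the insider's last theft event falls inside the review window.
prob_caught = 1 - (1 - p) ** review_window_days
print(f"{prob_caught:.2f}")  # -> 0.70, matching the reported ~70 percent

# Inverting the relation sizes the window for a chosen coverage target.
target = 0.90  # invented example target
window_for_target = math.ceil(math.log(1 - target) / math.log(1 - p))
print(window_for_target)  # days of review needed to cover ~90% of cases
```

This inversion is exactly the "tool to adjust the review window in an informed way" described above: pick a coverage level consistent with the organization's risk aversion and review cost, and read off the window size.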
It is important to emphasize the limitations of our data analysis to date. Our data analysis and results are preliminary in part because of the small number of cases in our data set. While the best-fit distribution was the geometric distribution (as compared to a wide variety of other distributions), the fit was statistically different from the theoretical distribution. While future research will continue to add cases to better identify the underlying distribution and refine our analysis, the resampling approach described above allowed us to use the data that we had to greatest effect. Given that the best fit for the data is the geometric distribution, we contend that this result provides at least prima facie evidence that the subject mitigation pattern will be effective in fighting insider theft of IP. Continuing research will strive to bolster this evidence.
We expect that the patterns and pattern language developed through this research will enable coherent reasoning about how to design enterprise systems to protect against malicious insider activity. Instead of working with vague security requirements and inadequate security technologies, system designers will have a coherent set of patterns that enable them to develop and implement effective strategies against malicious insider activity more quickly and with greater confidence.
Looking Ahead
In addition to collecting and analyzing new cases of insider theft of IP, our future work in this area will explore another critical question with regard to patterns of IP theft:
How can organizations distinguish between insider theft activity and legitimate employee activity?
An answer to this question will help an organization use the mitigation pattern cost effectively by reducing the chance of false positives during the review process.
Evaluating the effectiveness of individual mitigation patterns is just one aspect of our work to help organizations bolster their defenses against malicious activity by insiders. We view our pattern work as a way of helping organizations integrate what we’ve learned into their existing enterprise architecture and practices. The first post in this series described our work to protect next-generation DoD enterprise systems against insider threats by capturing, validating, and applying enterprise architectural patterns.
In our upcoming post in this series, fellow researcher David Mundie will describe a pattern language that we’ve developed to help software architects better understand how to apply patterns in sequence to design a system that provides balanced protection against malicious activity by insiders.
Additional Resources:
To read the SEI technical report, A Pattern for Increased Monitoring for Intellectual Property Theft by Departing Insiders, please visit www.sei.cmu.edu/reports/12tr008.pdf
To read the SEI technical note, Insider Threat Control: Using Centralized Logging to Detect Data Exfiltration Near Insider Termination, please visit www.cert.org/archive/pdf/11tn024.pdf.
To read the CERT Insider Threat blog, please visit www.cert.org/blogs/insider_threat/
By Suzanne Miller, Senior Member of the Technical StaffAcquisition Support Program
All software engineering and management practices are based on cultural and social assumptions. When adopting new practices, leaders often find mismatches between those assumptions and the realities within their organizations. The SEI has an analysis method called Readiness and Fit Analysis (RFA) that profiles a set of practices to surface their cultural assumptions and then uses the profile to help an organization understand its fit with those assumptions. RFA has been used for multiple technologies and sets of practices, most notably for adoption of CMMI practices. The method for using RFA and the profile that supports CMMI for Development adoption is found in Chapter 12 of CMMI Survival Guide: Just Enough Process Improvement. This blog post briefly summarizes the principles behind RFA and describes the SEI Acquisition Support Program’s work in extending RFA to support profiling and adoption risk identification for Department of Defense (DoD) and other highly regulated organizations that are considering or are in the middle of adopting agile methods.
One of the fundamental principles of technology adoption is that of mutual adaptation. This principle asserts that a successful technology adoption by an organization usually requires adaptation of both the technology and the organization. The technology may adapt, for example, by being configurable (allowing different features to be switched on or off) or by allowing localization to a different native language. The organization may adapt by changing some of its business workflows so they are more compatible with the technology or by changing the roles of the people involved in different processes that are affected by the technology.
This blog post is the latest in a series examining our work on the adoption of agile methods in U.S. DoD settings. In July of this year, we kicked off a series that highlighted key ideas and issues associated with applying agile methods to address the challenges of complexity, exacting regulations, and schedule pressures in the DoD.
When an organization adopts a new set of practices, it sees many of the same issues associated with adopting a new hardware or software technology. The SEI has observed that when adopting new practices—as when adopting new technologies—the principle of mutual adaptation applies. One of our observations has been that the closer the organization’s culture is to the implied cultural assumptions of a set of practices, the easier it is for that organization to adopt those practices.
As part of our research in the adoption of agile methods in U.S. DoD settings, we have adapted the RFA profiling technique to accommodate both the typical factors used in RFA and some factors that are more uniquely associated with the DoD acquisition environment. We found that only applying the commercial profile didn’t highlight enough of the issues that we were seeing in our interviews and observations of practice. A technical note on RFA factors for agile adoption in DoD will be published at a future date.
In this post, we want to present the categories and factors that we have identified so far, with the help of our interviewees and our SEI Agile Collaboration Group. This latter group consists of over a dozen DoD and other federal government acquisition practitioners, plus several DoD contractor organization representatives who are all actively adopting various relevant Agile methods in their organizations. We have characterized the following six categories to profile for readiness and fit:
business and acquisition - adoption factors related to business strategy, acquisition strategy, and contracting mechanisms
organizational climate - adoption factors related to sponsorship, leadership, reward systems, values, and similar "soft" issues
system attributes - adoption factors related to the actual characteristics of the system(s) being developed
project and customer environment - adoption factors related to project management norms, team dynamics and support structures, and customer relationships and expectations
technology environment - adoption factors related to the technologies that are in place or planned to support the selected agile methods
practices - a taxonomy of agile practices that is used to understand which practices an organization plans to adopt so that other factors can be calibrated around those expectations
If an organization has used RFA in other settings, the factors that were found in the original RFA are scattered among the business and acquisition, organizational climate, and project and customer environment categories.
Each category has a set of attributes that can be characterized by a statement that would represent what you would expect to see if you were observing a successful agile project or organization operating in relation to that attribute. For example, an attribute of business and acquisition is stated as:
Oversight mechanisms are aligned with agile principles.
Oversight is an aspect of acquisition that can either support or disable an agile project. Alignment of oversight with agile principles thus reduces the risk that oversight will be counterproductive.
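To make the profiling idea concrete, here is a hypothetical sketch of how an RFA-style profile might be scored. The category names come from the list above, but the attribute statements, the 0-4 scale, and the 0.6 threshold are all invented for illustration; RFA itself does not mandate a particular scoring scheme:

```python
# Hypothetical RFA-style profile scoring. Categories come from the post;
# attribute statements, scale, and threshold are invented for illustration.

profile = {
    "business and acquisition": {
        "Oversight mechanisms are aligned with agile principles.": 1,
        "Close stakeholder/developer collaboration is enabled.": 2,
        "Funding for the project has been secured.": 4,
    },
    "organizational climate": {
        "Sponsorship for the adoption is active and visible.": 3,
    },
}

def category_fit(scores, scale_max=4):
    """Average attribute score normalized to 0..1; low values flag adoption risk."""
    values = list(scores.values())
    return sum(values) / (scale_max * len(values))

fit = {category: category_fit(attrs) for category, attrs in profile.items()}

# Categories whose fit falls below the (invented) threshold need mitigation plans.
flagged = [category for category, score in fit.items() if score < 0.6]
print(flagged)
```

In this made-up example, weak oversight alignment and limited collaboration mechanisms drag the business and acquisition category below the threshold, signaling exactly the kind of contract-level adoption risk the category is designed to surface.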
The remainder of this blog posting describes key factors in the business and acquisition category. In future posts, we will explore other categories of factors that deal with issues that pose different challenges to adoption of agile methods.
The Business and Acquisition Category
This category covers issues related to an organization’s business strategy or mission and some specific factors related to acquisition and contracting. Business strategy is an important fit element because many organization values and principles are tied to the strategy. If the strategy changes, the organization’s values may change, creating either a better or worse fit environment for a particular set of practices. Similarly, in DoD settings, certain contracting approaches are more aligned with particular sets of values and practices, and changing the way a contract is formulated can have a significant impact on the values and practices that will be needed to execute that contract. The following list has both a short title that summarizes the statement and a statement that provides a condition or behavior found in an organization successfully using engineering and management methods consistent with agile principles as published in the Agile Manifesto.
Clear Program Goals. Business or program goals are clear and reflect stakeholder concerns.From an agile methods perspective, the organization’s mission or business goals are one of the touchpoints for decision making. If they are not clear—or if they do not adequately reflect the concerns of the organization’s stakeholders—then lower level decision-making runs the risk of being misaligned with the organization’s focus.
Defined Success Strategies. Success strategies (e.g., roadmaps, product portfolios, etc.) are defined and clearly communicated.From an agile methods perspective, being clear about the roadmaps, portfolios, etc. that an organization uses to define its productivity and successful completions is a key to understanding how an individual project fits into the broader organizational mission.
Project Funding Secured. Funding for the project has been secured.This factor may seem obvious and one that is a success criterion for any project, which is true. Of particular importance when applying agile methods to DoD organizations, however, is that there are multiple ways to fund and contract for information technology products and services. Some steps in the formulation of a program can be executed prior to official funding, but there are many tasks that cannot be initiated until the funding allocation process has completed.
Close Stakeholder/Developer Collaboration Enabled. Mechanisms are in place in the contract and acquisition strategy to allow close collaboration between developers and other stakeholders (e.g., certification and accreditation personnel, end users, and others). The fourth principle derived from the Agile Manifesto states, "Business people and developers must work together daily throughout the project." In a commercial environment, the term business people includes managers of the project and end users of the product being developed. In the DoD, these roles may reside in different organizations, and there are multiple business-related stakeholder roles to account for: program office personnel, information assurance, independent verification and validation agents, end users, logisticians, trainers, and others. If the acquisition strategy and associated contract vehicles create barriers to collaboration among these roles and the developer, it will be hard to achieve the performance seen in shoulder-to-shoulder agile implementations.
Interim Delivery Enabled. Mechanisms are in place in the contract and acquisition strategy that allow for interim demonstration and delivery between official releases. The first principle derived from the Agile Manifesto states, "Our highest priority is to satisfy the customer through early and continuous delivery of valuable software." DoD contracts can specify the cadence of delivery in the Statement of Work (SOW) and also in the way they apply different standards and define line items in their Contract Data Requirements List. If a contract specifies only a single delivery of the software, other mechanisms may be needed to permit productive early demonstration and re-orientation of priorities or focus.
Oversight Supports Agile Principles. Contract oversight mechanisms are aligned with agile principles. As with delivery enablement, the contract is the mechanism wherein program office technical and management oversight is specified. Contracts for large acquisition programs typically mandate document-centric capstone reviews, such as Preliminary Design Reviews (PDRs) and Critical Design Reviews (CDRs). These reviews analyze requirements, preliminary design (PDR), and detailed design (CDR) documentation; software development does not begin until all these documents have been approved following the CDR. This linear lifecycle model is not as productive an oversight strategy for contracts employing agile methods, where contracting language enables incremental, more frequent (and less formal) progress reviews. Beyond the contract language itself, the expectations of reviewers and oversight personnel must also be set appropriately.
Clear Alignment of Software Goals/Program Goals. The alignment of software-related goals with program-level goals is clear. This factor is also important in non-agile settings, but its urgency in agile settings comes from the fact that software will be available earlier to test and to integrate with the other parts of the system. For systems engineers unaccustomed to this early access, provisioning test beds consisting of hardware emulators and simulation environments may not get the attention needed to ensure the software part of the program can take advantage of incremental deliveries.
Appropriate Contract Type. Contract type accounts for use of agile or lean methods in the program. This factor may seem obvious, but it’s actually quite a challenge for DoD program offices. Almost any contract type (firm fixed price, indefinite delivery/indefinite quantity, time and materials, level of effort, cost plus incentive fee, etc.) can be used to effectively support development using agile methods. For each contract type, however, the way the agreement is framed determines how effective it will be. The contract type and the acquisition strategy must therefore be aligned to support agile methods implementation.
Appropriate Lifecycle Activities. Lifecycle activities that are planned in the acquisition strategy are compatible with agile methods. It’s not enough that the contract vehicle be written correctly. It’s also important that the lifecycle activities are specified in a way that can leverage the iterative and incremental nature of agile software development. For example, building test support equipment and test suites early in the lifecycle is essential if test-driven development is an agile method being applied.
Agile at-Scale Enabled. The acquisition strategy takes into account the use of agile methods at the scale needed for the program. The most prevalent use to date for agile methods has been on smaller projects, but even in the DoD there have been successful projects with dozens of developers. To appropriately express the agile principles at scale, stakeholders must consider communication mechanisms, architectural patterns, and layered management approaches. If these factors are not taken into account in the acquisition strategy, larger agile implementations may not be resourced effectively.
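To make the checklist nature of these factors concrete, here is a minimal sketch in Python. The factor names are taken directly from this post, but the data structure, function, and scoring approach are illustrative assumptions of ours, not part of the SEI readiness and fit analysis method.

```python
# Hypothetical sketch: the business/acquisition readiness factors from
# this post, treated as a simple checklist. The pass/fail assessment
# scheme below is illustrative only, not the SEI model itself.

BUSINESS_ACQUISITION_FACTORS = [
    "Clear Program Goals",
    "Defined Success Strategies",
    "Project Funding Secured",
    "Close Stakeholder/Developer Collaboration Enabled",
    "Interim Delivery Enabled",
    "Oversight Supports Agile Principles",
    "Clear Alignment of Software Goals/Program Goals",
    "Appropriate Contract Type",
    "Appropriate Lifecycle Activities",
    "Agile at-Scale Enabled",
]

def unmet_factors(assessment):
    """Return the factors a program has not yet satisfied.

    `assessment` maps factor names to True (condition holds) or False
    (condition does not hold); factors not listed are treated as unmet
    and therefore as candidate adoption risks to investigate.
    """
    return [f for f in BUSINESS_ACQUISITION_FACTORS
            if not assessment.get(f, False)]

# Example: a program with clear goals, secured funding, and a suitable
# contract type would still surface delivery and oversight risks.
risks = unmet_factors({
    "Clear Program Goals": True,
    "Project Funding Secured": True,
    "Appropriate Contract Type": True,
})
```

A real assessment would of course involve evidence and judgment rather than booleans, but even this reduced form shows how unmet factors become the inputs to the risk mitigation and issue management strategies described below.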
Looking Ahead
Upcoming blog entries in this series will describe factors in the organizational climate, system attributes, and technology environment categories, as well as the project and customer environment and practices categories. Together, these factors form a picture of the organization planning to adopt agile practices, helping it understand where it is likely to face challenges in adopting the desired practices. From there, risk mitigation and issue management strategies can be defined to minimize the probability and/or impact of the adoption risks that have been identified.
We welcome feedback and comments on both the concept and the content of the model so far, especially if you are a practitioner who is being asked to adapt your own work to accommodate agile methods.
Additional Resources
For more about the readiness and fit analysis method, please visit www.sei.cmu.edu/sos/consulting/sos/readinessandfit.cfm
For more information on management and acquisition considerations in using agile methods in DoD environments, please see our first two technical notes in the Agile Acquisition series:
Agile Methods: Selected DoD Management and Acquisition Concerns, www.sei.cmu.edu/library/abstracts/reports/11tn002.cfm
An Acquisition Perspective on Product Evaluation, www.sei.cmu.edu/library/abstracts/reports/11tn007.cfm