Reflections on 20 Years of Architecture: A Presentation by Douglas C. Schmidt

By Bill Pollak, Transition Manager, Research Technology & System Solutions

Last week, we presented the first posting in a series from a panel at SATURN 2012 titled "Reflections on 20 Years of Software Architecture." In her remarks on the panel summarizing the evolution of software architecture work at the SEI, Linda Northrop, director of the SEI's Research, Technology, and System Solutions (RTSS) Program, referred to the steady growth in system scale and complexity over the past two decades and the increased awareness of architecture as a primary means for achieving desired quality attributes, such as performance, reliability, evolvability, and security.

It’s undeniable that the field of software architecture has grown during the past 20 years. In 2010, CNN/Money Magazine identified "software architect" as the most desirable job in the U.S. Since 2004, the SEI has trained people from more than 900 organizations in the principles and practices of software architecture, and more than 1,800 people have earned the SEI Software Architecture Professional certificate.

It is widely recognized today that architecture serves as the blueprint for both the system and the project developing it, defining the work assignments that must be performed by design and implementation teams. Architecture is the primary purveyor of system quality attributes, which are hard to achieve without a unifying architecture; it’s also the conceptual glue that holds every phase of projects together for their many stakeholders.

This blog posting, the second in a series, provides a lightly edited transcription of a presentation by Douglas C. Schmidt, former chief technology officer of the SEI and currently a professor of computer science at Vanderbilt University, who discussed advances in software architecture practice for distributed real-time embedded systems during the past two decades.

Reflections on 20 Years of Architecture for Distributed Real-time & Embedded Systems as Presented by Douglas C. Schmidt

My talk focuses on advances in software architecture practice for distributed real-time embedded (DRE) systems, with a retrospective of the past 20 years in that space. The essence of a DRE system is that the right answer delivered too late becomes the wrong answer. There are many military examples of DRE systems, such as integrated air and missile defense, as well as civilian examples, such as SCADA (supervisory control and data acquisition), air-traffic management, and electronic trading systems.

Twenty years ago, many engineers developing DRE systems didn’t understand what architecture meant from a software perspective. I’ll never forget the first time I went to a meeting at McDonnell-Douglas where I was employed to help develop a common software architecture for their legacy avionics mission computing systems, and I asked, "Show me your architecture?" They put up a circuit diagram, and I thought, "Boy, this is going to be a challenging project!"

In legacy DRE systems, operational performance was paramount, whereas software issues were secondary, so there was much uncertainty and confusion about key software design and implementation concerns. It was hard to sort those things out 20 years ago since we lacked a common vocabulary and organizing paradigm for reasoning about software architecture for DRE systems.

Perhaps due to the lack of awareness of software concerns, engineers developing DRE systems in the early- to mid-1990s tended to substitute adherence to particular standards or products for a thorough understanding of software architecture. For example, there was a push to apply standards, such as early versions of the CORBA (Common Object Request Broker Architecture) standard.
Unfortunately, the CORBA specification initially de-emphasized implementation details to the point where standard CORBA object request brokers couldn’t be used effectively for DRE systems because they were woefully underspecified with respect to key quality attributes that Linda mentioned in her panel presentation. Another thing that was occurring in the early 1990s was a focus on domain-specific approaches to architecture. For example, there was a project at DARPA called Domain-Specific Software Architecture (DSSA) that examined how to structure software by focusing on domain-specific entities such as radars, trackers, launchers, etc. DSSA was a useful paradigm; in fact, it foreshadowed today’s interest in domain-specific modeling languages, model-driven engineering, and domain-driven design. There was a tendency 20 years ago, however, to overlook the underlying reusable domain-independent abstractions that weren’t adequately captured by the DSSA paradigm. These domain-independent abstractions included reactive event demultiplexers and active object concurrency models that exist at a lower level than a specific application domain. During the latter part of the 1990s, key advances in software architecture for DRE systems stemmed from the growing focus on design and architecture patterns. The classic Gang of Four patterns catalog was published in 1994, followed by the first Pattern-Oriented Software Architecture (POSA) volume in 1996. As a result, engineers developing DRE systems began adopting common vocabularies to reason intentionally about software architecture from a practitioner perspective. For example, experience gained from work at Boeing, Lockheed Martin, Raytheon, Siemens, and many telecom equipment providers showed the benefits of applying patterns to describe product lines and common software architectures for DRE systems. The growing awareness of patterns more clearly differentiated design aspects from implementation aspects. 
Combining this distinction with a focus on optimizing and hardening DRE systems by choosing the right patterns to implement also helped improve the quality attribute support of DRE system standards, such as Real-time CORBA and the Data Distribution Service. This experience demonstrated that pattern-oriented architectural abstractions could scale up to larger, networked topologies, yet still assure traditional real-time concerns, such as avoiding priority inversion and ensuring schedulability. By the end of the 1990s, the DRE systems community recognized that patterns enabled software developers and systems engineers to understand how fundamental abstractions could be reused in a domain-independent manner, as well as combined to support integrated higher-level domain-specific abstractions. The availability of popular, pattern-oriented, domain-independent middleware toolkits (such as ACE and TAO) in open-source form increased awareness of the value of applying patterns to production DRE systems. Starting around the year 2000, the patterns community began documenting software architectures for DRE systems more cohesively via pattern languages, which provide a vocabulary and process for the orderly resolution of software development problems. For example, the POSA4 book describes brokers, publishers/subscribers, and messaging as key architectural elements in a pattern language for distributed computing. These types of pattern languages codify, coordinate, and generate larger ensembles of software and system elements than individual patterns and pattern catalogs. In recent years, engineers have applied various domain-specific modeling languages and tools (such as the Generic Modeling Environment and associated model-driven engineering technologies) to automate various elements of pattern languages. 
The combination of pattern languages and model-driven tools has enabled the analysis and synthesis of frameworks and components that are more effective than earlier computer-aided software engineering (CASE) tools.

So, over the past 20 years of research and transition efforts, software engineers have increasingly identified and applied patterns and pattern languages to document architectures and build DRE systems that are directly relevant to application developers and practitioners. The knowledge and technology base has matured to the point where the techniques and tools we have now provide substantial advantage over what was available 20 years ago, though they are no substitute for experience and insight.

Looking ahead, I think we’re just beginning to comprehend and codify the pattern languages and architectures for the current generation of network-centric DRE systems-of-systems, as well as the next generation of ultra-large-scale (ULS) systems that Linda Northrop discussed in her panel presentation. As we reach this scale, complexity, and mission-criticality, we need the software architecture community to step forward and begin understanding and documenting the appropriate solutions in pattern form. Right now we know many anti-patterns for how not to build these types of systems, but we still struggle with how to build them at scale and reason about how they’ll work with all the quality attributes they need to succeed.

What’s Next

In my next blog posting, I’ll provide a transcript of presentations by Ian Gorton, research and development (R&D) lead for Data Intensive Computing at Pacific Northwest National Lab; Bob Schwanke of Siemens Corporate Research; and Jeromy Carrière, chief architect at X.commerce, a subsidiary of eBay.

Additional Resources

The abstract and slides for Douglas C. Schmidt’s talk are available at www.sei.cmu.edu/library/abstracts/presentations/schmidt-panel-saturn2012.cfm.

Many patterns and pattern languages for DRE systems, as well as open-source implementations of these patterns, are available at www.dre.vanderbilt.edu/~schmidt/patterns.html.

Douglas C. Schmidt is teaching a massive open online course (MOOC) on patterns and frameworks for concurrent and networked software for Coursera starting in February 2013. Course information and instructions for free registration are available at https://www.coursera.org/course/posa/.

A Software Engineering Radio podcast describing a pattern language for distributed systems is available at www.se-radio.net/2007/07/episode-63-a-pattern-language-for-distributed-systems-with-henney-and-buschmann/.

A podcast describing the applications of patterns and pattern languages to DRE systems is available at www.se-radio.net/2006/01/episode-3-interview-doug-schmidt/.

SATURN 2013, presented in collaboration with IEEE Software magazine, will be held April 29 through May 3, 2013 in Minneapolis, Minnesota. For more information see the SATURN 2013 software architecture conference website at www.sei.cmu.edu/saturn/2013.

SEI . Blog .  Jul 27, 2015 02:36pm

Helping Developers Address Security with the CERT C Secure Coding Standard

By David Keaton, Senior Member of the Technical Staff, CERT Secure Coding Team

By analyzing vulnerability reports for the C, C++, Perl, and Java programming languages, the CERT Secure Coding Team observed that a relatively small number of programming errors leads to most vulnerabilities. Our research focuses on identifying insecure coding practices and developing secure alternatives that software programmers can use to reduce or eliminate vulnerabilities before software is deployed. In a previous post, I described our work to identify vulnerabilities that informed the revision of the International Organization for Standardization (ISO) and International Electrotechnical Commission (IEC) standard for the C programming language. The CERT Secure Coding Team has also been working on the CERT C Secure Coding Standard, which contains a set of rules and guidelines to help developers code securely. This posting describes our latest set of rules and recommendations, which aims to help developers avoid undefined and/or unexpected behavior in deployed code.

History of Addressing Security Issues in C

The C programming language began to take shape in 1969, long before security concerns became important for its applications. C was first standardized in 1989, too soon to take into account the then-budding security problems on the ARPANET. Due to lack of customer demand for security, even the 1999 revision of the C standard contained only one security-related feature, the snprintf() function mentioned in my previous blog post. In recent years, however, C developers have been forced to turn their attention to security issues.
The CERT C Secure Coding Standard addresses this need by providing rules and recommendations for avoiding security problems in the following categories:

preprocessor - issues dealing with macros
declarations and initialization - choosing the right storage duration and type qualifiers, and C language rules for the uniqueness of variable names
expressions - order of evaluation, safe use of C syntax
integers - arithmetic issues such as avoiding integer overflow
floating point - quirks of computer arithmetic that are often overlooked by people who are used to using integers
arrays - allocating and communicating the correct size, and using the correct types
characters and strings - ensuring that character sequences are null-terminated, and proper use of narrow and wide characters
memory management - avoiding memory leaks, double free, and underallocation
input output (I/O) - proper use of C’s file I/O library
environment - interfacing with the operating system
signals - best practices for handling asynchronous events
error handling - ensuring correct detection of error conditions
application programming interfaces - security-conscious design of the interfaces between parts of a program
concurrency - issues that arise in multithreaded programs
miscellaneous - issues not covered by other categories, such as assertions, and maintaining the security of function pointers
POSIX - issues specific to the POSIX operating system interface, which is widely used with C

Examples of CERT C Secure Coding Rules

The remainder of this blog posting gives some examples of the types of secure coding rules we’ve defined for C.

Preprocessor macros. One part of the CERT C Secure Coding Standard focuses on the C preprocessor, which is a macro expander that executes at the beginning of the compilation process. Far too often, programmers overlook security-related consequences of preprocessor misuse.
If a programmer passes an expression that has a side effect as an argument to a macro, the macro may cause the side effect to occur multiple times, depending on how it uses that argument. The CERT Secure Coding team recently developed rules to ensure that a programmer doesn’t accidentally pass an argument that slips a side effect into a macro, or that the programmer writes the code in such a way that the side effects only occur one time. While violations of these and other rules described in this post aren’t normally considered security risks, they have resulted in security problems when the code is deployed and activated. For example, we have developed the following rules and recommendations for developers to follow when they write code that involves the C preprocessor:

Avoid side effects in arguments to unsafe macros (Identifier PRE31-C). This hard-and-fast rule states that if a developer is using a macro that uses its arguments more than once, then the developer must avoid passing any arguments with side effects to that macro. An example of a violation of PRE31-C is

    #define ABS(x) (((x) < 0) ? -(x) : (x))
    /* ... */
    m = ABS(++n); /* n is incremented twice, not once */

Do not define unsafe macros (Identifier PRE12-C). This recommendation relates to PRE31-C, which defines an unsafe macro as one that evaluates any of its arguments more than one time.

Macro replacement lists should be parenthesized (Identifier PRE02-C). This recommendation suggests that developers should use parentheses around macro replacement lists; otherwise, operator precedence may cause the expression to be computed in unexpected ways. For example, if an argument contains a plus sign (+), and a macro contains a multiplication sign (*), and the argument has not been parenthesized, then the multiplication will occur first, followed by the addition, which may not be what a developer expected.
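The double-evaluation hazard behind PRE31-C and PRE12-C can be sketched as follows. This is a minimal illustration rather than text from the standard: the names iabs, bump_with_macro, and bump_with_inline are hypothetical, and a C99 compiler is assumed. Because the conditional operator evaluates its first operand before choosing a branch, the macro version applies ++n twice, whereas a function evaluates its argument exactly once.

```c
#include <assert.h>

/* Unsafe in the PRE12-C sense: x appears more than once in the expansion. */
#define ABS(x) (((x) < 0) ? -(x) : (x))

/* Hypothetical replacement: a function evaluates its argument exactly once.
   Like the macro, it does not guard against negating INT_MIN. */
static inline int iabs(int x) {
    return x < 0 ? -x : x;
}

/* Returns the value of n after ++n has been passed to the macro. */
int bump_with_macro(int n) {
    int m = ABS(++n);   /* expands to (((++n) < 0) ? -(++n) : (++n)) */
    assert(m >= 0);
    return n;
}

/* Returns the value of n after ++n has been passed to the function. */
int bump_with_inline(int n) {
    int m = iabs(++n);
    assert(m >= 0);
    return n;
}
```

Starting from n = -5, bump_with_macro leaves n at -3 while bump_with_inline leaves it at -4, which is the single increment the caller presumably intended.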
An example of a violation of PRE02-C is listed below:

    #define CUBE(X) (X) * (X) * (X)
    int i = 3;
    int a = 81 / CUBE(i); /* evaluates to 243 */

Declarations. C provides a range of mechanisms to declare data types and variables of these data types. For example, a developer might declare a variable in an outer scope and also declare another variable of the same name in a nested inner scope. In such a case, the variable in the inner scope hides the variable in the outer scope. When a developer makes changes to that variable, he or she might assume that changes are being made to the outer-scope variable when, in fact, only the inner-scope variable is being changed.

The declarations problem is compounded by the fact that in C, there is a limit to how many characters are required to be unique in a variable name. The C standard has several requirements including the following:

A macro name has 63 significant initial characters. If a program has two macro names and they differ only in the 64th character, the compiler is allowed to think that those are the same name.

An external identifier has 31 significant initial characters. If a program has two variables whose names differ only in the 32nd character or after, the compiler is allowed to think that those are the same variable.

The situation described above can cause a problem wherein a developer has declared two variables that can inadvertently reside in the same scope. While the two variables might not have the same name as written, the compiler might truncate the name to such a degree that the names are the same.

Characters and strings. C defines a set of functions that operate on strings composed of characters. It is common practice for developers to count the number of characters needed in a string and allocate exactly the same number of bytes.
For example, if a developer allocates enough space to store the text version of the IPv4 address 255.255.255.255, the developer might allocate 15 bytes when he or she actually needs 16 bytes to accommodate the null terminator at the end. One additional character is needed for the null terminator, a byte whose value is zero that defines the end of the string. While languages such as Fortran store a count of how many characters the string contains as part of the string data structure, C doesn’t do that. If the marker that indicates the end of the string is missing, then the software doesn’t know that it needs to stop. Instead, it keeps searching through the memory. While the previous rules and recommendations are intended to prevent vulnerabilities that eventually lead to security problems, neglecting to include a byte for the null terminator leads directly to a buffer overflow.

Future Work

Since publishing the first version of the CERT C Secure Coding Standard, we’ve learned and improved our approach to vulnerability analysis and developing rules and recommendations. A new version of the CERT Secure Coding Standard will eventually be published to update the existing rules and recommendations, as well as to add some for new C features, such as the standard C multithreading library. Meanwhile, the current work has already met with success. Cisco and Oracle have adopted the CERT C Secure Coding Standard as part of their internal processes. We continue to hear of additional interest from various organizations.

Additional Resources

For more information about the CERT Secure Coding initiative, please visit https://www.cert.org/secure-coding/
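Returning to the null-terminator point made under Characters and strings, here is a minimal sketch of sizing a buffer for a dotted-quad IPv4 address. The names IPV4_TEXT_MAX and format_ipv4 are hypothetical, chosen only for this illustration; snprintf() is the C99 function mentioned earlier in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* "255.255.255.255" is 15 characters; one more byte holds the null terminator. */
#define IPV4_TEXT_MAX (15 + 1)

/* Writes the dotted-quad text for a.b.c.d into buf (capacity: size bytes).
   Returns the string length on success, or -1 if an octet is out of range
   or the buffer cannot hold the text plus its null terminator. */
int format_ipv4(char *buf, size_t size,
                unsigned a, unsigned b, unsigned c, unsigned d) {
    assert(buf != NULL && size > 0);
    if (a > 255 || b > 255 || c > 255 || d > 255) {
        return -1;
    }
    /* snprintf never writes more than size bytes and reports the length
       the full text would have needed, excluding the terminator. */
    int n = snprintf(buf, size, "%u.%u.%u.%u", a, b, c, d);
    if (n < 0 || (size_t)n >= size) {
        return -1;   /* output was truncated: the buffer was too small */
    }
    return n;
}
```

Under these assumptions, a buffer declared as char text[IPV4_TEXT_MAX] accepts 255.255.255.255, while a 15-byte buffer is rejected instead of silently losing the terminator.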

SEI . Blog .  Jul 27, 2015 02:35pm

Writing Effective YARA Signatures to Identify Malware

By David French, Senior Malware Researcher, CERT

In previous blog posts, I have written about applying similarity measures to malicious code to identify related files and reduce analysis expense. Another way to observe similarity in malicious code is to leverage analyst insights by identifying files that possess some property in common with a particular file of interest. One way to do this is by using YARA, an open-source project that helps researchers identify and classify malware. YARA has gained enormous popularity in recent years as a way for malware researchers and network defenders to communicate their knowledge about malicious files, from identifiers for specific families to signatures capturing common tools, techniques, and procedures (TTPs). This blog post provides guidelines for using YARA effectively, focusing on selection of objective criteria derived from malware, the type of criteria most useful in identifying related malware (including strings, resources, and functions), and guidelines for creating YARA signatures using these criteria.

Benefits of Reverse Engineering for Malware Analysis

Reverse engineering is arguably the most expensive form of analysis to apply to malicious files. It is also the process that yields the greatest insights into a particular malicious file. Since analysis time is so expensive, however, we constantly seek ways to reduce this cost or to leverage the benefits beyond the initially analyzed file. When classifying and identifying malware, therefore, it is useful to group related files together to cut down on analysis time and leverage analysis of one file against many files. To express such relationships between files, we use the concept of a "malware family", which is loosely defined as "a set of files related by objective criteria derived from the files themselves." Using this definition, we can apply different criteria to different sets of files to form a family.
For example, as I described in my last blog post, Scraze is considered a malware family. In this case, the objective criteria forming the family are the functions that Scraze files have in common. Likewise, NSIS files can also be considered a family, though a reasonable person might conclude that a common installation method for multiple different programs (such as NSIS provides) is an irrelevant relationship between those programs. In this case, we may also use function sharing as the objective criteria forming the family.

Whatever the criteria selected to identify a malware family, we generally find that it is desirable for these criteria to have the following properties:

The criteria should be necessary to the behavior of the malware. Ideal candidates are structural properties (such as a particular section layout or resources needed for the program to run) or behavioral properties (such as function bytes for important malware behavior).

The criteria should also be sufficient to distinguish the malware family from other families. We often find that malicious files accomplish similar things but perhaps in different ways. For example, many malicious files detect that they are running in a virtual environment, and there have been many published techniques on how to implement this behavior. Since these techniques are published on the Internet, it is trivial for a malware author to incorporate them into his own programs, and we expect this to occur. Thus, using these published examples as criteria indicative of one particular malware family is probably not sufficient to distinguish that family.

Applying YARA Signatures to Malware

Analyst experience and intuition can guide the selection of good criteria, as discussed in my prior work on the process of selecting and applying criteria. The goal of developing these criteria is to use them to identify related files.
In practice, we generally find that good criteria for distinguishing and identifying malware families are excellent targets for creating signatures that identify the families. One way to encode signatures that identify malware families is by using the open source tool YARA. YARA provides a robust language (based on Perl Compatible Regular Expressions) for creating signatures with which to identify malware. These signatures are encoded as text files, which makes them easy to read and communicate with other malware analysts. Since YARA applies static signatures to binary files, the criteria statically derived from malicious files are the easiest and most effective criteria to convert into YARA signatures. We have found three different types of criteria are most suitable for YARA signature development: strings, resources, and function bytes.

The simplest usage of YARA is to encode strings that appear in malicious files. The usefulness of matching strings, however, is highly dependent on which strings are chosen. For example, strings that represent unique configuration items, or commands for a remote access tool, are likely to be indicative of, and specific to, a particular malware family. Conversely, strings in a malicious file that result from the way the file was created (such as version information stored by the Microsoft MSVC++ compiler) are generally poor candidates for YARA signatures. Here is an example of a YARA signature for the malware family Scraze, based on strings derived from the malware (backslashes are doubled because YARA treats a backslash in a text string as an escape character):

    rule Scraze
    {
        strings:
            $strval1 = "C:\\Windows\\ScreenBlazeUpgrader.bat"
            $strval2 = "\\ScreenBlaze.exe "
        condition:
            all of them
    }

Another effective use of YARA is to encode resources that are stored in malicious files. These resources may include things like distinctive icons, configuration information, or even other files.
To encode these resources as YARA signatures, we first extract the resources (using any available tool, for example Resource Hacker) and then convert the bytes of the resource (or a portion thereof) to a hexadecimal string that can be represented directly in a YARA signature. Icons in particular make excellent targets for such signatures. However, resources are easily modified in a program. For example, Microsoft Windows allows a program’s icon to change by simple copy/paste operations in Windows Explorer. The same care must be taken in selecting program resources as is taken when selecting strings.

Finally, an effective use of YARA signatures is to encode bytes implementing a function called by the malicious program. Functions tend to satisfy our desire for criteria both necessary and sufficient to describe the malware family. The best functions to encode are those that perform some action deemed indicative of the overall character of the malware. For example, the best function for malware whose primary purpose is to download another file may be one that performs the download or processes downloaded files. Likewise, for remote access tools, it may be a function to encrypt network communications or to process received commands. For viruses, it may be code (which may not even be an actual function) involved with decrypting a payload or re-infecting additional files. We may also identify packers by encoding bytes representing the unpack stub.

Here is an example of a YARA signature encoding part of a function used to check the integrity of an NSIS installer created with NSIS version 2.46. This basic block has had address bytes wildcarded based on the PIC algorithm (described in the article "Function Hashing for Malicious Code Analysis" in the 2009 CERT research report).

    rule NSIS_246
    {
        strings:
            $NSIS_246_CheckIntegrity = {
                57 53 E8 ?? ?? ?? ?? 85 C0 0F 84 ?? ?? ?? ??
                83 3D ?? ?? ?? ?? 00 75 ?? 6A 1C 8D 45 D8 53 50
                E8 ?? ?? ?? ?? 8B 45 D8 A9 F0 FF FF FF 75 ??
                81 7D DC EF BE AD DE 75 ??
                81 7D E8 49 6E 73 74 75 ??
                81 7D E4 73 6F 66 74 75 ??
                81 7D E0 4E 75 6C 6C 75 ??
                09 45 08 8B 45 08 8B 0D ?? ?? ?? ?? 83 E0 02
                09 05 ?? ?? ?? ?? 8B 45 F0 3B C6 89 0D ?? ?? ?? ??
                0F 8F ?? ?? ?? ?? F6 45 08 08 75 ?? F6 45 08 04 75 ??
            }
        condition:
            $NSIS_246_CheckIntegrity
    }

Caveats in Creating YARA Signatures

Regardless of the criteria used to create a YARA signature, there are always caveats, especially for criteria derived from program data, such as strings. For example, malware with statically coded password lists may have a large number of strings, including those that may seem to be unique to a family. Moreover, since strings are easily mutable, a string (such as the filename or path into which a malicious file is installed, or common encoding strings used in HTTP requests) considered as indicative may change at any time. The analyst must assess the significance of the presence or absence of a particular string, rather than delegate responsibility of understanding to a YARA signature.

Likewise, when using code instead of data to create YARA signatures, we must also be aware of the caveats. It is important to avoid functions that come from a common implementation detail, such as common libraries that have been statically linked into the program. It is also important to account for differences in malicious files due to process changes, such as recompilation. One way to sift out these differences is to use algorithms that normalize address references (for example, the PIC algorithm) when selecting function bytes. Malicious code may change over time, so particular functions may come and go. It is therefore better to select a number of functions to encode as signatures and derive the "best" signatures by surveying matching files and refining the analysis as the importance of the signatures is revealed. This method is the essence of family analysis, and is an important application of YARA to malware analysis.
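The effect of wildcarding address bytes, as in the NSIS signature above, can be pictured as a masked byte comparison. The following C sketch only illustrates that idea under stated assumptions; it is not how YARA is implemented, and the names WILD and find_masked are hypothetical. Each pattern element is either a fixed byte value or a wildcard that accepts any byte, so two builds whose code differs only in address operands still match.

```c
#include <assert.h>
#include <stddef.h>

#define WILD (-1)   /* hypothetical stand-in for a ?? byte in a hex string */

/* Returns the offset of the first position in data where pat matches,
   or -1 if there is none. Each pat element is 0..255 or WILD. */
long find_masked(const unsigned char *data, size_t len,
                 const int *pat, size_t n) {
    assert(data != NULL && pat != NULL);
    if (n == 0 || n > len) {
        return -1;
    }
    for (size_t i = 0; i + n <= len; i++) {
        size_t j = 0;
        /* Advance while each byte is wildcarded or equal to the pattern. */
        while (j < n && (pat[j] == WILD || pat[j] == data[i + j])) {
            j++;
        }
        if (j == n) {
            return (long)i;
        }
    }
    return -1;
}
```

For instance, the pattern 57 53 E8 ?? ?? ?? ?? 85 C0 would be written as {0x57, 0x53, 0xE8, WILD, WILD, WILD, WILD, 0x85, 0xC0}; it matches regardless of the four call-target bytes but fails if one of the fixed opcode bytes changes.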
Open Issues and Future Work

While YARA continues to gain in popularity, there aren’t many guidelines on how to use it most effectively. The issue of false positives continues to vex malware analysts. For example, YARA signatures may match files in which the specified criteria exist and yet do not possess the same semantics as expressed in the original file. Another issue with using YARA signatures is that they can be fragile in the face of changing codebases. A malware author can and does change his/her code to suit an ever-shifting set of goals, and keeping up with these changes is particularly challenging. While YARA signatures are a powerful means of capturing and communicating analyst insights, care must be taken that they do not drift too far away from the current reality of a particular malware family.

To address these open issues, our future work is to continue refining the process by which malware families may be most reliably identified. This work includes developing metrics for the best type of criteria to define families, measuring the resilience of these criteria in the face of changes, and evaluating the cost of developing the criteria vs. the cost to change the malware. We are also developing techniques to prioritize malware analysis based on such metrics, as well as triage systems to associate related files by automatically producing and testing high-quality signatures.

Additional Resources

Function Hashing for Malicious Code Analysis, CERT Research Report, pp. 26-29, www.cert.org/research/2009research-report.pdf

SEI . Blog .  Jul 27, 2015 02:34pm

Strategic Planning: Developing Business Drivers for Performance Improvement

By Linda Parker Gates, Senior Member of the Technical Staff, Acquisition Support Program

Organizational improvement efforts should be driven by business needs, not by the content of improvement models. While improvement models, such as the Capability Maturity Model Integration (CMMI) or the Baldrige Criteria for Performance Excellence, provide excellent guidance and best practice standards, the way in which those models are implemented must be guided by the same drivers that influence any other business decision. Business drivers are the collection of people, information, and conditions that initiate and support activities that help an organization accomplish its mission. These drivers should be the guiding force behind performance improvement because they represent key factors or influences that matter to an organization’s success. But how do we identify these drivers? This blog posting, the latest in a continuing series on the SEI’s work on strategic planning, describes how we are using integrated strategic planning and the associated information framework to derive the most vital business drivers for performance improvement.

An Integrated Strategic Planning Method

The strategic planning method we’ve been using at the SEI integrates the following two complementary techniques that provide a framework for identifying business drivers for performance improvement:

Critical success factors (CSFs), which are indicators that measure how well an organization is accomplishing its goals. For example, a CSF for agile software projects is achieving a high level of client-developer interaction.

Future scenarios, which allow organizations to explore multiple potential futures and generate robust strategies and the early warning signs that indicate how the future may unfold.
For example, weather experts will create scenarios based on the critical uncertainties associated with a major weather system and plan for the range of possibilities, while monitoring the variables and narrowing on the most likely scenario over time. Our integrated strategic planning approach (described in my February 2011 blog post and a November 2010 SEI technical report) sets the stage for initiatives, such as architecture analysis of alternatives, performance improvement, risk management, and portfolio management, that an organization can apply to improve its performance at multiple scales, ranging from individuals and teams up to the entire enterprise. Tying performance improvement to organizational strategy by identifying key business drivers creates an environment that is business-driven and model-based, but not model-driven. In other words, improvement decisions are informed by best practice models—such as the CMMI, the Baldrige Criteria for Performance Excellence, the Information Technology Infrastructure Library (ITIL), and the Project Management Body of Knowledge—but are driven by business concerns, rather than by an attempt to apply a particular model for its own sake. Linking Strategic Planning to Performance Improvement Through the SEI’s strategic planning and performance improvement work with federal acquisition program offices, we’ve observed that key business drivers can and should be elicited from integrated strategic plans. In particular, aligning improvement activities with organizational strategic goals and CSFs helps ensure improvement activities achieve business goals. We’ve also learned that there is no one-size-fits-all improvement solution, that is, no single model improves performance across-the-board. Instead, organizations often see better results when they apply the most applicable parts of multiple models based on strategic business-driven information. 
When improvement initiatives and activities are directly derived from organizational goals, objectives, and CSFs, they can support and complement strategic initiatives and actions. We particularly like how broad frameworks, such as the Malcolm Baldrige Criteria for Performance Excellence, can be used to identify general initiatives. The Baldrige Customer Focus criteria, considered with regard to the organization’s customer goals and coupled with input from a strategic plan, might lead the organization to improve the resilience of its customer-facing web services, which can then be augmented with specific actions (such as ensuring that high priority alerts from incident detection systems are resolved within 60 minutes) guided by the CERT Resilience Management Model. This multi-model combination enables an organization to select the improvement model(s) and practices according to what will best support its business objectives (such as preserving the confidentiality of customer data), rather than according to model-based criteria (such as maturity levels).
Business-Driven Performance Improvement
To showcase the way that integrated strategic planning can help an organization understand its business drivers for improvement, consider an information technology (IT) group with the mission of acquiring IT systems that support the services provided to the broader company’s customers.
The IT group’s CSFs would reflect the following operational areas that must function well to meet its mission:
- acquiring IT systems that serve customer needs
- managing and tracking a budget that is adequate for the mission
- formally managing relationships with key internal and external stakeholders through communication, managing expectations, and personal interaction
Likewise, the IT group’s strategic goals might include the following:
- Deliver service at customer locations, not just at traditional in-house IT data centers and facilities
- Enable highly usable self-service systems to maximize the ability of customers to transact business without intervention by IT group staff
- Develop and certify qualified, competent project managers, project team members, and portfolio and project oversight staff members to manage IT projects successfully
Progress toward the goals and attention to the CSFs described above might involve improvement actions associated with maturing the IT group’s requirements definition process. Specifically, the strategic goals outlined above could lead the IT group to the Baldrige Customer Focus category for criteria on Customer Listening. These goals would also indicate value in CMMI for Acquisition (CMMI-ACQ) process areas and practices, such as the following:
ARD - Acquisition Requirements Development
- Specific Practice 1: Develop Customer Requirements
- Specific Practice 3: Analyze and Validate Requirements
- Generic Practice 2.5: Train People
- Generic Practice 2.7: Identify and Involve Relevant Stakeholders
- Generic Practice 3.1: Establish a Defined Process
The goals might also lead the organization to practices in CMMI for Services (CMMI-SVC). Future scenarios might help expose a driver about the potential for a dramatic change in the IT group’s workforce over the next 5 to 7 years.
While scenarios do not present certainties, they present opportunities to develop robust strategies that will serve the organization well, regardless of the outcome of a high-impact uncertainty. Awareness of a critical uncertainty around the size of the workforce (due to economic conditions, aging, competition for talent, etc.) might help the IT group justify focus on the Baldrige Workforce Focus category for criteria on Workforce Capability and Capacity, as well as the People Capability Maturity Model (P-CMM). Ideally, the IT group will have fully aligned strategic plans and improvement plans that identify the essential knowledge and best practices contained in the performance improvement models that are most relevant for their specific business drivers.
Current and Future Work on Integrated Strategic Planning
We are currently working with an organization that is undertaking 48 improvement actions, each derived from organizational goals and CSFs and tied to one of four performance improvement models: the P-CMM, CMMI-ACQ, CMMI-SVC, and the Baldrige Criteria for Performance Excellence. By aligning the work associated with the improvement models to the organization’s strategic plans, the improvement measures are tied to organizational goals and CSFs. Our work on this project thus far has provided compelling evidence that the integrated strategic planning process described in this blog posting is well-suited to identifying business drivers for performance improvement. More broadly, an organization’s store of knowledge and experience, embodied in people and captured for communication and use in the processes, practices, and procedures of the organization, helps the organization react to change. In today’s dynamic mission contexts, the ability of an organization to react to changes in its mission environment is a critical capability. We are exploring the use of an agile strategic planning process that links performance improvement to organizational agility.
We look forward to sharing our results with you in a forthcoming post.
Additional Resources
For more blog postings on strategic planning, please visit http://blog.sei.cmu.edu/archives.cfm/category/strategic-planning
I recently attended the SEPG Europe conference in Madrid, Spain, where I delivered a presentation about using strategic planning techniques to identify business drivers for multi-model performance improvement. My presentation was on Tuesday. http://www.sei.cmu.edu/sepg/europe/2012/

SEI . Blog .  Jul 27, 2015 02:34pm

Reflection on 20 Years of Software Architecture: A Presentation by Robert Schwanke

By Bill Pollak
Transition Manager
Research, Technology, & System Solutions
It is widely recognized today that software architecture serves as the blueprint for both the system and the project developing it, defining the work assignments that must be performed by design and implementation teams. Architecture is the primary purveyor of system quality attributes that are hard to achieve without a unifying architecture; it’s also the conceptual glue that holds every phase of projects together for their many stakeholders. Last month, we presented two postings in a series from a panel at SATURN 2012 titled "Reflections on 20 Years of Software Architecture" that discussed the increased awareness of architecture as a primary means for achieving desired quality attributes and advances in software architecture practice for distributed real-time embedded systems during the past two decades. This blog posting, the next in the series, provides a lightly edited transcription of a presentation by Robert Schwanke, who reflected on four general problems in software architecture: modularity, systems of systems, maintainable architecture descriptions, and system architecture.
Robert Schwanke, Siemens Corporate Research
We’ve been using the term "software architecture" for about 20 years, but the foundations of the concept go back another 20 years, to the information-hiding principle introduced by David Parnas in 1972. So, we’ve actually had 40 years of software architecture. Parnas also talked about hierarchical structure in 1974 and data encapsulation in 1975. Some classic papers from that era are listed at the end of this article. We still lean on these and other early principles. In fact, if we look around for general principles of software architecture, there are not many new ones. But we do have important, unsolved, general problems in software architecture.
Today I want to draw your attention to four problem areas: modularity, systems of systems, maintainable architecture descriptions, and system architecture. What is so hard about modularity today? According to Parnas, modules were supposed to decouple development tasks. But somewhere along the way, we got the idea that modularity is about syntactic dependency. It's not. It’s about dividing systems into separate chunks so that people can work on them independently, most of the time, not needing to talk to each other very often. To create a good decomposition, we need to know which development tasks should be decoupled, because we can’t decouple them all. Modular decomposition has to be a tree. At every node in the tree, we decide which tasks are most important to decouple and divide the subsystem into smaller pieces accordingly. Modularity is also about anticipating change. The marketplace, stakeholders, and technology can all change, altering the software’s requirements and the criteria for success. To get a perfect architecture, you must have perfect insight into the future to know what is going to change. How far into the future should you look when selecting tasks to decouple? If you look too far, you get an over-engineered system; if you don’t look far enough, your project may fail before its first delivery. My team is now working on measuring modularity. Past efforts at measuring it looked at coupling and cohesion, design similarity, and other measures, but we never really validated any of those measures—we could never show what the measurements were good for. These days we are looking at detecting modularity errors by contrasting code structure with change sets. For example, if certain pairs of files get changed together often—and there’s no syntactic explanation for why they’re being changed together—we suspect a modularity error. Modularity is supposed to keep things independent, but such pairs of files are not independent. 
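The co-change check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the tooling Schwanke's team uses: the function name `suspicious_pairs`, the `min_cochanges` threshold, and the sample file names are all assumptions invented for this example. It counts how often each pair of files appears in the same change set and reports the frequent pairs that have no recorded syntactic dependency.

```python
from collections import Counter
from itertools import combinations


def suspicious_pairs(change_sets, dependencies, min_cochanges=3):
    """Return (pair, count) for file pairs that often change together
    but have no known syntactic dependency between them."""
    # Treat dependencies as unordered pairs so (a, b) and (b, a) are the same.
    linked = {frozenset(pair) for pair in dependencies}

    counts = Counter()
    for files in change_sets:
        # Every unordered pair of distinct files in one change set is a co-change.
        for a, b in combinations(sorted(set(files)), 2):
            counts[frozenset((a, b))] += 1

    flagged = [
        (tuple(sorted(pair)), n)
        for pair, n in counts.items()
        if n >= min_cochanges and pair not in linked
    ]
    # Most frequently co-changed pairs first, ties broken alphabetically.
    return sorted(flagged, key=lambda item: (-item[1], item[0]))


# Hypothetical change history: each set is the files touched by one change.
change_sets = [
    {"billing.c", "report.c"},
    {"billing.c", "report.c", "ui.c"},
    {"billing.c", "report.c"},
    {"ui.c", "widgets.c"},
    {"ui.c", "widgets.c"},
    {"ui.c", "widgets.c"},
]
# Hypothetical syntactic dependencies (for example, one file includes the other).
dependencies = [("ui.c", "widgets.c")]

result = suspicious_pairs(change_sets, dependencies)
print(result)
```

In this made-up history, `ui.c` and `widgets.c` change together as often as `billing.c` and `report.c`, but only the latter pair is reported, because nothing in the code structure explains why those two files move together.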
We are combining one line of work by Yuanfang Cai at Drexel University with another by Alan MacCormack’s team at MIT and Harvard Business School; both are studying how to predict future change (where changes will most likely happen in the system) using structure measures and change-history measures together. Preliminary indications are that file size is still the best single predictor of future bugs. This seems intuitive (bigger files mean more bugs), except that the bug density in large files turns out to be lower than in small files. Not the number of bugs, which is still higher, but the bug density is lower. Another interesting predictor, coming from social-network research, is "betweenness centrality." Centrality is how much a node in the network is in the middle of the network (specifically, the frequency with which it appears on the shortest path between any pair of nodes in the network), and it’s a pretty strong predictor of future change. The reason centrality is a good predictor is that if changes are likely to propagate through this node, the node is likely to change.
Another hard problem is technology stacks. Specialization forces us to rely heavily on third-party components and technologies, and not just in ultra-large scale systems. I worked on a small system recently in which the first draft implementation, installed on my desktop, used 15 third-party technologies. By the time we delivered the system, it contained 300 open-source components, protected by 30 distinct open-source licenses, which gave us many headaches even though we hadn’t modified any of the source files. When that happens, your system loses control over aggregated quality attributes. When I was working on a VOIP (Voice Over Internet Protocol) telephone-switch product a few years ago, there were only four VOIP-switch vendors selling complete hardware and software solutions.
They were all relying on third-party server hardware, specified by the VOIP vendor but sold by the server vendor directly to the VOIP customer. Servers today have a market window of 18 months, after which the vendor will change the design, typically to take advantage of new and better components. Due to that short market window, the server vendors have reduced what they spend on reliability analysis of the hardware. The telephone business was once famous for its five-nines (.99999) reliability. Not anymore, because they can afford neither to build their own servers nor to keep re-analyzing the reliability of third-party servers. There was one server vendor that was poised to take over the entire telephony server market just by offering a server with a 5-year market window and a reliability specification. The next problem is maintainable architecture descriptions. We’ve been trying for a long time to put good, useful descriptions into the hands of architects and developers and get people to maintain them. Instead, the current practice is to figure out the architecture once, document it, put it on a shelf, and never change it again. Or, actually, we do change the architecture, but the description doesn’t change, and then it’s useless. The biggest obstacle to maintainable architecture descriptions is that the subsystem tree, often reflected in the directory structure of the project, is almost enough. Much of the value of the architecture description resides in the module decomposition tree. Making the rest of the architecture description accurate, enforceable, maintainable, and usable is really hard, and we have not yet demonstrated enough of a return on the expense. Finally, there is the challenge of system architecture. We realized recently that with the way the systems engineering field now defines itself [INCOSE standard], system architecture and software architecture are almost the same thing. 
That is, most large systems are now dominated by software, making the software architecture and system architecture almost the same. The domain-specific physical technologies define many of the components’ quality attributes, but the software provides the integration and control that synthesizes the system qualities out of the components. So we need to worry, as software architects, that we’re about to become system architects. In our emerging role, we need to add the physical, mechanical, and electrical components to our system architectures, but more importantly, on the people side, we must develop cross-domain communication, trust, and engagement. This requires a real engineering education that most software people don’t have. Instead, we have engineers with good, practical engineering training but an inadequate appreciation of software, and software guys who understand abstraction, dependencies, modularity, and so forth, but think they can build anything, whether it’s feasible or not. We software architects probably know a lot more about system architecture, in general, but we can’t speak the language that systems engineers have been talking for decades.
The next post in this series will include presentations by Jeromy Carriere and Ian Gorton.
Additional Resources
Schwanke’s Presentation
http://www.sei.cmu.edu/about/organization/rtss/?location=tertiary-nav&source=11347
Classic papers cited in Schwanke’s Presentation
Information-hiding Principle
- The Secret History of Information Hiding by David Parnas
  http://www-sst.informatik.tu-cottbus.de/~db/doc/People/Broy/Software-Pioneers/Parnas_new.pdf
Hierarchical Structure
- On a "buzzword": hierarchical structure by David Parnas
  http://gala.cs.iastate.edu/References/Parnas_Buzzword.pdf
Data Encapsulation
- Some conclusions from an experiment in software engineering techniques by David Parnas
  http://dl.acm.org/citation.cfm?id=1480035
- Modularization and hierarchy in a family of operating systems by A. N. Habermann, Lawrence Flon, and Lee Cooprider
  http://dl.acm.org/citation.cfm?id=360076
Separate Dependency Specs from Code
- Programming-in-the large versus programming-in-the-small by Frank DeRemer and Hans Kron
  http://dl.acm.org/citation.cfm?id=808431
Module Guide
- The modular structure of complex systems by Paul Clements, David Parnas, and David Weiss
  http://dl.acm.org/citation.cfm?id=801999
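As a companion to the talk's remarks on betweenness centrality, the measure can be computed for a small dependency graph as sketched below. The sketch uses Brandes' algorithm for unweighted, undirected graphs; the talk does not say how the research teams compute centrality, so the algorithm choice, the function name `betweenness`, and the four-file graph are assumptions made for illustration.

```python
from collections import deque


def betweenness(graph):
    """Betweenness centrality for an unweighted, undirected graph given as
    {node: [neighbors]}. Each unordered pair of endpoints is counted once."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search from source, counting shortest paths.
        order = []                        # nodes in order of discovery
        preds = {v: [] for v in graph}    # predecessors on shortest paths
        paths = {v: 0 for v in graph}     # number of shortest paths from source
        dist = {v: -1 for v in graph}
        paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:           # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # In an undirected graph every pair was visited from both ends.
    return {v: s / 2 for v, s in score.items()}


# Hypothetical module graph: core.c sits between every other pair of files.
graph = {
    "ui.c": ["core.c"],
    "net.c": ["core.c"],
    "db.c": ["core.c"],
    "core.c": ["ui.c", "net.c", "db.c"],
}

scores = betweenness(graph)
print(scores)
```

In this toy graph every shortest path between two outer files runs through `core.c`, so it gets the highest score, which matches the intuition in the talk that changes tend to propagate through central nodes.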

SEI . Blog .  Jul 27, 2015 02:33pm

Enabling and Measuring Early Detection of Insider Threats

By Dr. Bill Claycomb
Senior Member of the Technical Staff
CERT Insider Threat Center
Sabotage of IT systems by employees (the so-called "insider threat") is a serious problem facing many companies today. Not only can data or computing systems be damaged, but outward-facing systems can be compromised to such an extent that customers cannot access an organization’s resources or products. Previous blog postings on the topic of insider threat have discussed mitigation patterns, controls that help identify insiders at risk of committing cyber crime, and the protection of next-generation DoD enterprise systems against insider threats through the capture, validation, and application of enterprise architectural patterns. This blog post describes our latest research in determining the indicators that insiders might demonstrate prior to attacks.
The immediate and lasting impacts of IT sabotage by insiders can be catastrophic. In one case analyzed by researchers at the CERT Insider Threat Center, a disgruntled employee sabotaged IT systems hosting a child abuse hotline, preventing access to the organization’s website. Another similar case resulted in severe limitations of 911 emergency services in four major cities. In another instance, a company went out of business because an insider deleted all research data and stole all backups.
Since 2001, researchers at the CERT Insider Threat Center have documented malicious insider activity by examining publicly available information such as media reports and court transcripts. We have also conducted interviews with the United States Secret Service, victims’ organizations, and convicted felons. The goals of our research have been to answer the following questions: Are there patterns an employee exhibits prior to an attack? If so, what are they? If a pattern is applied, can it distinguish malicious insiders from all others?
And if so, is there a point at which the malicious insiders could have been detected or at which their behavior could have been identified as problematic?
Previous research has been conducted on enabling and measuring early detection of insider threats, but several of the studies lacked access to the large collection of real-world cases our team has collected over the past 11 years. As part of our research, we focused on the timeline of events in cases of IT sabotage. Specifically, we looked at the following:
- the types of behavior that were observable prior to an attack
- patterns or models we can abstract from the saboteurs
- the points at which the employer could have taken action to prevent or deter the employee from executing an insider attack, either by positively mitigating the employee’s disgruntlement or by protecting IT systems
Our analysis was based on more than 50 cases of insider IT sabotage (other types of insider threat behavior include fraud and theft of intellectual property). From the selected cases, we created a chronology of events for each incident. The number of events per insider incident ranged from 5 to more than 40, with an average of 15. We began by trying to identify specific events in each case that represented key points of the incident. These key points are described as follows:
- Tipping Point (TP): the first observed event at which the insider clearly became disgruntled. In one case we examined, it was reported that an "insider had a dispute with management regarding salary and compensation."
- Malicious Act (MA): the first significant observed event that clearly enabled the attack. In the case described above, it was reported that the "insider inserted a logic bomb into production code."
- Attack Zero Hour (0H): when cyber damage begins to occur. In our example, the next sequence of events involved "logic bomb fires, deleting massive amounts of critical company data."
- Attack Detected (AD): when the organization realizes something is wrong. In our example, the "system administrator arrives at work and discovers the system is down."
- Attack Ends (AE): when cyber damage stops (not when recovery ends or even begins). In our example, the "system administrator finds and deletes logic bomb, preventing future damage."
- Action on Insider (AI): the first instance of organizational response to the insider (fired, arrested, etc.). In our example, the insider was fired and a search warrant was executed "to find missing backup tapes at insiders residence."
Initial Findings
After we determined which events corresponded to each key point, we analyzed each case to determine whether the events or behaviors prior to each event indicated a predisposition for sabotage. For this project, we considered predispositions to be characteristics of the individual that can contribute to the risk of behaviors leading to malicious activity. For example, one issue we examined is whether an employee, prior to the point of clear disgruntlement, demonstrated a serious mental health disorder, an addiction to drugs or alcohol, or a history of rule conflicts. Although we are still conducting analysis, two patterns have emerged:
- In general, insiders begin conducting attacks soon after reaching a tipping point of disgruntlement.
- Insiders tend to exhibit behavioral indicators prior to exhibiting technical indicators. In particular, concerning behaviors of an interpersonal nature were generally observed prior to concerning behaviors on IT systems.
Addressing Challenges
One of the challenges we face in our work is measuring early detection with respect to the moment of attack. Specifically, we have found the following factors particularly troublesome:
- Sabotage attacks are often complicated, and it’s hard to pinpoint specific event timing.
- Defining an attack is not simple. In particular, does the attack include the time spent on planning? Is planting a logic bomb considered the attack, or does the attack begin when the logic bomb executes?
- Early detection times may vary according to analysis parameters, and timing fidelity is inconsistent. For some cases we have timing information down to the minute; for others, we only know the day the event occurred.
- Deciding which events were observable is hard. Does this mean "capable of being observed" (some system exists to observe the behavior), or "capable of being observed in each specific case" (the organization possessed and correctly utilized the tool to detect the behavior)?
- Measuring employee disgruntlement is hard. Behavior that indicates disgruntlement for one person may be normal behavior for another. Trying to identify a point (or set of points) in a case timeline where the insider clearly became disgruntled, or where the disgruntlement became markedly worse, is therefore highly subjective.
Another difficulty we have experienced is a scarcity of detailed data. While we have several hundred cases to choose from, our sources for these cases are sometimes limited to court documents, media reports, etc., which generally do not contain detailed technical or behavioral descriptions of the insiders’ actions prior to attack. To help maintain integrity of our results and evaluation methodologies, we are collaborating with Roy Maxion from CMU’s School of Computer Science, who is well known for his work on research methodologies and data quality issues.
Caveats
One important factor to note is that our data is somewhat biased, as we only consider malicious insiders who have been convicted of a crime related to their insider activity, and we are limited in the scope of data sources used. The number of insiders who are not detected, or detected but not reported, is probably much greater than the number of insiders convicted. Moreover, our results are not generalizable to the entire scope of IT sabotage.
They do, however, provide some of the best evidence available for researchers and practitioners to develop novel controls for preventing and detecting some types of sabotage, including the types of high-impact crimes that result in prosecution and conviction of the insider.
Impact of our Work and Future Plans
Through this research, we plan to equip organizations, companies, and even government agencies with improved insider threat detection capabilities. We believe that our work will be of particular relevance to programs in multiple sectors of industry and finance throughout the United States. We also hope that our findings will establish a foundation for future research. Specifically, we are interested in leveraging our findings to develop controls, such as technical and non-technical methods of preventing and detecting insider threat.
Additional Resources
CERT Insider Threat Blog
www.cert.org/blogs/insider_threat/
Chronological Examination of Insider Threat Sabotage: Preliminary Observations, presented at the International Workshop on Managing Insider Security Threats
www.cert.org/archive/pdf/CERT_CodingSabotage.pdf
The "Big Picture" of Insider IT Sabotage Across U.S. Critical Infrastructures, Technical Report, May 2008
www.cert.org/archive/pdf/08tr009.pdf
Comparing Insider IT Sabotage and Espionage: A Model-Based Analysis
www.cert.org/archive/pdf/06tr026.pdf
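For readers who want to experiment with the key points defined earlier in this post (TP, MA, 0H, AD, AE, AI), the following Python sketch shows one way a case chronology could be encoded and the gaps between key points measured. It is a hypothetical encoding, not the CERT Insider Threat Center's coding scheme: the class and function names are assumptions, the event descriptions paraphrase the example case above, and every timestamp is invented.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class KeyPoint(Enum):
    TP = "Tipping Point"
    MA = "Malicious Act"
    ZERO_HOUR = "Attack Zero Hour"  # abbreviated 0H in the post
    AD = "Attack Detected"
    AE = "Attack Ends"
    AI = "Action on Insider"


@dataclass(frozen=True)
class Event:
    when: datetime
    description: str
    key_point: Optional[KeyPoint] = None  # most events are not key points


def key_point_times(events):
    """Map each key point to the time of its first occurrence in the case."""
    times = {}
    for event in sorted(events, key=lambda e: e.when):
        if event.key_point is not None and event.key_point not in times:
            times[event.key_point] = event.when
    return times


def elapsed(events, start, end):
    """Time between two key points, or None if either one is missing."""
    times = key_point_times(events)
    if start not in times or end not in times:
        return None
    return times[end] - times[start]


# Invented timestamps for illustration only.
timeline = [
    Event(datetime(2012, 3, 1, 9, 0), "dispute with management over salary", KeyPoint.TP),
    Event(datetime(2012, 3, 5, 17, 30), "logic bomb inserted into production code", KeyPoint.MA),
    Event(datetime(2012, 3, 12, 2, 0), "logic bomb fires", KeyPoint.ZERO_HOUR),
    Event(datetime(2012, 3, 12, 8, 0), "administrator discovers system is down", KeyPoint.AD),
    Event(datetime(2012, 3, 12, 11, 0), "logic bomb found and deleted", KeyPoint.AE),
    Event(datetime(2012, 3, 14, 10, 0), "insider fired", KeyPoint.AI),
]

warning_window = elapsed(timeline, KeyPoint.TP, KeyPoint.ZERO_HOUR)
detection_lag = elapsed(timeline, KeyPoint.ZERO_HOUR, KeyPoint.AD)
print("tipping point to zero hour:", warning_window)
print("zero hour to detection:", detection_lag)
```

Returning `None` when a key point is missing reflects the data-scarcity problem noted above: many cases will not have every key point, and where only the day of an event is known, intervals computed this way are only as precise as the coarsest timestamp.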

SEI . Blog .  Jul 27, 2015 02:32pm

Assessing the State of the Practice of Cyber Intelligence

By Troy Townsend, Senior Analyst
SEI Innovation Center
The majority of research in cyber security focuses on incident response or network defense, either trying to keep the bad guys out or facilitating the isolation and clean-up when a computer is compromised. It’s hard to find a technology website that’s not touting articles on fielding better firewalls, patching operating systems, updating anti-virus signatures, and a slew of other technologies to help detect or block malicious actors from getting on your network. What’s missing from this picture is a proactive understanding of who the threats are and how they intend to use the cyber domain to get what they want. Our team of researchers, which included Andrew Mellinger, Melissa Ludwick, Jay McAllister, and Kate Ambrose Sereno, sought to help organizations bolster their cyber security posture by leveraging best practices in methodologies and technologies that provide a greater understanding of potential risks and threats in the cyber domain. This blog posting describes how we are approaching this challenge and what we have discovered thus far.
Earlier this year, representatives from the government approached the SEI Innovation Center about conducting research to assess the state of the practice of cyber intelligence. Specifically, we were asked to accomplish three core tasks:
- capture the state of the practice of cyber intelligence, specifically how cyber intelligence is being performed across private industry and government
- create an implementation framework that captures best practices and advances the state of the art
- prototype technology solutions that advance the state of the science
The overall intent is to expose industry to the best practices in capabilities and methodologies developed by the government, and for the government to learn from the process efficiencies and tools used in industry.
In areas where both the government and industry are experiencing challenges, the SEI can leverage its expertise to develop and prototype innovative technologies and processes that can benefit all participants in the program.
Scope
We identified 25 organizations to participate in our research, including
- federal agencies
- international law firms
- universities
- financial sector
- non-profits
- energy sector
- commercial intelligence providers
- retail sector
Our intent was not to rate participating organizations as good or bad, but rather to capture their processes, tools, and understanding of cyber intelligence as a means of enhancing cyber security. To accomplish this, we created a cyber intelligence framework that captured the core, fundamental components of a cyber intelligence process. Based on this framework, we devised interview questions researchers used to learn how organizations accomplished those core components, which we identified as:
- defining the cyber environment. Questions in this area, by far the broadest category, focused on the organization’s cyber footprint, their identified risks, threats, and overall cyber intelligence organization. We asked organizations about everything from the composition of their cyber intelligence component to the methods and techniques analysts use for identifying emerging cyber threats.
- data gathering. Data gathering is largely derived from requirements identified by defining the organization’s environment. If the environment is too broadly defined ("everything is a threat!"), then data gathering becomes inefficient, and analysts are burdened with more data than they can possibly use. If the environment is too narrowly defined, chances are the right data is not being collected, and the organization may be missing indicators of adversary activity. Questions in this area focused on data sources, tools, and processes of collecting and correlating data.
- functional analysis. Functional analysis is the technical assessment or niche analysis of data, which includes functions such as malware analysis, insider threat analysis, reverse engineering, supply chain, intrusion analysis, and forensics. This type of analysis tends to answer the "what is happening" and "how is this happening" questions. The primary utility in this analysis is to contribute directly to cyber security efforts and assist network defenders in remediating security gaps.
- strategic analysis. If functional analysis looks at the "what" and the "how," strategic analysis attempts to answer the "who is doing this" and "why are they doing this" questions. Strategic analysis often incorporates functional analysis and conveys the complexity of the technical details to leadership in a way that they can understand and appreciate, such as how a cyber event impacts the organization’s strategic goals.
The Cyber Intelligence Framework
Mind the Gaps
For each participating organization, we applied the organization’s workflows and processes to this framework. One challenge that we identified early on in both government and industry is that a language gap exists between functional analysts and decision makers. Often, leadership and decision makers don’t understand the technical nature of the functional analysis, such as what malware is and how and why it works. In the more effective cyber intelligence programs we observed, strategic analysts are able to translate that functional data in such a manner that decision makers can understand it and use it to make smarter security and business decisions. This translation helps the organization’s leadership better understand events such as distributed denial-of-service attacks (DDoS attacks). In a DDoS attack, the functional analysis provides technical details of the attack, including its provenance and effects on the server.
Strategic analysis then applies those functional details to a broader view of how the DDoS attack impacts an organization’s business, how much money was lost as a result, and what could have been done to prevent it.

Initial Findings

In December, we plan to present our findings to our customer. We will then begin working with organizations to address the challenges that we identified and incorporate best practices into their operations. Some of the initial challenges that we identified include

a lack of consistent training for the strategic analysis role. Cyber intelligence is a relatively new area of expertise, and there is a dearth of senior mentors who can provide guidance to new analysts. Moreover, a consistent method of training has not been developed, and the skills necessary to perform this function are not yet well defined. There are ongoing attempts to address these inconsistencies, and we plan to share our data to help professionalize the tradecraft of cyber intelligence.

reliance on traditional intelligence methodologies. Intelligence methodologies were developed in an era when governments were looking at the inventories of the tanks, missiles, and airplanes held by hostile countries and predicting what the leaders of those countries planned to do with them. Applying these same processes, workflows, and tradecraft to the cyber domain is not always feasible. Technology changes so fast in the cyber domain that by the time a strategic-level product on an emerging threat makes it through the publication process, it’s already out of date.

data gluttony. When we looked at the data gathering phase, in particular, we realized that organizations were inundated with data. Some organizations collected so much data that they simply discarded it without looking at it. Other organizations saved the data but did not use it effectively, and it continued to accumulate, unused, on their servers.
Other organizations collected data but failed to correlate it. For example, an organization that requires employees to badge in at work may not be cross-checking those badge logs against employees who are logging in to the network remotely over a virtual private network (VPN). It stands to reason that an incident in which an employee has badged in at work but is also logged creating a VPN session from overseas may warrant investigation.

Future Work

Our work thus far has focused on helping government leaders make smarter investments of the resources they use to secure the cyber infrastructure. In the coming months, after presenting our data to our sponsor, we will work with participating organizations to apply the best practices that we identified across their organizations. We’ve received permission from our sponsor to publish our results, so we intend to publish an SEI state of the practice report on cyber intelligence. In addition to that report, we aim to present our findings to a broader audience through presentations and panel discussions hosted by professional associations and information security conferences.

For the coming year, we have received further sponsorship to develop prototype solutions to address some of the challenges we identified in this phase of our research. In January, we will begin working with engineers and participants from the study to develop and pilot these prototypes.

Our work on the Cyber Intelligence Tradecraft Project won’t be the silver bullet solution to everyone’s cyber security problems. Instead, we hope that our research is a significant voice in an ongoing conversation about how cyber intelligence analysis benefits risk mitigation and resource allocation in the cyber environment. We welcome your input to this conversation too! Please add your comments in the section below.
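To make the badge-log and VPN cross-check mentioned under "data gluttony" concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the record types, the HOME_COUNTRY value, and the eight-hour ON_SITE_WINDOW are hypothetical choices, not something the participating organizations or the SEI framework prescribe.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

HOME_COUNTRY = "US"                  # assumption: country where the office is located
ON_SITE_WINDOW = timedelta(hours=8)  # assumption: how long a badge-in implies presence


@dataclass(frozen=True)
class BadgeEvent:
    employee: str
    time: datetime


@dataclass(frozen=True)
class VpnSession:
    employee: str
    start: datetime
    country: str


def conflicting_sessions(badges, sessions):
    """Return VPN sessions opened from abroad shortly after the same employee badged in."""
    # Index badge times by employee so each session needs only one lookup.
    badge_times = {}
    for event in badges:
        badge_times.setdefault(event.employee, []).append(event.time)

    flagged = []
    for session in sessions:
        if session.country == HOME_COUNTRY:
            continue  # domestic sessions are out of scope for this check
        for badged_at in badge_times.get(session.employee, []):
            if timedelta(0) <= session.start - badged_at <= ON_SITE_WINDOW:
                flagged.append(session)
                break
    return flagged


# Hypothetical sample data, for illustration only.
badges = [
    BadgeEvent("alice", datetime(2012, 11, 5, 9, 0)),
    BadgeEvent("bob", datetime(2012, 11, 5, 8, 30)),
]
sessions = [
    VpnSession("alice", datetime(2012, 11, 5, 10, 15), "RO"),  # abroad, after badge-in
    VpnSession("bob", datetime(2012, 11, 5, 22, 0), "US"),     # domestic
    VpnSession("carol", datetime(2012, 11, 5, 11, 0), "FR"),   # abroad, never badged in
]
flagged = conflicting_sessions(badges, sessions)
```

With this sample data, only the first session is flagged, because it is the only one where a badge-in and a foreign VPN login overlap for the same person. A real deployment would also have to handle badge-out events, time zones, and shared credentials, all of which this sketch ignores.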
Additional Resources

For information on the SEI Innovation Center, please visit www.sei.cmu.edu/about/organization/innovationcenter/

For information on the Atlantic Council History of Cyber Intelligence, please visit http://ctovision.com/2012/10/lessons-from-our-cyber-past-history-of-cyber-intelligence/

For another perspective on the value of cyber intelligence from RSA, please see http://blogs.rsa.com/stalking-the-kill-chain-position-before-submission/

SEI . Blog .  Jul 27, 2015 02:31pm

Ultimate Architecture Enforcement: Prevent Code Violations at Code-Commit Time

By Paulo Merson, Visiting Scientist
Research, Technology, & System Solutions

Occasionally this blog will highlight different posts from the SEI blogosphere. Today’s post by Paulo Merson, a senior member of the technical staff in the SEI’s Research, Technology, and System Solutions Program, is from the SATURN Network blog. This post explores Merson’s experience using Checkstyle and pre-commit hooks on Subversion to verify the conformance between code and architecture.

SEI . Blog .  Jul 27, 2015 02:30pm

2012: The Research Year in Review

By Douglas C. Schmidt
Principal Researcher

As part of our mission to advance the practice of software engineering and cybersecurity through research and technology transition, our work focuses on ensuring the development and operation of software-reliant Department of Defense (DoD) systems with predictable and improved quality, schedule, and cost. To achieve this mission, the SEI conducts research and development (R&D) activities involving the DoD, federal agencies, industry, and academia. As we look back on 2012, this blog posting highlights our many R&D accomplishments.

Our R&D benefits the DoD and other sponsors by identifying and solving key technical challenges facing developers and managers of current and future software-reliant systems. R&D work at the SEI focuses on the following four general areas of software engineering and cybersecurity:

Securing the cyber infrastructure. This area focuses on enabling informed trust and confidence in using information and communication technology to ensure a securely connected world to protect and sustain vital U.S. cyber assets and services in the face of full-spectrum attacks from sophisticated adversaries.

Advancing disciplined methods for engineering software. This area focuses on improving the availability, affordability, and sustainability of software-reliant systems through data-driven models, measurement, and management methods to reduce the cost, acquisition time, and risk of our major defense acquisition programs.

Accelerating assured software delivery and sustainment for the mission. This area focuses on ensuring predictable mission performance in the acquisition, operation, and sustainment of software-reliant systems to expedite delivery of technical capabilities to win the current fight.

Innovating software for competitive advantage. This area focuses on producing innovations that revolutionize development of assured software-reliant systems to maintain the U.S.
competitive edge in software technologies vital to national security. Following is a sampling of the SEI’s R&D accomplishments in each of these areas during 2012, with links to additional information about these projects. Securing the Cyber Infrastructure A large percentage of cybersecurity attacks against DoD and other government organizations are caused by disgruntled, greedy, or subversive insiders, employees, or contractors with access to that organization’s network systems or data. Over the past 12 years, researchers at the CERT Insider Threat Center have collected incidents related to malicious activity by insiders from a number of sources, including media reports, the courts, the United States Secret Service, victim organizations, and interviews with convicted felons. The blog post Developing Controls to Prevent Theft of Intellectual Property described controls that researchers have developed to prevent, identify, or detect theft of intellectual property. A subsequent posting, A New SIEM Signature Developed to Address Insider Threats, explored controls developed to prevent, identify, or detect IT sabotage. Another aspect of insider threat research focused on identifying enterprise architecture patterns that protect an organization’s systems from malicious insiders. Enterprise architecture patterns are organizational patterns that involve the full scope of enterprise architecture concerns, including people, processes, technology, and facilities. Our goal with this pattern work is to equip organizations with the tools necessary to institute controls that will reduce the incidence of insider compromise. A blog post described research to create and validate an insider threat mitigation pattern language that focuses on helping organizations balance the cost of security controls with the risk of insider compromise. A final post described exploratory research to determine the indicators that insiders might demonstrate prior to attacks. 
New malicious code analysis techniques and tools being developed at the SEI will better counter and exploit adversarial use of information and communication technologies. Through our work in cybersecurity, we have amassed millions of pieces of malicious software in a large malware database. Analyzing this code manually for potential similarities and identifying malware provenance is a painstaking process. The blog post Modeling Malware with Suffix Trees outlined a method to create effective and efficient tools that analysts can use to identify malware more effectively. Another approach for identifying similarity in malicious code involves leveraging analyst insights by identifying files that possess some property in common with a particular file of interest. One way to do this is by using YARA, an open-source project that helps researchers identify and classify malware. YARA has gained enormous popularity in recent years as a way for malware researchers and network defenders to communicate their knowledge about malicious files, from identifiers for specific families to signatures capturing common tools, techniques, and procedures. The post Writing Effective YARA Signatures to Identify Malware provided guidelines for using YARA effectively, focusing on selection of objective criteria derived from malware, the type of criteria most useful in identifying related malware (including strings, resources, and functions), and guidelines for creating YARA signatures using these criteria. Our security experts in the CERT Program are often called upon to audit software and provide expertise on secure coding practices. The blog posting Improving Security in the Latest C Programming Language Standard detailed security enhancements— bounds-checking interfaces and analyzability—from the December 2011 revision of the C programming language standard, which is known informally as C11. 
Another post described our work on the CERT Perl Secure Coding Standard, which provides a core of well-documented and enforceable coding rules and recommendations for Perl, which is a popular scripting language. Advancing Disciplined Methods for Engineering Software According to a February 2011 presentation by Gary Bliss, director of Program Assessment and Root Cause Analysis, to the DoD Cost Analysis Symposium, unrealistic cost or schedule estimates frequently cause a program to breach a performance criterion. To help the DoD address this need, the SEI has continued its research into improving the accuracy of early cost estimates (whether for a DoD acquisition program or commercial product development) and ease the burden of additional re-estimations during a program’s lifecycle. The blog posting Quantifying Uncertainty in Early Lifecycle Cost Estimation (QUELCE) outlines a multi-year project conducted by the SEI Software Engineering Measurement and Analysis (SEMA) team. QUELCE is a method for improving pre-Milestone A software cost estimates through research designed to improve judgment regarding uncertainty in key assumptions (which are termed program change drivers), the relationships among the program change drivers, and their impact on cost. QUELCE asks domain experts to provide judgment not only on uncertain cost factors for a nominal program execution scenario, but also for the drivers of cost factors across a set of anticipated scenarios. A second installment in the series on QUELCE described efforts to improve the accuracy and reliability of expert judgment within this expanded role of early lifecycle cost estimation. On a separate front, a series of blog posts detailed a pilot of the Team Software Process (TSP) approach at Nedbank, one of the four largest banks in South Africa. The first post described how TSP principles allowed developers at the bank to address challenges, improve productivity, and thrive in an agile environment. 
The second post detailed how the SEI worked with Nedbank to address challenges with expanding and scaling the use of TSP at an organizational level. The third post explored challenges common to many organizations seeking to improve performance and become more agile and concluded by demonstrating how SEI researchers addressed these challenges in the TSP rollout at Nedbank. The TSP pilot teams at Nedbank made significant behavioral changes that not only improved the quality of the software but also team members' work lives by decreasing the need for evening and weekend overtime. The teams were able to make these improvements because they had project-specific measurements to guide their decisions, and they had the authority to implement those decisions. Based on the results of the pilots, Nedbank decided to implement TSP throughout the organization. Accelerating Assured Software Delivery and Sustainment for the Mission Another area of exploration for the SEI was methods and processes that enable large-scale software-reliant DoD systems to innovate rapidly and adapt products and systems to emerging needs within compressed time frames. The SEI has focused research efforts on improving the overall value delivered to users by strategically managing technical debt, which involves decisions made to defer necessary work during the planning or execution of a software project, as well as describing the level of skill needed to develop software using Agile for DoD acquisition programs and the importance of maintaining strong competency in a core set of software engineering processes. A blog post on this topic discussed how an architecture-focused analysis approach helps manage technical debt by enabling software engineers to decide the best time to rearchitect—in other words, to pay down technical debt. 
SEI researchers also focused their efforts on common problems faced by acquisition programs related to the development of IT systems, including communications, command, and control; avionics; and electronic warfare systems. Long development cycles don’t fit with user expectations, and in the DoD, it can take up to 81 months to acquire or develop a new technology. To help bridge this gap, the SEI in 2012 hosted the Agile Research Forum, which brought together researchers and practitioners from around the world to discuss when and how to best apply agile methods in the mission-critical environments found in government and many industries. A multi-part series of blog posts highlighted key ideas and issues addressed at the forum, including the following:

Applying Agile at-Scale for Mission-Critical Software-Reliant Systems. This post, which recapped presentations by Anita Carleton, director of the SEI’s Software Engineering Process Management Program, and Teresa M. Takai, chief information officer for the DoD, highlighted key ideas and issues associated with applying agile methods to address the challenges of complexity, exacting regulations, and schedule pressures.

Agile Methods: Tools, Techniques, and Practices for the DoD Community. This post, which recapped a presentation by Mary Ann Lapham, highlighted the importance of collaboration with end users, as well as among cross-functional teams, to facilitate the adoption of agile approaches into DoD acquisition programs.

Strategic Management of Architectural Technical Debt. This post, which recapped a presentation by Ipek Ozkaya, discussed the use of agile architecture practices to manage strategic, intentional technical debt.

Balancing Agility and Discipline at Scale.
This post, which recapped a presentation by James Over, manager of TSP, advocated the building of self-managed teams, planning and measuring project process, designing before building, and making quality the top priority, among other principles associated with applying agile methods at-scale. Applying Agility to Common Operating Platform Environment Initiatives. This post, which recapped a presentation I gave at the forum, highlighted the importance of applying agile methods to common operating platform environments (COPEs) that have become increasingly important for the DoD to deliver enhanced integrated warfighting capability at lower cost, reduce acquisition and new technology insertion cycle time, and establish sustainable business and workforce strategies to support these goals. One problem that occurs when organizations are trying to adopt new practices is a disconnect in business strategy, values, and management style. These aren’t the only disconnects, but they are representative of what we often see when working with DoD or other regulated organizations trying to adopt agile methods. In the first installment in an upcoming series of blog posts, we introduced an analysis method called Readiness and Fit Analysis (RFA) that has been used for multiple technologies and sets of practices, most notably for adoption of CMMI practices, to identify and suggest strategies for mitigating common adoption risks. Engineering the architecture for a large and complex system is a hard, lengthy, and complex undertaking. System architects must perform many tasks and use many techniques if they are to create a sufficient set of architectural models and related documents that are complete, consistent, correct, unambiguous, verifiable, usable, and useful to the architecture’s many stakeholders. 
The SEI has developed the Method Framework for Engineering System Architectures (MFESA), which is a situational process engineering framework for developing system-specific methods to engineer system architectures. The first post in a two-part series on MFESA provided a brief historical description of situational method engineering, explained why no single system architectural engineering method is adequate, and introduced MFESA via a top-level overview of its components, describing its applicability and explaining how it simultaneously provides the benefits of standardization and flexibility. The second post in the series took a deeper dive into the four components that make up MFESA:

the MFESA ontology, which defines the foundational concepts underlying system architecture engineering

the MFESA metamodel, which defines the base superclasses of method components

the MFESA repository, which defines reusable method components

the MFESA metamethod, which defines a process for creating project-specific methods using method components from the MFESA repository

Innovating Software for Competitive Advantage

For more than 10 years, scientists, researchers, and engineers used the TeraGrid supercomputer network funded by the National Science Foundation (NSF) to conduct advanced computational science. The SEI recently joined a partnership of 17 organizations and helped develop the successor to the TeraGrid, called the Extreme Science and Engineering Discovery Environment (XSEDE). The first post in a multi-part series focused on the nature of software engineering practice in the context of the TeraGrid socio-technical ecosystem.

From war zones in Afghanistan to disaster relief in countries like Haiti and Japan, today’s warfighter faces many challenges. The Advanced Mobile Systems Initiative at the SEI continues to focus on meeting the needs of the warfighter at the tactical edge, which is a term used to describe hostile environments with limited resources.
Warfighters who use handheld devices face problems ranging from information obscurity (i.e., a lack of awareness of the available information) to information overload (i.e., too much information, coupled with an inability to locate truly vital information). Through the development of context-aware mobile applications, which was described in a recent blog post, SEI researchers are exploring alternative sources of data that would not only push the limit of what could be done with user context, but also focus on the extremely challenging environment at the tactical edge.

Another research effort is aimed at addressing a common phenomenon that takes place when the software development and maintenance effort involves several programmers and spans months or years: the source code exhibits an actual architecture that gradually diverges from the intended architecture. A post on the SATURN Network Blog describes one researcher’s experience using Checkstyle and pre-commit hooks on Subversion to verify the conformance between code and architecture.

Concluding Remarks

As you can see from this summary of accomplishments, 2012 has been a highly productive and exciting year for the SEI technical staff. Moreover, this blog posting just scratches the surface of SEI R&D activities. Please come back regularly to the SEI blog for coverage of these and many other topics we’ll be covering in the coming year. As always, we’re interested in new insights and new opportunities to partner on emerging technologies and interests. We welcome your feedback and look forward to engaging with you on the blog, so please feel free to add your comments below.

Additional Resources

For the latest SEI technical reports and papers, please visit www.sei.cmu.edu/library/reportspapers.cfm

For more information about R&D at the SEI as well as opportunities for collaboration, please visit www.sei.cmu.edu/research/

SEI . Blog .  Jul 27, 2015 02:29pm

Semantic Comparison of Malware Functions

By Sagar Chaki, Senior Member of the Technical Staff
Research, Technology & System Solutions

A malicious program disrupts computer operations, gains access to private computational resources, or collects sensitive information. In February 2012, nearly 300 million malicious programs were detected, according to a report compiled by SECURELIST. To help organizations protect against malware, other researchers at the SEI and I have focused our efforts on trying to determine the origin of the malware. In particular, I’ve recently worked with my colleagues Arie Gurfinkel, who works with me in the SEI’s Research, Technology, & System Solutions Program, and Cory Cohen, a malware analyst with the CERT Program, to use the semantics of programming languages to determine the origin of malware. This blog post describes our exploratory research to derive precise and timely actionable intelligence to understand and respond to malware.

In a previous blog post, I described our efforts to use classification (a form of machine learning) to detect provenance similarities in binaries. Broadly, two functions are provenance similar if they have been compiled from similar source code using similar compilers. Malware programmers often draw upon similar source code with minor differences, different compilers (such as various versions of Microsoft Visual C++), or different levels of optimization. In creating a training set to learn (or train) a classifier to predict the similarity of binaries, we realized several positive results, including high accuracy and automation in performing classification. We also recognized, however, the following limitations:

Machine learning is not 100 percent precise. In some instances the classifier reported that two binaries were similar when they were not, or vice versa.

In a lab environment, machine learning is hard to validate at scale due to the substantial time commitment of researchers who must manually check thousands of samples.
Our approach was limited to pairwise comparison (detecting whether any given pair of malware/functions are similar), which impedes scalability to full production-size data sets, since detecting similarity among a set of size N requires O(N^2) comparisons.

Another approach that we researched, computing and comparing attribute vectors, also yielded low rates of accuracy and did not capture semantics.

Our New Approach

Our new approach is based on extending the traditional syntactic clustering techniques that have been used to classify malware in the past with execution semantics. This approach involves two tasks:

function clustering based on semantic hashes

semantic difference analysis of functions

We summarize each task below.

Function Clustering Based on Semantic Hashes

The first task of our work, function clustering based on semantic hashes, is based on semantic reasoning about programs, which is also called static analysis. We chose this approach because semantic summaries significantly reduce the number of clusters compared to syntactic hash-based clustering while maintaining the quality of the clusters. (For more information on the Pithos signature, please read the article by Cory Cohen and Jeffrey Havrilla on page 28 of the 2009 CERT Research Annual Report.)

Our approach involves using static analysis to determine the effect of a function (i.e., some relationship between its inputs and outputs) and then mapping functions that have the same behavior to the same equivalence class. We thus determine that two functions are similar if we arrive at a similar result when statically computing the effect of each function. For example, the following can be identified as similar:

two functions that modify the same variable

two functions that output the same value for the same input

two functions that produce close outputs for close inputs (If a final input/output is 5 and another input/output is 4, we’ll say that’s "close."
If one output is 10 and one is 20, we’ll say "far.")

Another aspect of this first task involves validating the static analysis and constructing a semantic hash or equivalence class. The validation involves constructing a benchmark from the CERT malware database and then performing clustering on that benchmark. There are many choices we can make, so we need to find the ones that actually work. This approach allows us to avoid the inefficient O(N^2) pairwise comparison problem. In particular, equivalence classes are determined by hashing every binary/function individually and determining which ones have the same hash, not by comparing every pair of binaries/functions.

Semantic Difference Analysis of Functions

The second task in our approach focused on solving the pairing problem by semantically comparing two functions using a technique called regression verification, which converts the problem of comparing two functions to that of checking the equivalence of their input-output relationships expressed as logical formulas. This approach has been applied to other domains and problems, including checking the equivalence of two programs available as source code. We wanted to broaden its scope by applying regression verification to binaries, which are low-level, executable code.

To accomplish this task, we relied upon a platform called ROSE, which is a framework for handling binaries, as well as a binary analysis platform. The ROSE framework provides disassembly, instruction semantics, and other static analysis capabilities on which we build our semantic function comparison technique.

Pairing this second task with semantic hashing allowed us to verify that our approach worked. In other words, we used the second technique to verify the first technique. The application of regression verification eliminates the need for researchers to spend countless hours manually verifying the correctness of every pair.
It also provides a mechanism by which we could be reasonably confident in an automated method for verifying the correctness of pairs.

Combining the Two Tasks

Semantic hashing provides a precise but rough idea of similarity among functions. For example, given a million functions, semantic hashing could identify 10,000 groups, each with 10 functions. In a particular group with 10 functions, not all functions may be similar. Because each group is small, however, researchers can then do a pairwise comparison. With 10 functions, researchers would have 45 pairs to check. Researchers can then apply regression verification (the second task) to take a closer look at each pair and prune and refine each equivalence class.

Mitigating Risks

One challenge that we face is applying regression verification to binaries. While prior research has applied this technique to compare source code, applying it to binaries is non-trivial due to low-level details (e.g., bit-precise semantics that treat registers as 32-bit values instead of infinite-domain integers) and lack of structure (e.g., it is much harder to compute accurate control-flow graphs). Our approach to overcome this challenge consists of two parts: (i) logically partition system memory into three parts (the stack, the heap, and the region where parameters passed to functions are stored); and (ii) develop a semantics of executable code in terms of its effect on these three partitions of system memory.

One risk we’ve identified is that the application of regression verification might not work well on very small functions, since many have no inherent behavior (e.g., they wrap more complex functions), and they are trivially equivalent if we just consider their bodies. To mitigate that risk, we identified and pruned small functions from summarization. Our expectation is that this will have a marginal effect on analysis since small functions are not very helpful to analysts in general.
Note that obfuscations that split large complex functions into many smaller ones are beyond the scope of this project; we assume that there are orthogonal deobfuscation techniques that deal with such issues.

A second risk that we’ve identified is that computing precise semantic summaries may be expensive. To mitigate this risk, our approach favored scalability over precision. We reasoned that it is better to create a scalable solution that yields precise summaries for 80 percent of functions than a non-scalable approach that could yield precise summaries for 99 percent of functions but only works on 10 percent of functions in a reasonable time.

Impact to DoD

Through this research we hope to reduce the amount of manual malware analysis that is required by the organizations seeking to protect valuable information system assets. Reducing this analysis will lead to more cost-effective identifications and more timely responses to intrusions. This approach will also provide increased visibility into intruder behavior, leading to more effective defense against future intrusions, which are of increasing concern.

Initial Results/Future Work

We have implemented a tool (on top of ROSE) that computes two different types of semantic hashes, and empirically evaluated them on a benchmark derived from the CERT artifact catalog. Our paper, "Binary Function Clustering using Semantic Hashes," describing this research will appear in the Proceedings of the 11th International Conference on Machine Learning and Applications.

Additional Resources

To read the paper Binary Function Clustering Using Semantic Hashes by Wesley Jin, Sagar Chaki, Cory Cohen, Arie Gurfinkel, Jeffrey Havrilla, Charles Hines, and Priya Narasimhan, please see the Proceedings of the 11th International Conference on Machine Learning and Applications (ICMLA), December 12 to 15, 2012.
To read the paper Supervised learning for provenance-similarity of binaries by Sagar Chaki, Cory Cohen, & Arie Gurfinkel, please visit http://dl.acm.org/citation.cfm?id=2020419

To read the paper Regression verification by Benny Godlin & Ofer Strichman, please visit http://dl.acm.org/citation.cfm?doid=1629911.1630034

To read about the research on Function Hashing for Malicious Code Analysis by CERT analysts Cory Cohen and Jeffrey Havrilla, please see page 28 of the CERT Research Annual Report, which can be read at www.cert.org/research/2009research-report.pdf
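As an illustration of the two-stage scheme described under "Combining the Two Tasks" above, here is a minimal Python sketch. It is not the tool described in this post: instead of static analysis over binaries with ROSE, it substitutes a much simpler stand-in, hashing each function's outputs on a fixed set of probe inputs, truncated to 32 bits to echo the bit-precise register semantics mentioned under "Mitigating Risks." The probe values, function names, and helper names are all hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

MASK32 = 0xFFFFFFFF                  # treat results as 32-bit register values
PROBES = (0, 1, 2, 7, 0xFFFFFFFF)    # assumption: arbitrary probe inputs


def behavior_hash(func):
    """Hash a function's input-output behavior on the probe inputs."""
    outputs = [func(x) & MASK32 for x in PROBES]
    data = ",".join(str(value) for value in outputs).encode()
    return hashlib.sha256(data).hexdigest()


def cluster(funcs):
    """Group function names by hash: one hash per function, no pairwise work."""
    buckets = defaultdict(list)
    for name, func in funcs.items():
        buckets[behavior_hash(func)].append(name)
    return list(buckets.values())


def candidate_pairs(groups):
    """List the pairs inside each group that a precise checker would examine."""
    return [pair for group in groups for pair in combinations(sorted(group), 2)]


# Hypothetical functions standing in for disassembled binary functions.
FUNCS = {
    "double_add": lambda x: x + x,
    "double_shift": lambda x: x << 1,
    "double_mul": lambda x: 2 * x,
    "increment": lambda x: x + 1,
}

groups = cluster(FUNCS)
pairs = candidate_pairs(groups)
all_pairs = list(combinations(sorted(FUNCS), 2))
```

Here the three doubling functions land in one group and the increment function in another, so only 3 of the 6 possible pairs remain for the more expensive second stage. The same arithmetic gives the 45 pairs quoted above for a group of 10 functions. Agreement on a handful of probe inputs does not prove equivalence, which is exactly why a precise check such as regression verification is still needed within each group.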

SEI . Blog .  Jul 27, 2015 02:29pm
