Blog entry by Les Bell

by Les Bell - Thursday, June 29, 2023, 10:00 AM

Welcome to today's daily briefing on security news relevant to our CISSP (and other) courses. Links within stories may lead to further details in the course notes of some of our courses, and will only be accessible if you are enrolled in the corresponding course - this is a shallow ploy to encourage ongoing study. However, each item ends with a link to the original source.

News Stories


CISA & NSA Release Guidance on Defending CI/CD Pipelines

The US Cybersecurity & Infrastructure Security Agency, together with the NSA, has released an information sheet - of 23 pages - on defending the Continuous Integration / Continuous Delivery (CI/CD) environments that form an essential part of DevOps and DevSecOps.

CI/CD sits at the confluence of a number of factors: the switch to cloud environments, deployment of applications using containers and Platform-as-a-Service architectures, and the application of automation and orchestration throughout the software development lifecycle. It provides a largely straight path from the source code repository where developers check in their code, through the build, test and deployment phases right into production.
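
For readers without a development background, a minimal sketch may help fix the idea. This is my own schematic, not anything from the information sheet: it strings the pipeline's stages together in plain Python, with the repository URL and the build, test and deploy commands as placeholders for whatever a real CI system (in its own configuration format) would actually run.

    import subprocess

    # A schematic CI/CD pipeline: each stage runs only if the previous one succeeds.
    # The repository URL and the make targets are placeholders - real pipelines
    # express these stages in a CI system's own configuration format.
    STAGES = [
        ("checkout", ["git", "clone", "https://example.com/acme/app.git", "workdir"]),
        ("build",    ["make", "-C", "workdir", "build"]),
        ("test",     ["make", "-C", "workdir", "test"]),
        ("deploy",   ["make", "-C", "workdir", "deploy"]),
    ]

    def run_pipeline() -> None:
        for name, cmd in STAGES:
            print(f"--- stage: {name} ---")
            # check=True aborts the pipeline at the first failing stage,
            # mirroring how CI servers gate later stages on earlier ones.
            subprocess.run(cmd, check=True)

    if __name__ == "__main__":
        run_pipeline()

The point of the guidance, of course, is that every one of those stages - and the credentials, runners and services they touch - is now part of the attack surface.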

As threat actors find it harder and harder to compromise production systems, they have shifted their attention to earlier steps in the software supply chain, preferring to compromise source code repositories and the other stages of the CI/CD pipeline. After helpfully defining terms - not every security pro has a development background, unfortunately, and those who do often left the field before the advent of DevOps - the 'sheet' moves on to enumerating the threats (based on the OWASP Top 10 CI/CD Security Risks) and the relevant attack surfaces, including the following (a toy illustration of one of them follows the list):

  • Insecure code
  • Poisoned pipeline execution
  • Insufficient pipeline access controls
  • Insecure system configuration
  • Usage of third-party services
  • Exposure of secrets
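
To make one of these concrete - my own toy illustration, not an example from the information sheet - the 'exposure of secrets' risk is commonly addressed by running a secret scanner early in the pipeline and failing the build if credentials have been committed to the repository. The patterns below are deliberately simplistic; real scanners use much larger rule sets plus entropy checks.

    import re
    import sys
    from pathlib import Path

    # Illustrative patterns only - real secret scanners use far more
    # extensive rule sets and entropy-based detection.
    SECRET_PATTERNS = {
        "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
        "private key header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
        "hard-coded password": re.compile(r"password\s*=\s*['\"][^'\"]+['\"]", re.IGNORECASE),
    }

    def scan_repo(root: str) -> int:
        """Walk the working copy and report strings that look like secrets."""
        findings = 0
        for path in Path(root).rglob("*"):
            if not path.is_file():
                continue
            try:
                text = path.read_text(errors="ignore")
            except OSError:
                continue
            for label, pattern in SECRET_PATTERNS.items():
                for match in pattern.finditer(text):
                    findings += 1
                    print(f"{path}: possible {label}: {match.group(0)[:20]}...")
        return findings

    if __name__ == "__main__":
        # Exit non-zero so a CI job fails - and blocks the pipeline - when secrets are found.
        sys.exit(1 if scan_repo(sys.argv[1] if len(sys.argv) > 1 else ".") else 0)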

This is followed by three interesting scenarios, and then by suggested mitigations for hardening CI/CD environments.

An appendix maps the various CI/CD threats to the MITRE ATT&CK framework tactics as well as the D3FEND countermeasures. The information sheet also provides many links to additional guidance documents. I'd say this information sheet is required reading for the many information assurance pros whose developers have been tardy in inviting them to the DevOps party. Time to replace that with DevSecOps!

NSA and CISA, Defending Continuous Integration/Continuous Delivery (CI/CD) Environments, cybersecurity information sheet, June 2023. Available online at https://www.cisa.gov/news-events/alerts/2023/06/28/cisa-and-nsa-release-joint-guidance-defending-continuous-integrationcontinuous-delivery-cicd and https://media.defense.gov/2023/Jun/28/2003249466/-1/-1/0/CSI_DEFENDING_CI_CD_ENVIRONMENTS.PDF.

Second Thoughts About AI?

The AI hype cycle is finally dying down, as more and more stories surface of embarrassing AI disasters such as the one that befell a lawyer who filed a ChatGPT-written document citing non-existent cases. Now, a new survey from Malwarebytes has revealed that the initial enthusiasm for ChatGPT has faded, with 81% of respondents concerned about possible security and safety risks, 63% not trusting the information it produces and 51% wanting to see work on it paused while regulations catch up.

If you are familiar with Gartner's hype cycle, you probably agree: we've fallen from the Peak of Inflated Expectations to the Trough of Disillusionment.

As an AI enthusiast from the early 1970's (in the Department of Cybernetics at the University of Reading) I'd say that's unfortunate: the sins of commercial large language models (LLM's) should not be transferred to the many less ambitious and less spectacular applications of machine learning and AI that are making continuous incremental progress. In large part, the inflated expectations were inevitable as the public, who do not understand the internal operation of transformers and generative LLM's, poked and prodded at the beast and were amazed when it produced lucid and articulate responses. Surely something that speaks so fluidly - and at length, too! - must actually be intelligent!

Sorry - no.

When put to work on a bounded and high-quality corpus of training material - for example, a database of legislation and regulations - such programs can do an excellent job of tasks such as summarization and abstraction, as well as answering questions which demand only limited inference. I can see them being incredibly useful in GRC applications, for example, where they could spare analysts many hours of boring reading and searching to determine the consequences of various proposed actions.
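
As a sketch of what that might look like - my own illustration, not any particular GRC product - the trick is to keep the model on a short leash: index the bounded corpus, retrieve only the passages relevant to the analyst's question, and hand just those to the model for summarization. The three 'regulation' snippets, the question and the closing prompt are all placeholders.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # A bounded, curated corpus - in a real GRC application this would be the
    # full text of the relevant legislation and regulations, not these stubs.
    corpus = [
        "Controllers must report eligible data breaches to the regulator within 72 hours.",
        "Records of processing activities must be retained and made available on request.",
        "Security controls must be reviewed at planned intervals and after major changes.",
    ]

    def retrieve(question: str, k: int = 2) -> list[str]:
        """Return the k corpus passages most similar to the question."""
        vectorizer = TfidfVectorizer().fit(corpus + [question])
        doc_vectors = vectorizer.transform(corpus)
        query_vector = vectorizer.transform([question])
        scores = cosine_similarity(query_vector, doc_vectors)[0]
        ranked = sorted(zip(scores, corpus), reverse=True)
        return [text for _, text in ranked[:k]]

    question = "How quickly do we have to notify the regulator about a breach?"
    for passage in retrieve(question):
        print(passage)

    # Only the retrieved passages would then be handed to an LLM, with a prompt
    # along the lines of "answer the question using only the text provided" -
    # constraining it to the curated corpus rather than the open web.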

Where ChatGPT and its ilk fall down is in two areas. Firstly, they train on the public world wide web (as it existed in 2021, in the case of GPT-3). And the web is notoriously full of misinformation, disinformation and outright malinformation. And secondly, being limited purely to statistical processing of words - knowing synonyms, how frequently words appear, how often they appear in various combinations, etc. - and not their meanings, LLM's can't tell fact from fiction, so happily treat the two the same way.

And when processing case law or academic papers, for example, the LLM's record things like the types of names that appear in case titles (some companies, some individuals, etc.) and the styles of paper titles, author names and their related institutions, reference styles, etc. When spinning words around to create output, they will quite happily pick relevant words and weave them into plausible-sounding cases and papers - a process referred to as AI hallucination.
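
The limitation is easy to demonstrate at toy scale. The sketch below - a crude bigram model, orders of magnitude simpler than a transformer but the same in spirit - learns nothing except which word follows which, then 'writes' by sampling from those frequencies; the output reads fluently enough, but the model has no notion of whether any of it is true.

    import random
    from collections import defaultdict

    # A toy training corpus of legal-sounding sentences. The model never knows
    # which statements are factual - it only ever sees word sequences.
    corpus = (
        "the court held that the contract was void . "
        "the court held that the appeal was dismissed . "
        "the appeal was allowed and the contract was enforced ."
    ).split()

    # Count which word follows which - pure word statistics, no meaning.
    bigrams = defaultdict(list)
    for current_word, next_word in zip(corpus, corpus[1:]):
        bigrams[current_word].append(next_word)

    def generate(start: str = "the", length: int = 15) -> str:
        """Sample a plausible-looking word sequence from the bigram counts."""
        word, output = start, [start]
        for _ in range(length):
            followers = bigrams.get(word)
            if not followers:
                break
            word = random.choice(followers)
            output.append(word)
        return " ".join(output)

    print(generate())  # fluent-looking, and quite possibly a 'case' that never existed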

This process is likely to get worse, as the use of generative LLM's by news aggregation sites, and especially disinformation sites, rapidly increases on the web. As other LLM's train on this data, a feedback loop will develop in which AI trains on AI-generated text. Can you spell 'garbage in, garbage out'?

A recent paper by Shumailov et al. (with Ross Anderson, of "Security Engineering" fame, as one of the co-authors) addresses this very topic, finding that "model-generated content in training causes irreversible defects in the resulting models, where tails of the original content distribution disappear. We refer to this effect as Model Collapse and show that it can occur in Variational Autoencoders, Gaussian Mixture Models and LLMs. We build theoretical intuition behind the phenomenon and portray its ubiquity amongst all learned generative models".
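
The single-Gaussian case the authors analyse can be simulated in a few lines - my own toy version, not a reproduction of the paper's experiments: fit a mean and standard deviation to a sample, draw the next 'generation' of training data from the fitted model, and repeat. The estimated spread tends to drift downward over the generations, and the tails of the original distribution vanish.

    import numpy as np

    rng = np.random.default_rng(0)

    # Generation 0: "real" data from a standard normal distribution.
    n_samples, n_generations = 200, 500
    data = rng.normal(loc=0.0, scale=1.0, size=n_samples)

    for generation in range(1, n_generations + 1):
        # "Train" a model on the current data: estimate its mean and spread.
        mu, sigma = data.mean(), data.std()
        # The next generation trains only on data sampled from the fitted model.
        data = rng.normal(loc=mu, scale=sigma, size=n_samples)
        if generation % 100 == 0:
            print(f"generation {generation:4d}: fitted sigma = {sigma:.3f}")

    # The fitted sigma tends to shrink as the generations pass: sampling error
    # compounds multiplicatively, the tails of the original distribution
    # disappear, and the "model" drifts towards a near-constant output.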

In short, LLM's have their place, and as long as they stay in it and aren't unleashed to roam the world wide web, they are going to perform valuable work, including in security-related areas such as compliance assurance, software development (essentially a transformation problem from problem specification to executable code), automated malware analysis (explicating what machine code does and is likely intended to achieve), etc. But let's keep our expectations in check, mmmkay?

Shumailov, Ilia, et al., The Curse of Recursion: Training on Generated Data Makes Models Forget, arXiv preprint, 31 May 2023. Available online at https://arxiv.org/abs/2305.17493v2.

Stockley, Mark, 81% concerned about ChatGPT security and safety risks, Malwarebytes survey shows, blog post, 27 June 2023. Available online at https://www.malwarebytes.com/blog/news/2023/06/chatgpt.


These news brief blog articles are collected at https://www.lesbell.com.au/blog/index.php?courseid=1. If you would prefer an RSS feed for your reader, the feed can be found at https://www.lesbell.com.au/rss/file.php/1/dd977d83ae51998b0b79799c822ac0a1/blog/user/3/rss.xml.

TLP:CLEAR Copyright to linked articles is held by their individual authors or publishers. Our commentary is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License and is labeled TLP:CLEAR.
