Excluding parts of a webpage from AI stories

When you set up a news source in Audio.co, you have the option to "scrape" the webpage from which a story originated if the RSS feed only includes excerpts.

With webpage scraping enabled, however, any paragraph text on an article's webpage may be incorporated into the generated content. This includes related stories, page footers and metadata such as publication dates and authors.

There are two ways to help Audio.co process these stories without that extra text:

Selecting page content

Audio.co allows you to add "detail page HTML selectors", commonly known as CSS selectors, which tell the AI which parts of the page to focus on when scraping content.

You can use your browser's "Inspect Element" tool to find the element that contains the content you want and note its ID or class.

For example, if the element containing the body of the news story has an ID, you could enter #id (replacing "id" with the actual ID). Alternatively, if the story text sits in standard paragraphs with the class "article", you could enter p.article.
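To see what a selector would capture before entering it, you can try it against a copy of the page markup. The following is a minimal Python sketch, not Audio.co's actual scraper: the sample page, the ID "story-body" and the helper select_text are all hypothetical, and it only understands the simple selector forms mentioned above (#id, .class, tag and tag.class) on well-formed HTML.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


def matches(selector, tag, attrs):
    """True if an element matches '#id', '.class', 'tag' or 'tag.class'."""
    if selector.startswith("#"):
        return attrs.get("id") == selector[1:]
    name, _, cls = selector.partition(".")
    if name and name != tag:
        return False
    return not cls or cls in (attrs.get("class") or "").split()


class SelectorText(HTMLParser):
    """Collects the text found inside elements matching one simple selector."""

    def __init__(self, selector):
        super().__init__()
        self.selector = selector
        self.depth = 0  # greater than 0 while inside a matching element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth or matches(self.selector, tag, dict(attrs)):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def select_text(html, selector):
    """Hypothetical helper: return the text a selector would focus on."""
    parser = SelectorText(selector)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical article page: story body, related stories and a footer.
PAGE = """
<div id="story-body">
  <p class="article">Council approves the new bridge.</p>
  <p class="article">Work starts in May.</p>
</div>
<div class="related"><p>Related: five other stories</p></div>
<footer><p>Published 1 January</p></footer>
"""

print(select_text(PAGE, "#story-body"))
print(select_text(PAGE, "p.article"))
```

On this sample page both selectors return only the two story paragraphs, leaving out the related stories and the footer.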

Once you know the CSS selector you want to focus on, go to your source's settings page, scroll down to "detail page HTML selectors" and enter it there.

Ignoring page content

If your story page doesn't wrap the story content in an element with a useful ID or class, you can instead choose CSS classes on the page to ignore.

If your AI stories pull in undesired page content, open the page and inspect the unwanted element in your browser. The markup will look something like this:

<div class="extra-content">This unwanted text is appearing in our AI stories.</div>

In this example, the undesired content is in a div with the class "extra-content".

Back in Audio.co, scroll down to "detail page ignore classes" on your source's settings page and enter the CSS classes you wish to ignore.

Hiding the related stories, page footer and story metadata from a news article.
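
The effect of an ignore class can be illustrated in the same way. This is a rough Python sketch under stated assumptions, not Audio.co's implementation: the sample markup and the helper text_without are hypothetical, and it simply drops every element carrying an ignored class, together with everything nested inside it.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class IgnoreClasses(HTMLParser):
    """Collects page text, skipping elements that carry an ignored class."""

    def __init__(self, ignored):
        super().__init__()
        self.ignored = set(ignored)
        self.skip_depth = 0  # greater than 0 while inside an ignored element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.skip_depth or classes & self.ignored:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def text_without(html, ignored):
    """Hypothetical helper: return page text minus the ignored classes."""
    parser = IgnoreClasses(ignored)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page with no useful wrapper around the story text.
PAGE = """
<p>Council approves the new bridge.</p>
<div class="extra-content">This unwanted text is appearing in our AI stories.</div>
<p>Work starts in May.</p>
"""

print(text_without(PAGE, ["extra-content"]))
```

With "extra-content" ignored, only the two story paragraphs remain; with an empty ignore list, the unwanted sentence would be included as well.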