Example - BBC News live headlines

The script extracts the headlines from the BBC News front page (http://www.bbc.com/news) and writes them to an HTML file. The output is refreshed every 15 minutes by cron.

 

Output:

<p><h3><a href="http://www.bbc.com">Egypt Christians killed in bus attack</a></h3></p>
<p>Gunmen opened fire on the bus as it headed to a church, killing more than 20 people.</p>
<p>Time: 2017-05-26T15:30:29.000Z</p>
<p>Category: Middle East</p>
<p><h3><a href="http://www.bbc.com">G7 summit agrees on countering terrorism</a></h3></p>
<p>The summit in Sicily is the first attended by President Trump, with clear differences on display.</p>
<p>Time: 2017-05-26T17:46:31.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">&#x27;Progress&#x27; in Manchester bomb inquiry</a></h3></p>
<p>Police say they have &quot;got hold of a large part&quot; of a Manchester terror network.</p>
<p>Time: 2017-05-26T18:10:34.000Z</p>
<p>Category: UK</p>
<p><h3><a href="http://www.bbc.com">Germany gets tough on child vaccination</a></h3></p>
<p>Parents could be fined up to €2,500 if they fail to see a doctor about vaccinating their children.</p>
<p>Time: 2017-05-26T15:09:02.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">What&#x27;s the French for &#x27;bromance&#x27;?</a></h3></p>
<p>France&#x27;s Emmanuel Macron strolls through a perfumed Sicilian garden with Canadian PM Justin Trudeau.</p>
<p>Time: 2017-05-26T16:48:11.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">Row over Wonder Woman women-only shows</a></h3></p>
<p>Men claim discrimination after a cinema scheduled women-only screenings of the superhero film.</p>
<p>Time: 2017-05-26T12:48:39.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">Scores die in Sri Lanka flooding</a></h3></p>
<p>Flooding and mudslides caused by monsoon rains leave at least 91 dead and more than 100 missing.</p>
<p>Time: 2017-05-26T14:37:07.000Z</p>
<p>Category: Asia</p>
<p><h3><a href="http://www.bbc.com">Disney says film hack threat was a hoax</a></h3></p>
<p>Chief executive Bob Iger said the company doesn&#x27;t believe a ransom demand was genuine as &quot;nothing has happened&quot;.</p>
<p>Time: 2017-05-26T09:13:30.000Z</p>
<p>Category: Entertainment &amp; Arts</p>
<p><h3><a href="http://www.bbc.com">UK achieves solar power record</a></h3></p>
<p>Nearly a quarter of all electricity generation came from solar power at one point on Friday.</p>
<p>Time: 2017-05-26T15:05:33.000Z</p>
<p>Category: Business</p>
<p><h3><a href="http://www.bbc.com">Terrorism &#x27;award&#x27; teacher disciplined</a></h3></p>
<p>A 13-year-old girl was handed the &quot;most likely to become a terrorist&quot; award by her teacher.</p>
<p>Time: 2017-05-26T18:20:24.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">&#x27;Seal finger&#x27; risk to sea lion girl</a></h3></p>
<p>The girl dragged into water by a sea lion in Canada is treated with antibiotics to avert infection.</p>
<p>Time: 2017-05-26T10:53:37.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">Terrorism &#x27;award&#x27; teacher disciplined</a></h3></p>
<p>A 13-year-old girl was handed the &quot;most likely to become a terrorist&quot; award by her teacher.</p>
<p>Time: 2017-05-26T18:20:24.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">&#x27;Seal finger&#x27; risk to sea lion girl</a></h3></p>
<p>The girl dragged into water by a sea lion in Canada is treated with antibiotics to avert infection.</p>
<p>Time: 2017-05-26T10:53:37.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">Burundi orders unmarried couples to wed</a></h3></p>
<p>The government order comes after President Pierre Nkurunziza launched a campaign &quot;to moralise society&quot;.</p>
<p>Time: 2017-05-26T15:58:27.000Z</p>
<p>Category: Africa</p>
<p><h3><a href="http://www.bbc.com">&#x27;Keep working until you&#x27;re 70&#x27;</a></h3></p>
<p>The looming pensions crisis is the financial equivalent of climate change, the World Economic Forum says.</p>
<p>Time: 2017-05-26T12:02:45.000Z</p>
<p>Category: Business</p>
<p><h3><a href="http://www.bbc.com">Girl, 11, accuses school of war crime</a></h3></p>
<p>She shot to viral fame after citing the Geneva Conventions on a school feedback form.</p>
<p>Time: 2017-05-26T13:21:43.000Z</p>
<p>Category: Scotland</p>
<p><h3><a href="http://www.bbc.com">German rock festival gets beer pipeline</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">BBC World News TV</a></h3></p>
<p>The latest global news, sport, weather and documentaries</p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">BBC World Service Radio</a></h3></p>
<p>Stories from around the world</p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Highs and lows of Trump&#x27;s trip so far</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Obama tees off in Scotland</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Which languages Americans learn and why</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">See the world through adventurers&#x27; eyes</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Fire at Wimbledon tennis courts</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Can Trump stop the leaks to US media?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Does military action raise terror threat?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">How I took my family on the run for 19 years</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Facebook&#x27;s tentacles reach further than you think</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The anger of China&#x27;s student patriots</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Will intelligence leaks sink US-UK relationship?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">TV show triggers Chinese virginity debate</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>

Source code of the script:

# File: bbc_main.w
# Name: BBC News live headlines
# Description: Extracts the headlines from the BBC News front page and writes them as HTML
# Input: URL [http://www.bbc.com/news]
# Output format: HTML file
# Output fields: Link, Title, Summary, Time, Category

#<Logger File>
#	Global
#	FileName bbc_log.log
#	Level debug
#</Logger>

<Section>
    Name bbc_main
	
    Define $output_file bbc_output.html

    # define the variable $url and assign it the address of the page to load
    Define $url http://www.bbc.com/news
    
	
	
    # clear the output file before writing new results
    <Action Print>
        FileName {$output_file}
        FileMode Write
    </Action>
    	
	
    
    # load content
    <Action ContentURL>
        URL {$url}
        RemoveNewLine
        TagsToStrip br,nobr,b
    </Action>

	# the script will iterate through all headlines
	<Section While>
		# search for headlines only in the top part of the website
		EndAt <div class="container">
		
		# match the beginning of headline
		<Pattern>
			RegExp <div class="gs-c-promo-body
		</Pattern>
	
		<Section>
			# stop searching for this headline's fields before the next headline begins
			EndAt <div class="gel-layout__item
	
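			# match link (captured into $link, which the Print action below uses;
			# capturing into $url would overwrite the page address and leave $link empty)
			<Pattern>
				RegExp <a class="gs-c-promo-heading{:re([^"]*)}" href="{$link:re([^"]*)}">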
				Trim
				Compact
			</Pattern>
	
			# match title
			<Pattern>
				RegExp <h3 class="gs-c-promo-heading__title{:re([^"]*)}">{$title}</h3></a>
				Trim
				Compact
			</Pattern>
	
			# match summary
			<Pattern>
				Optional
				RegExp <p class="gs-c-promo-summary{:re([^"]*)}">{$summary}</p>
				Trim
				Compact
			</Pattern>
	
			# match time
			<Pattern>
				Optional
				RegExp <time class="gs-o-bullet__text date qa-status-date" datetime="{$time:re([^"]*)}"
				Trim
				Compact
			</Pattern>
	
			# match category
			<Pattern>
				Optional
				RegExp <span aria-hidden="true">{$category}</span>
				Trim
				Compact
			</Pattern>
	
			# and print parsed data
			<Action Print>
				FileName {$output_file}
				Text <p><h3><a href="http://www.bbc.com{$link}">{$title}</a></h3></p>\n<p>{$summary}</p>\n<p>Time: {$time}</p>\n<p>Category: {$category}</p>\n
			</Action>
        </Section>
    </Section>
</Section>

Main bbc_main
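
For readers unfamiliar with the script syntax, the same extraction flow can be approximated in Python. This is only a rough sketch, not a translation of how the script engine works internally: the SAMPLE markup and the helper names (first, extract, render) are hypothetical, and the class names are simply copied from the patterns above. Each headline block is taken to run from one "gs-c-promo-body" marker to the next, which simplifies the EndAt rule used in the script.

```python
import re

# Hypothetical sample markup built from the class names the script matches.
# The real BBC page is larger and its markup changes over time.
SAMPLE = (
    '<div class="gs-c-promo-body gel-1/2">'
    '<a class="gs-c-promo-heading nw-o-link" href="/news/world-1">'
    '<h3 class="gs-c-promo-heading__title gel-pica-bold">First headline</h3></a>'
    '<p class="gs-c-promo-summary gel-long-primer">A short summary.</p>'
    '<time class="gs-o-bullet__text date qa-status-date" '
    'datetime="2017-05-26T15:30:29.000Z"></time>'
    '<span aria-hidden="true">Europe</span>'
    '</div>'
    '<div class="gs-c-promo-body gel-1/2">'
    '<a class="gs-c-promo-heading nw-o-link" href="/news/world-2">'
    '<h3 class="gs-c-promo-heading__title gel-pica-bold">Second headline</h3></a>'
    '</div>'
    '<div class="container">footer</div>'
)

# One regular expression per field, mirroring the <Pattern> blocks.
LINK = re.compile(r'<a class="gs-c-promo-heading[^"]*" href="([^"]*)">')
TITLE = re.compile(r'<h3 class="gs-c-promo-heading__title[^"]*">(.*?)</h3></a>')
SUMMARY = re.compile(r'<p class="gs-c-promo-summary[^"]*">(.*?)</p>')
TIME = re.compile(
    r'<time class="gs-o-bullet__text date qa-status-date" datetime="([^"]*)"'
)
CATEGORY = re.compile(r'<span aria-hidden="true">(.*?)</span>')


def first(pattern, text):
    """Return the first captured group, trimmed, or '' if there is no match."""
    match = pattern.search(text)
    return match.group(1).strip() if match else ""


def extract(page):
    """Collect one dict per headline found above the container marker."""
    # Only look at the top part of the page (the outer EndAt in the script).
    top = page.split('<div class="container">', 1)[0]
    # Everything between two headline markers is treated as one block.
    blocks = top.split('<div class="gs-c-promo-body')[1:]
    items = []
    for block in blocks:
        title = first(TITLE, block)
        if not title:
            continue  # link and title are required, the other fields optional
        items.append({
            "link": first(LINK, block),
            "title": title,
            "summary": first(SUMMARY, block),
            "time": first(TIME, block),
            "category": first(CATEGORY, block),
        })
    return items


def render(item):
    """Format one headline the same way as the Print action's Text line."""
    return (
        '<p><h3><a href="http://www.bbc.com{link}">{title}</a></h3></p>\n'
        '<p>{summary}</p>\n'
        '<p>Time: {time}</p>\n'
        '<p>Category: {category}</p>\n'
    ).format(**item)


if __name__ == "__main__":
    for headline in extract(SAMPLE):
        print(render(headline), end="")
```

In this sketch the optional fields fall back to empty strings, which corresponds to the entries in the output above that show an empty Time and Category.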