Example - BBC News live headlines

The script extracts the headlines from the BBC News front page (http://www.bbc.com/news) and writes them to an HTML file. The output is refreshed every 15 minutes by cron.

Output:

<p><h3><a href="http://www.bbc.com">Shooting at Israeli embassy in Jordan</a></h3></p>
<p>Two Jordanians are killed and an Israeli is wounded during the incident in Amman.</p>
<p>Time: 2017-07-24T00:00:48.000Z</p>
<p>Category: Middle East</p>
<p><h3><a href="http://www.bbc.com">White House uncertainty over Russia bill</a></h3></p>
<p>US Congress is poised to approve new sanctions on Russia but President Trump&#x27;s stance is unclear.</p>
<p>Time: 2017-07-23T17:43:37.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">Rio protest over Brazil police deaths</a></h3></p>
<p>Relatives demand more support from the authorities after the deaths of 91 officers in 2017.</p>
<p>Time: 2017-07-23T20:54:18.000Z</p>
<p>Category: Latin America &amp; Caribbean</p>
<p><h3><a href="http://www.bbc.com">The pop star linked to Donald Trump</a></h3></p>
<p>Emin Agalarov is embroiled in the storm over alleged links between Donald Trump and Russia.</p>
<p>Time: 2017-07-23T18:24:21.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">Texas truck death toll rises to nine</a></h3></p>
<p>They were among 38 people found inside the back of a trailer, many suffering from dehydration.</p>
<p>Time: 2017-07-23T22:19:43.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">England beat India to win Women&#x27;s World Cup</a></h3></p>
<p>Anya Shrubsole takes 6-46 as England win the Women&#x27;s World Cup in a pulsating final at Lord&#x27;s.</p>
<p>Time: 2017-07-23T16:50:35.000Z</p>
<p>Category: BBC Sport</p>
<p><h3><a href="http://www.bbc.com">Turkish journalists face terror trial</a></h3></p>
<p>If found guilty, the 17 writers and managers could be jailed for decades.</p>
<p>Time: 2017-07-24T00:01:31.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">UK and US to start trade deal talks</a></h3></p>
<p>Liam Fox says it is too early to say exactly what would be covered in a potential post-Brexit deal.</p>
<p>Time: 2017-07-23T23:06:50.000Z</p>
<p>Category: Business</p>
<p><h3><a href="http://www.bbc.com">Princess Di &#x27;a total kid&#x27; say Harry</a></h3></p>
<p>Prince Harry says his mother, Princess Diana, was &quot;one of the naughtiest parents&quot;.</p>
<p>Time: 2017-07-23T01:58:10.000Z</p>
<p>Category: UK</p>
<p><h3><a href="http://www.bbc.com">World&#x27;s first floating wind farm emerges</a></h3></p>
<p>The revolutionary technology allows wind power to be harvested in waters too deep for current turbines.</p>
<p>Time: 2017-07-23T19:31:12.000Z</p>
<p>Category: Business</p>
<p><h3><a href="http://www.bbc.com">&#x27;Backlash&#x27; over Charlie Gard threats</a></h3></p>
<p>Charlie Gard&#x27;s parents say they have faced a &quot;backlash&quot; after GOSH said staff had been threatened.</p>
<p>Time: 2017-07-23T21:38:06.000Z</p>
<p>Category: London</p>
<p><h3><a href="http://www.bbc.com">World&#x27;s first floating wind farm emerges</a></h3></p>
<p>The revolutionary technology allows wind power to be harvested in waters too deep for current turbines.</p>
<p>Time: 2017-07-23T19:31:12.000Z</p>
<p>Category: Business</p>
<p><h3><a href="http://www.bbc.com">&#x27;Backlash&#x27; over Charlie Gard threats</a></h3></p>
<p>Charlie Gard&#x27;s parents say they have faced a &quot;backlash&quot; after GOSH said staff had been threatened.</p>
<p>Time: 2017-07-23T21:38:06.000Z</p>
<p>Category: London</p>
<p><h3><a href="http://www.bbc.com">Vietnam halts South China Sea drilling</a></h3></p>
<p>A gas operation in the disputed area has come to a forced stop after alleged threats from China.</p>
<p>Time: 2017-07-23T23:57:10.000Z</p>
<p>Category: Asia</p>
<p><h3><a href="http://www.bbc.com">Stars urge BBC &#x27;to sort gender pay gap&#x27;</a></h3></p>
<p>Education Secretary Justine Greening said the pay gap was &quot;hard to justify&quot; - while Jeremy Corbyn said the difference was &quot;astronomical&quot;.</p>
<p>Time: 2017-07-23T16:31:41.000Z</p>
<p>Category: UK</p>
<p><h3><a href="http://www.bbc.com">What happened at Comic-Con - Day 3</a></h3></p>
<p>A round-up of the film and TV events that made headlines in San Diego.</p>
<p>Time: 2017-07-23T22:15:38.000Z</p>
<p>Category: Entertainment &amp; Arts</p>
<p><h3><a href="http://www.bbc.com">The science behind stage fright</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">BBC World News TV</a></h3></p>
<p>The latest global news, sport, weather and documentaries</p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">BBC World Service Radio</a></h3></p>
<p>Stories from around the world</p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Last days of the White Building</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Texas county struggles with migrant deaths</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">&#x27;I was raped every day for six months&#x27;</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">&#x27;Camp Heat&#x27; trains young women firefighters</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Animal v Athlete: Four times man has raced beast</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The people trying to fight fake news in India</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">How formula milk helped women go back to work</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Westerners crowdfund for the breakup of Ukraine</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The city that makes the most expensive boats in the world</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">TV host&#x27;s race jokes spark Brazil-Korea online war</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The hunter-gatherer berry and porcupine diet</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Why South Korea&#x27;s women golfers are so successful</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The words that betray your personality</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Future</p>
<p><h3><a href="http://www.bbc.com">No, you don&#x27;t need a personal brand</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Capital</p>
<p><h3><a href="http://www.bbc.com">Is Stonehenge giving up its secrets?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Travel</p>
<p><h3><a href="http://www.bbc.com">An untranslatable word for pure joy</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Culture</p>
<p><h3><a href="http://www.bbc.com">The chameleon the size of an ant</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Earth</p>
<p><h3><a href="http://www.bbc.com">When your memory suddenly vanishes</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Future</p>
<p><h3><a href="http://www.bbc.com">The Muslim man with a ‘forbidden&#x27; job</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Capital</p>
<p><h3><a href="http://www.bbc.com">Spieth holds off Kuchar to win The Open</a></h3></p>
<p></p>
<p>Time: 2017-07-23T19:58:54.000Z</p>
<p>Category: Golf</p>
<p><h3><a href="http://www.bbc.com">Froome wins fourth Tour de France title</a></h3></p>
<p></p>
<p>Time: 2017-07-23T17:21:34.000Z</p>
<p>Category: Cycling</p>
<p><h3><a href="http://www.bbc.com">Kinghorn gold as GB finish on 39 medals</a></h3></p>
<p></p>
<p>Time: 2017-07-23T19:00:54.000Z</p>
<p>Category: Disability Sport</p>
<p><h3><a href="http://www.bbc.com">England earn controversial win over Spain</a></h3></p>
<p></p>
<p>Time: 2017-07-23T20:34:29.000Z</p>
<p>Category: Women&#x27;s Football</p>
<p><h3><a href="http://www.bbc.com">Four holes which defined Spieth&#x27;s Open win</a></h3></p>
<p></p>
<p>Time: 2017-07-23T18:47:06.000Z</p>
<p>Category: Golf</p>
<p><h3><a href="http://www.bbc.com">The moment England won the Women&#x27;s World Cup</a></h3></p>
<p></p>
<p>Time: 2017-07-23T17:03:44.000Z</p>
<p>Category: Women&#x27;s Cricket</p>
<p><h3><a href="http://www.bbc.com">Peaty breaks championship record in 100m semi-final</a></h3></p>
<p></p>
<p>Time: 2017-07-23T18:11:48.000Z</p>
<p>Category: Swimming</p>
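
Each headline in the output above occupies four lines: the linked title, the summary, Time and Category. Summary, Time and Category are optional in the script, so they are printed empty when the page has no match. The following is a minimal Python sketch, not part of the original script, of how that output could be read back into records. The names `RECORD_RE` and `parse_output` and the dictionary keys are assumptions made for illustration.

```python
import html
import re
from datetime import datetime, timezone

# One record as printed by the script: four consecutive <p> lines.
RECORD_RE = re.compile(
    r'<p><h3><a href="(?P<link>[^"]*)">(?P<title>.*?)</a></h3></p>\s*'
    r'<p>(?P<summary>.*?)</p>\s*'
    r'<p>Time: (?P<time>.*?)</p>\s*'
    r'<p>Category: (?P<category>.*?)</p>'
)

def parse_output(text):
    """Return a list of dicts, one per headline record found in text."""
    records = []
    for m in RECORD_RE.finditer(text):
        # Decode entities such as &#x27; and &amp; back to plain characters.
        rec = {k: html.unescape(v.strip()) for k, v in m.groupdict().items()}
        # Time may be empty; otherwise it is a UTC timestamp with milliseconds.
        rec["time"] = (
            datetime.strptime(rec["time"], "%Y-%m-%dT%H:%M:%S.%fZ")
            .replace(tzinfo=timezone.utc)
            if rec["time"] else None
        )
        records.append(rec)
    return records
```

Records with an empty Time come back with `time` set to `None`, which keeps them distinguishable from dated news items.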

Source code of script:

# File: bbc_main.w
# Name: BBC News live headlines
# Description: Extracts the headlines from http://www.bbc.com/news and writes them as HTML
# Input: URL [http://www.bbc.com/news]
# Output format: HTML file
# Output fields: Link, Title, Summary, Time, Category

#<Logger File>
#	Global
#	FileName bbc_log.log
#	Level debug
#</Logger>

<Section>
    Name bbc_main
	
    Define $output_file bbc_output.html

    # define variable $url and assign it a value
    Define $url http://www.bbc.com/news

    # clean output file
    <Action Print>
        FileName {$output_file}
        FileMode Write
    </Action>

    # load content
    <Action ContentURL>
        URL {$url}
        RemoveNewLine
        TagsToStrip br,nobr,b
    </Action>

	# the script will iterate through all headlines
	<Section While>
		# search for headlines only in the top part of the website
		EndAt <div class="container">
		
		# match the beginning of headline
		<Pattern>
			RegExp <div class="gs-c-promo-body
		</Pattern>
	
		<Section>
			# stop searching for fields before the beginning of the next headline
			EndAt <div class="gel-layout__item
	
			# match link (stored in $link, which the Print action below uses)
			<Pattern>
				RegExp <a class="gs-c-promo-heading{:re([^"]*)}" href="{$link:re([^"]*)}">
				Trim
				Compact
			</Pattern>
	
			# match title
			<Pattern>
				RegExp <h3 class="gs-c-promo-heading__title{:re([^"]*)}">{$title}</h3></a>
				Trim
				Compact
			</Pattern>
	
			# match summary
			<Pattern>
				Optional
				RegExp <p class="gs-c-promo-summary{:re([^"]*)}">{$summary}</p>
				Trim
				Compact
			</Pattern>
	
			# match time
			<Pattern>
				Optional
				RegExp <time class="gs-o-bullet__text date qa-status-date" datetime="{$time:re([^"]*)}"
				Trim
				Compact
			</Pattern>
	
			# match category
			<Pattern>
				Optional
				RegExp <span aria-hidden="true">{$category}</span>
				Trim
				Compact
			</Pattern>
	
			# and print parsed data
			<Action Print>
				FileName {$output_file}
				Text <p><h3><a href="http://www.bbc.com{$link}">{$title}</a></h3></p>\n<p>{$summary}</p>\n<p>Time: {$time}</p>\n<p>Category: {$category}</p>\n
			</Action>
        </Section>
    </Section>
</Section>

Main bbc_main
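
For readers unfamiliar with the script syntax, the flow above can be approximated in Python. This is a rough sketch under stated assumptions, not a translation of the tool's exact semantics. The handling of `EndAt` as "cut the text at this marker" is an assumed reading, and the names `BASE`, `first`, `extract_headlines` and `render` are hypothetical. The regular expressions reuse the class names from the script's patterns. Unlike the script, the sketch treats every field as optional.

```python
import re

BASE = "http://www.bbc.com"

def first(pattern, chunk):
    """Return group 1 of the first match, whitespace-compacted, or '' if absent."""
    m = re.search(pattern, chunk, re.S)
    return " ".join(m.group(1).split()) if m else ""

def extract_headlines(page):
    """Yield one dict per headline block found in the page source."""
    # Assumed reading of the outer EndAt: ignore everything after this marker.
    page = page.split('<div class="container">', 1)[0]
    # Each headline starts at a promo body; [1:] drops the text before the first one.
    for chunk in page.split('<div class="gs-c-promo-body')[1:]:
        # Assumed reading of the inner EndAt: stop at the next layout item.
        chunk = chunk.split('<div class="gel-layout__item', 1)[0]
        yield {
            "link": first(r'<a class="gs-c-promo-heading[^"]*" href="([^"]*)">', chunk),
            "title": first(r'<h3 class="gs-c-promo-heading__title[^"]*">(.*?)</h3>', chunk),
            "summary": first(r'<p class="gs-c-promo-summary[^"]*">(.*?)</p>', chunk),
            "time": first(r'<time class="gs-o-bullet__text date qa-status-date" datetime="([^"]*)"', chunk),
            "category": first(r'<span aria-hidden="true">(.*?)</span>', chunk),
        }

def render(item):
    """Format one headline in the same four-line layout the script prints."""
    return (
        f'<p><h3><a href="{BASE}{item["link"]}">{item["title"]}</a></h3></p>\n'
        f'<p>{item["summary"]}</p>\n'
        f'<p>Time: {item["time"]}</p>\n'
        f'<p>Category: {item["category"]}</p>\n'
    )
```

Given the page source as a string, `"".join(render(h) for h in extract_headlines(page))` produces text in the same layout as the output shown earlier.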