ASP.NET Data Scraping

Maintaining a points table of F1 drivers on a hobby website has been annoying me for a while. Recently I figured there had to be a way to pull the drivers' standings straight from the official website, and there was.

Using the WebClient class, a few String functions and a Regex, I was able to pull down the drivers' standings from the official website. Nice! Here is the code; it is very easy to do. Enjoy ;)

Default.aspx:

<%@ Page Language="C#" AutoEventWireup="true" CodeFile="Default.aspx.cs" Inherits="_Default" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title></title>
</head>
<body>
    <form id="form1" runat="server">
        <asp:Label ID="OfficialF1DriverStandings" runat="server" Text="Label"></asp:Label>
    </form>
</body>
</html>

Default.aspx.cs:

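using System;
using System.Net;
using System.Text.RegularExpressions;

public partial class _Default : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        string html;

        // Download the raw HTML of the standings page; dispose the client afterwards.
        using (WebClient wc = new WebClient())
        {
            html = wc.DownloadString("http://www.formula1.com/results/driver/2011/");
        }

        // Find the container div and the first closing </div> after it.
        // Note: stopping at the first </div> would truncate the content if the container held nested divs.
        int startDriversPointsIndex = html.IndexOf("<div class=\"contentContainer\">");
        int endDriversPointsIndex = startDriversPointsIndex < 0 ? -1 : html.IndexOf("</div>", startDriversPointsIndex);
        if (endDriversPointsIndex < 0)
        {
            // The markers are missing (for example, the page layout changed); leave the label unchanged.
            return;
        }

        int lengthOfDriverPointsContent = endDriversPointsIndex - startDriversPointsIndex;
        string officialF1DriverStandingsWithLinks = html.Substring(startDriversPointsIndex, lengthOfDriverPointsContent);

        // Strip the opening and closing anchor tags but keep the link text.
        string officialF1DriverStandings = Regex.Replace(officialF1DriverStandingsWithLinks, @"<a\s+href[^>]+>|</a>", "", RegexOptions.IgnoreCase);

        OfficialF1DriverStandings.Text = officialF1DriverStandings;
    }
}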
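
If you want to try the extraction step without hitting the website, here is a minimal sketch that runs the same IndexOf, Substring and Regex.Replace steps against a string. The StandingsExtractor class, the ExtractWithoutLinks helper and the sample markup are all made up for illustration; they are not part of the page above and the sample is not the real formula1.com HTML.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StandingsExtractor
{
    // Hypothetical helper: returns the text from startMarker (inclusive) up to
    // the next endMarker (exclusive) with anchor tags stripped, or null when
    // either marker is missing.
    public static string ExtractWithoutLinks(string html, string startMarker, string endMarker)
    {
        int start = html.IndexOf(startMarker, StringComparison.Ordinal);
        if (start < 0) return null;

        int end = html.IndexOf(endMarker, start, StringComparison.Ordinal);
        if (end < 0) return null;

        string fragment = html.Substring(start, end - start);

        // Remove <a href=...> and </a> but keep the link text between them.
        return Regex.Replace(fragment, @"<a\s+href[^>]+>|</a>", "", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Made-up sample markup, standing in for the downloaded page.
        string sample =
            "<html><body><div class=\"contentContainer\">" +
            "<a href=\"/drivers/1\">Driver A</a> 100" +
            "</div></body></html>";

        // Prints: <div class="contentContainer">Driver A 100
        Console.WriteLine(ExtractWithoutLinks(sample, "<div class=\"contentContainer\">", "</div>"));
    }
}
```

Returning null when a marker is missing lets the caller decide what to show if the site changes its layout, instead of Substring throwing an exception.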

Result:

  1. alanfeekery posted this