Checking crossfit workout booking availability with jspoon and retrofit2


New technologies bring many benefits. You can chat with friends without leaving your bed, order things online and have them delivered to your door, and even order a pizza and pay for it before it arrives. One of those fancy features is the ability to register for sports activities online. This way you can be sure that a place will be waiting for you, and it also gives you a convenient way to change your plans and rebook. But there are also days when your favorite workout is fully booked and you keep refreshing the page, waiting for someone to change their plans. This is frustrating, and it has happened to me many times. That’s why I decided to write a simple application that scrapes the booking page and notifies me when a place becomes available for my favorite workout.

At first I wanted to use Python with a simple regex solution (you really shouldn’t use regular expressions to parse HTML: it works for collecting simple tags, but many web pages turn out to be less regular than regular expressions are), but then I thought: hey, maybe it is possible to deserialize HTML to a POJO, just like you deserialize XML. And guess what? Someone has already done it, and there is a ready-made solution for web scraping. You just annotate your POJO fields with the proper CSS selectors and voilà, any web page you read can be transformed into Java objects. The library is called jspoon and it uses jsoup to parse the HTML. It works on a similar principle to the Jackson library.

There is another interesting library I used in my little project. It’s called Retrofit and it lets you create a fast and simple HTTP client for any API. It uses OkHttp as its HTTP client and has a dedicated jspoon converter, which makes it a perfect fit for our problem.

So, let’s get to work. First we need to determine the structure of the page we want to scrape. We’re in luck: it turns out to be a simple HTML table.

<tr>
    <td class="hour">
        18:30
    </td>
 
    <td>
        <div style="width:100%;float:left">
            <div class="event" meta:id="10894364" style="color:#050505;background-color:#6feb1d;">
                <span class="eventlength">60 min</span>
                <span class="availability">
 
                    <span class="availability-number">0</span> wolnych</span>
                <p class="event_name">WOD Gymnastics</p>
                <p class="instructor">Artur</p>
                <p class="room"></p>
            </div>
        </div>
    </td>
</tr>

This is actually a single workout from a page that lists every workout for a given day. We only really need two values: the hour and the availability number. We’ll also grab the workout name, just to make the output clearer. Our POJO should look like the following class:

public class Workout {
    @Selector("td.hour")
    String hour;
 
    @Selector("p.event_name")
    String name;
 
    @Selector("span.availability-number")
    Integer available;
 
    public String getHour() {
        return hour;
    }
 
    public String getName() {
        return name;
    }
 
    public Integer getAvailable() {
        return available;
    }
 
    @Override
    public String toString() {
        return "Workout{" +
                "hour=" + hour +
                ", name='" + name + '\'' +
                ", available=" + available +
                '}';
    }
}
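
For comparison, here is a rough sketch of what the hand-written alternative could look like in Python, the language I first considered, using only the standard library parser. It pulls the same three fields out of one table row. The names WorkoutRowParser and FIELDS are invented for this sketch, the row is shortened from the markup above, and the matching assumes each element carries exactly one CSS class.

```python
from html.parser import HTMLParser

# One table row, shortened from the markup shown earlier.
ROW = """
<tr>
    <td class="hour">
        18:30
    </td>
    <td>
        <div class="event">
            <span class="availability">
                <span class="availability-number">0</span> wolnych</span>
            <p class="event_name">WOD Gymnastics</p>
        </div>
    </td>
</tr>
"""


class WorkoutRowParser(HTMLParser):
    """Collect hour, name and availability from a single table row."""

    # CSS class of the element -> name of the field it fills.
    FIELDS = {
        "hour": "hour",
        "event_name": "name",
        "availability-number": "available",
    }

    def __init__(self):
        super().__init__()
        self.current_field = None  # field currently being read, if any
        self.workout = {}

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        self.current_field = self.FIELDS.get(css_class)

    def handle_data(self, data):
        text = data.strip()
        if self.current_field and text:
            self.workout[self.current_field] = text

    def handle_endtag(self, tag):
        # Text after a closing tag no longer belongs to the field.
        self.current_field = None


parser = WorkoutRowParser()
parser.feed(ROW)
workout = parser.workout
workout["available"] = int(workout["available"])
print(workout)
```

All of the state tracking in this parser is what the three @Selector annotations replace.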

We mapped every property we need onto Java POJO fields, but now we need a container class to read all the workouts from the page. The following code does that:

public class WorkoutDay {
    @Selector("table.calendar_table_day tbody tr")
    List<Workout> workouts;
 
    public List<Workout> getWorkouts() {
        return workouts;
    }
 
    @Override
    public String toString() {
        return "WorkoutDay{" +
                "workouts=" + workouts +
                '}';
    }
}

The catch here is that the selector does not point at the workout container, which in our case is table > tbody. Instead it must match each individual workout element, which here is table > tbody > tr, because every table row is mapped to a single workout entry.

With the above classes we could simply invoke jspoon directly and deserialize our workout entries:

Jspoon jspoon = Jspoon.create();
HtmlAdapter<WorkoutDay> htmlAdapter = jspoon.adapter(WorkoutDay.class);
WorkoutDay day = htmlAdapter.fromHtml(htmlContent);

But let’s get a little creative and use the Retrofit library to fetch the page. To do so, we create an API service interface with a properly annotated method:

public interface WorkoutService {
    @GET("/kalendarz-zajec?view=DayByHour")
    Call<WorkoutDay> getDay(@Query("day") String date);
}

We declared an HTTP GET operation and the URL path for fetching the workout entries. There is also a dynamic query parameter, day, which becomes part of the URL query string and is set from the method parameter. With this interface in place, it is time for the controller code:

public class WorkoutController {
    private final Retrofit retrofit = new Retrofit.Builder()
            .baseUrl("https://cf-krakow.cms.efitness.com.pl/")
            .addConverterFactory(JspoonConverterFactory.create())
            .build();
    private final DateTimeFormatter formatter = DateTimeFormatter.ofPattern("dd-MM-uuuu");
 
    public WorkoutDay getDay(LocalDateTime date) throws IOException {
        WorkoutService service = retrofit.create(WorkoutService.class);
        Response<WorkoutDay> response =  service.getDay(date.format(formatter)).execute();
 
        if (response.isSuccessful()) {
            return response.body();
        } else {
            // errorBody() is a stream object, so report the HTTP status code instead
            throw new RuntimeException("Something went wrong: HTTP " + response.code());
        }
    }
}
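
To make the request concrete, here is a small sketch, in Python for brevity, of the URL the controller should end up requesting: the base URL, the path from the @GET annotation, and the day parameter formatted as day-month-year. The helper name build_day_url is made up for this illustration, and the order of the query parameters is an assumption.

```python
from datetime import date
from urllib.parse import urlencode

BASE_URL = "https://cf-krakow.cms.efitness.com.pl"
PATH = "/kalendarz-zajec"


def build_day_url(day):
    """Compose the calendar URL for one day (hypothetical helper)."""
    # Same day-month-year layout as the "dd-MM-uuuu" pattern in the controller.
    query = urlencode({"view": "DayByHour", "day": day.strftime("%d-%m-%Y")})
    return BASE_URL + PATH + "?" + query


if __name__ == "__main__":
    print(build_day_url(date(2011, 12, 3)))
```

Opening such a URL in a browser is a quick way to check that the date format matches what the booking page expects.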

As you can see, we use the builder pattern to configure Retrofit with the base URL and the data converter. Later we use it to create the API service and make the call. It takes just a few lines of Java. Now let’s use streams to extend the WorkoutDay class with a method that returns the workout for a specified hour:

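    public Optional<Workout> getWorkoutByHour(String hour) {
        // A sequential stream is enough for one day of rows; comparing from
        // the argument side avoids a NullPointerException on rows without an hour
        return workouts.stream().filter(w -> hour.equals(w.getHour())).findFirst();
    }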

and create an entry point class to read the user’s parameters and invoke the controller code:

public class CfNotify {
    private static final WorkoutController controller = new WorkoutController();
    private static final DateTimeFormatter formatter = DateTimeFormatter.ofPattern("HH:mm");
 
    public static void main(String... args) {
        LocalDateTime date = parseArgs(args);
 
        try {
            WorkoutDay day = controller.getDay(date);
 
            Workout workout =
                    day.getWorkoutByHour(date.format(formatter)).orElseThrow(
                            () -> new RuntimeException("No such workout"));
            System.out.println(workout);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
 
    private static LocalDateTime parseArgs(String... args) {
        if (args.length < 1) {
            throw new RuntimeException("Please execute with date parameter in following scheme 2011-12-03T10:15");
        } else {
            // LocalDateTime.parse uses the ISO local date-time format by default
            return LocalDateTime.parse(args[0]);
        }
    }
}

The code above, together with the previous classes, takes a date and time in ISO format, such as 2011-12-03T10:15, and prints output similar to:

Workout{hour=16:00, name='WOD Beginners', available=2}

Of course you can change the code to output only the availability and, if it is greater than 0, notify you in any way you choose. But that is outside the scope of this post and will be described next time, when I show how to connect this mechanism to a Facebook Messenger bot that sets a hook and notifies you when it is possible to register for the workout again.
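
As a rough idea of what such a check could look like, here is a minimal polling sketch in Python. Both fetch_available and notify are hypothetical callables you would supply yourself: the first returns the current number of free places (for example by running the scraper above), the second delivers the notification. The one-minute interval is an arbitrary assumption.

```python
import time


def wait_for_free_place(fetch_available, notify, interval_seconds=60, max_checks=None):
    """Poll until fetch_available() reports at least one free place.

    Returns True if a place was found, False if max_checks ran out first.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        if checks:
            # Pause between checks so the booking page is not hammered.
            time.sleep(interval_seconds)
        available = fetch_available()
        if available > 0:
            notify(available)
            return True
        checks += 1
    return False


if __name__ == "__main__":
    # Simulated readings stand in for real scraper results.
    readings = iter([0, 0, 2])
    wait_for_free_place(lambda: next(readings), print, interval_seconds=0)
```

Passing the two callables in keeps the loop independent of how the page is scraped and how the message is sent.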

The whole solution can be downloaded or cloned from the GitHub repo https://github.com/felix-catus/CfNotify

FB custom API: Accepting friend requests on Facebook

This is the third post in a series about web scraping FB to create your own API. Today I will show how to accept a friend request. Previously, in the entry Your own Facebook API – logging in, I showed how to log into your Facebook account.

Getting the list of friend requests is quite simple using the low-end Facebook interface. You just need to go to https://mbasic.facebook.com/friends/center/requests/ and parse every “a” tag whose “href” attribute starts with “/a/notifications.php?confirm=”. If you care about the name of the user who sent the request, remember the last parsed “a” tag whose “href” attribute starts with “/friends/hovercard/mbasic/” and read its text value. Of course, if your account has many friend requests, not all of them will fit on a single page. To get more of them you would need to find and parse the “a” tag whose “href” attribute starts with “/friends/center/requests/”. But that is outside this post’s scope.
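
Although paging is not covered here, finding the link to the next page could be sketched roughly like this in Python 3. The class name MoreRequestsLinkParser and the sample markup are made up for illustration, and the sketch assumes that the first link with that prefix is the one leading to more requests.

```python
from html.parser import HTMLParser


class MoreRequestsLinkParser(HTMLParser):
    """Remember the first link that points back into the requests list."""

    PREFIX = "/friends/center/requests/"

    def __init__(self):
        super().__init__()
        self.next_page_path = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.next_page_path is not None:
            return
        # Not every link has an href, so fall back to an empty string.
        href = dict(attrs).get("href") or ""
        if href.startswith(self.PREFIX):
            self.next_page_path = href


# Made-up markup for illustration; the real page will look different.
SAMPLE = (
    '<a href="/friends/hovercard/mbasic/?uid=1">Joe Doe</a>'
    '<a href="/a/notifications.php?confirm=1">Confirm</a>'
    '<a href="/friends/center/requests/?page=2">See more</a>'
)

parser = MoreRequestsLinkParser()
parser.feed(SAMPLE)
print(parser.next_page_path)
```

The returned path is relative, so it has to be appended to https://mbasic.facebook.com before it is requested.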

We need to add a class field containing the requests URL and a method that returns the friend requests. The whole class should look like this:

class Facebook:
    fbUrl = "https://mbasic.facebook.com"
    cookies = None  # set by login(); None means we are not logged in yet
    receivedFriendRequestsUrl = "https://mbasic.facebook.com/friends/center/requests/"
    headers = {"User-Agent": "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:42.0) Gecko/20100101 Firefox/42.0",
               "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
               "Accept-Language": "en-US,en;q=0.5",
               "Accept-Encoding": "gzip, deflate, br",
               "Referer": "https://mbasic.facebook.com/"
    }
 
    def login(self, login, password):
        response = requests.get(self.fbUrl, headers = self.headers)
        parser = LoginParser()
        parser.feed(response.text)
        form = parser.data
        form["email"] = login
        form["pass"] = password
        response = requests.post(parser.action, form, cookies = response.cookies, headers = self.headers, allow_redirects = False)
        self.cookies = response.cookies
 
    def ensureLoggedIn(self):
        if self.cookies is None:
            raise RuntimeError("Not logged in")
 
    def getFriendRequests(self):
        self.ensureLoggedIn()
        parser = FriendConfirmParser()
        response = requests.get(self.receivedFriendRequestsUrl, cookies = self.cookies)
        parser.feed(response.text)
        return [FriendRequest(self.cookies, username, path) for username, path in parser.requests.items()]

As you can see, I created two new classes: FriendRequest and FriendConfirmParser. The first is an object representing a friend request, containing the user name and the approval link. The second is another parser, which finds the information we are looking for. They should look like this:

class FriendRequest:
    def __init__(self, cookies, username, path):
        self.url = Facebook.fbUrl + path
        self.cookies = cookies
        self.username = username

    def accept(self):
        # Debug output; visiting the confirmation link accepts the request
        print(self.url)
        print(self.cookies)
        requests.get(self.url, headers=Facebook.headers, cookies=self.cookies)


class FriendConfirmParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        # Instance attributes, so every parser starts with an empty result
        self.requests = {}  # username -> confirmation path
        self.insideHovercardTag = False
        self.lastHovercardTagValue = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Not every link has an href, so fall back to an empty string
            href = dict(attrs).get("href") or ""

            if href.startswith("/a/notifications.php?confirm="):
                self.requests[self.lastHovercardTagValue] = href
            elif href.startswith("/friends/hovercard/mbasic/"):
                self.insideHovercardTag = True

    def handle_data(self, data):
        if self.insideHovercardTag:
            self.lastHovercardTagValue = data

    def handle_endtag(self, tag):
        if self.insideHovercardTag:
            self.insideHovercardTag = False

With the above code we can easily get the first few friend requests, display the usernames, or accept them into our friend list, as in the example below.

fb = Facebook()
fb.login("email", "pass")
for request in fb.getFriendRequests():
    print(request.username)
    if request.username == "Joe Doe":
        request.accept()

Your own Facebook API – logging in

The Facebook API is a powerful tool that gives you an interface to create games, authorize users for an application, or build utility applications such as a wall content analyser. But there are a couple of things you cannot use the API for. For example, you can easily send messages to your friends, but there is no way to receive messages from them. For this and other reasons I decided to web scrape FB in order to create my own API. For this I used the low-end web interface available at http://mbasic.facebook.com. In this post I will describe how to implement a simple login in Python.

We will need two libraries: requests for making calls to FB, and HTMLParser to scrape the HTML and extract useful information. We will also need to know how Facebook’s low-end interface works. To find out, go to http://mbasic.facebook.com (make sure to log out first) and use the web inspector to preview the HTML. We should see something like this:

 <form id="login_form" action="https://mbasic.facebook.com/login.php?refsrc=https%3A%2F%2Fmbasic.facebook.com%2F&amp;lwv=100&amp;refid=8">
  <input autocomplete="off" name="lsd" type="hidden" value="AVrCfvCI" /> 
  <input name="charset_test" type="hidden" value="€,´,€,´,水,Д,Є" /> 
  <input name="version" type="hidden" value="1" /> 
  <input id="ajax" name="ajax" type="hidden" value="0" /> 
  <input id="width" name="width" type="hidden" value="0" /> 
  <input id="pxr" name="pxr" type="hidden" value="0" /> 
  <input id="gps" name="gps" type="hidden" value="0" /> 
  <input id="dimensions" name="dimensions" type="hidden" value="0" /> 
  <input name="m_ts" type="hidden" value="1466018891" /> 
  <input name="li" type="hidden" value="S6xhV6uh5PgJqdeqoQg1mGd-" /> 
  <input class="bi bj bk" name="email" type="text" value="" /> 
  <input class="bi bj bl bm" name="pass" type="password" /> 
  <input class="m n bn bo bp" name="login" type="submit" value="Log In" />
 </form>

As you can see, we need to find the form tag with the id “login_form” and extract every input field with its name and value attributes. To do that we create a parser class that derives from HTMLParser. We should end up with something like this:

class LoginParser(HTMLParser):
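    def __init__(self):
        HTMLParser.__init__(self)
        self.isLoginForm = False  # True while we are inside the login form
        self.action = None        # URL the login form posts to
        self.data = {}            # per instance, so parsers do not share fields
 
    def handle_starttag(self, tag, attrs):
        if tag == "form":
            attrs = {k[0]: k[1] for k in attrs}
            # .get() avoids a KeyError on forms without an id attribute
            if attrs.get('id') == "login_form":
                self.isLoginForm = True
                self.action = attrs['action']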
        else:
            if self.isLoginForm:
                if tag == "input":
                    name = ""
                    value = ""
                    for key, val in attrs:
                        if key == "name":
                            name = val
                        if key == "value":
                            value = val
 
                    self.data[name] = value
 
    def handle_endtag(self, tag):
        if tag == "form" and self.isLoginForm:
            self.isLoginForm = False
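
To see what such a parser collects, here is a self-contained Python 3 sketch fed with a shortened copy of the form above. The class name LoginFormParser is made up for this illustration, the action URL is trimmed for readability, and "email" and "pass" are placeholder credentials.

```python
from html.parser import HTMLParser


class LoginFormParser(HTMLParser):
    """Python 3 sketch of the login parser, keeping state per instance."""

    def __init__(self):
        super().__init__()
        self.in_login_form = False
        self.action = None
        self.data = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == "login_form":
            self.in_login_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self.in_login_form and attrs.get("name"):
            # Inputs without a value attribute are sent as empty strings.
            self.data[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_login_form = False


# Shortened copy of the login form; the action URL is trimmed.
SAMPLE_FORM = """
<form id="login_form" action="https://mbasic.facebook.com/login.php?lwv=100&amp;refid=8">
  <input name="lsd" type="hidden" value="AVrCfvCI" />
  <input name="email" type="text" value="" />
  <input name="pass" type="password" />
  <input name="login" type="submit" value="Log In" />
</form>
"""

parser = LoginFormParser()
parser.feed(SAMPLE_FORM)
form = parser.data
form["email"] = "email"  # placeholder credentials
form["pass"] = "pass"
print(parser.action)
print(form)
```

Note that the parser hands back the action URL with the &amp;amp; entities already decoded, so it can be passed to the HTTP client as is.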

Now we need to connect to Facebook, retrieve the login form, and send the login request. We can do that with the following code, which uses the requests library:

        response = requests.get(self.loginUrl, headers = self.headers)
        parser = LoginParser()
        parser.feed(response.text)
        form = parser.data
        form['email'] = login
        form['pass'] = password
        response = requests.post(parser.action, form, cookies = response.cookies, headers = self.headers)

Here we send an initial request to get the form, then feed the parser with the response and extract the login form data. After that we set the login and password in the data dictionary and send a POST request to Facebook with the form data, the cookies retrieved at the beginning, and headers that simulate a desktop browser. Setting the headers is not mandatory, but we do not want to let Facebook know that we are using our own script to do the job. The whole application should look like this:

import requests
from html.parser import HTMLParser  # on Python 2: from HTMLParser import HTMLParser
 
class Facebook:
    loginUrl = "https://mbasic.facebook.com/"
    headers = {"User-Agent": "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:42.0) Gecko/20100101 Firefox/42.0",
               "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
               "Accept-Language": "en-US,en;q=0.5",
               "Accept-Encoding": "gzip, deflate, br",
               "Referer": "https://mbasic.facebook.com/"
    }
 
    def login(self, login, password):
        response = requests.get(self.loginUrl, headers = self.headers)
        parser = LoginParser()
        parser.feed(response.text)
        form = parser.data
        form['email'] = login
        form['pass'] = password
        response = requests.post(parser.action, form, cookies = response.cookies, headers = self.headers)
        print(response.text)
 
class LoginParser(HTMLParser):
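    def __init__(self):
        HTMLParser.__init__(self)
        self.isLoginForm = False  # True while we are inside the login form
        self.action = None        # URL the login form posts to
        self.data = {}            # per instance, so parsers do not share fields
 
    def handle_starttag(self, tag, attrs):
        if tag == "form":
            attrs = {k[0]: k[1] for k in attrs}
            # .get() avoids a KeyError on forms without an id attribute
            if attrs.get('id') == "login_form":
                self.isLoginForm = True
                self.action = attrs['action']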
        else:
            if self.isLoginForm:
                if tag == "input":
                    name = ""
                    value = ""
                    for key, val in attrs:
                        if key == "name":
                            name = val
                        if key == "value":
                            value = val
 
                    self.data[name] = value
 
    def handle_endtag(self, tag):
        if tag == "form" and self.isLoginForm:
            self.isLoginForm = False
 
 
fb = Facebook()
fb.login("email", "secretpassword")