Save Webpages to Read Later

by Ben Ubois

Feedbin now has a read later feature. This enables you to send articles and webpages from anywhere and have them appear alongside your feeds, email newsletters and Twitter subscriptions. It’s called Pages.

There are two ways to get started:

  1. The Feedbin app. If you downloaded the app, then surprise, there’s already an action extension on your device for sending content to Feedbin. It can be enabled in the share menu by scrolling to the end of the bottom row of icons and tapping More. It’s called Send to Feedbin.
  2. For all other devices/browsers there’s a bookmarklet available in settings. Drag it to your bookmarks toolbar. Then click it whenever you’re on a page you want to send to Feedbin.

Pages works like a regular feed, so anything that gets sent to it will sync to any client you use with Feedbin.

Finally, Pages helps preserve the content you send to it. Whenever possible, Feedbin will download any images it can find in the content. That way if the images are ever moved or removed, Feedbin can automatically serve the archived images.

Feedbin for iOS

by Ben Ubois

The official Feedbin app for iPhone and iPad is now available.

This is a hybrid app, which allows it to achieve full feature parity with the website while offering better system integration and performance than using Feedbin in a browser.

The app also lays the groundwork for a great new feature that is coming soon.

Three Columns

by Ben Ubois

There’s an update out today that changes and improves a few things about how Feedbin works. There’s a visual refresh as well. The idea is three unbounded columns that appear to scroll indefinitely.

  1. Universal edit button. Each source now has a consistent editing interface. You can use edit for renaming, tagging, unsubscribing and deleting. Just about everything you can do to a source can now be accomplished using the edit button. The new edit interface makes everything easily accessible on mobile as well.

  2. Grouped sections. The new section headers for Searches, Tags and Feeds help differentiate the types of sources. This helps to reduce clutter as the number of sources in Feedbin increases.

  3. View mode. The control to switch between Unread, Starred, and All view modes is now a dropdown list. This change makes it very clear what you are viewing at all times.

    There’s an improvement in behavior here as well. When switching between modes, your currently selected feed will be updated to reflect the chosen mode. For example if you’re viewing a tag called Favorites under Unread and you switch to Starred, the middle column will immediately update to show just the starred articles under Favorites.

Schema Changes in PostgreSQL with Less Pain

by Ben Ubois

Feedbin has a problem. Its articles are stored in a PostgreSQL database with a 32-bit primary key, and that key is about to overflow. The largest value a signed 32-bit integer can hold is 2,147,483,647, and Feedbin is currently up to id 2,107,994,090, which leaves only about 39 million ids. Once those run out, no new articles can be created.
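
You can check how much headroom remains by querying the sequence directly (a quick sketch; in PostgreSQL 10 the current value is readable straight off the sequence, which here is named entries_id_seq):

     psql --dbname feedbin --command "SELECT last_value, 2147483647 - last_value AS ids_remaining FROM entries_id_seq;"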

Feedbin ended up in this situation because:

  1. I didn’t know any better at the time.
  2. Back when Feedbin was released, the default data-type for primary keys in Rails was 32-bit integers. Rails changed the default to 64-bit integers with version 5.1 in 2017.

Luckily, this isn’t a surprise; it’s something I’ve been thinking about for almost five years. In November of 2014, Feedbin hit this limit in the table that stores the unread status of articles. That was a surprise.

However, it also wasn’t a big deal. That table did not need a primary key: its rows could already be uniquely identified by a combined index of user_id + entry_id, so the fix was to drop the id column.
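
That fix was a one-line statement (a sketch; the table name unread_entries is an assumption for illustration):

     psql --dbname feedbin --command "ALTER TABLE unread_entries DROP COLUMN id;"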

For the entries table, it is a big deal, and dropping this column is not an option.

There are two options:

  1. Change the column to a bigint.
  2. Switch to a different type of primary key, like a UUID.

Option 2 isn’t great. It would require many code changes, additional schema changes, and every client that syncs with Feedbin would need to be able to handle UUIDs.

I identified a few different ways to switch to a bigint:

  1. Change the column type directly: ALTER TABLE entries ALTER COLUMN id TYPE bigint;. This is easy, but it means a lot of downtime. The statement takes an ACCESS EXCLUSIVE lock on the table, meaning nothing can be written or even read while it runs, and it takes a long time.
  2. Add a new column, backfill the data, then switch over to the new column (sketched below). I came across this solution on Stack Overflow, and until recently it’s what I was planning on doing. This solution wouldn’t incur any downtime, but it is complicated, and the process would need to be repeated for every column that refers to entry_id as well. In my testing it took a very long time.
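
For reference, the add-and-backfill approach looks roughly like this (a sketch only; real batching, foreign keys, and the final column swap need more care):

     psql --port 5432 --dbname feedbin --command "ALTER TABLE entries ADD COLUMN id_bigint bigint;"
     # backfill in ranges to avoid holding locks for too long
     psql --port 5432 --dbname feedbin --command "UPDATE entries SET id_bigint = id WHERE id BETWEEN 1 AND 1000000;"
     # ...repeat for each range, then swap the columns and rebuild the
     # primary key on id_bigint inside a single transaction...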

That brings me to option three. It turns out I put the problem off long enough that it was solved for me by a feature added in PostgreSQL 10 last year: logical replication. Previously, Postgres only offered physical, byte-for-byte replication. Logical replication is more flexible because it replicates row-level changes, the logical equivalents of INSERT, UPDATE and DELETE, rather than physical blocks. Most important for this use case: the data type of the replica’s column does not matter, as long as the data still fits into it.

Here’s how to do it. Let’s say there are two PostgreSQL installations on the same server: version 10 on port 5432 and version 11 on port 5433.

  1. Make sure the replica is allowed to connect to the master database.

     cat >> /etc/postgresql/10/main/pg_hba.conf  <<EOL
     host all replication 123.123.123.123/32 md5
     EOL
    
  2. Set the configuration options required for logical replication on the master. I’m also temporarily increasing the WAL size limits to speed up the transfer.

     cat >> /etc/postgresql/10/main/postgresql.conf  <<EOL
     wal_level = logical
     max_wal_senders = 16
     max_replication_slots = 8
     min_wal_size = 1GB
     max_wal_size = 2GB
     EOL
    
  3. Update the configuration on the replica

     cat >> /etc/postgresql/11/main/postgresql.conf  <<EOL
     max_logical_replication_workers = 8
     max_worker_processes = 16
     max_sync_workers_per_subscription = 6
     EOL
    
  4. Restart the databases so that the new configuration takes effect.

     sudo service postgresql@10-main restart
     sudo service postgresql@11-main restart
    
  5. Dump the schema from the master and change all integer types to bigint. (Note that this sed blindly replaces every occurrence of the word integer, so it’s worth reviewing schema.sql before importing it.)

     pg_dump --port 5432 --dbname feedbin --schema-only > schema.sql
     sed --in-place 's/integer/bigint/g' schema.sql
    
  6. Create the replication role and the PUBLICATION on the master.

     psql --port 5432 --command "CREATE ROLE replication WITH LOGIN PASSWORD 'password' REPLICATION;"
     psql --port 5432 --dbname feedbin --command "CREATE PUBLICATION publication1 FOR ALL TABLES;"
     psql --port 5432 --dbname feedbin --command "GRANT SELECT ON ALL TABLES IN SCHEMA public to replication;"
    
  7. Create the database and role on the replica, then import the schema.

     psql --port 5433 --command "CREATE DATABASE feedbin;"
     psql --port 5433 --command "CREATE ROLE feedbin WITH LOGIN PASSWORD 'password' SUPERUSER;"
     psql --port 5433 --dbname feedbin --file schema.sql
    
  8. Create the subscription on the replica.

     psql --port 5433 --dbname feedbin --command "CREATE SUBSCRIPTION subscription1 CONNECTION 'host=localhost dbname=feedbin user=replication port=5432 password=password' PUBLICATION publication1;"
    

Now we wait. Once the replica is caught up, there’s one more very important part.

  1. Make sure nothing is getting written to the database. Then check the lag until it reaches 0.

     psql --port 5432 --command "SELECT  application_name,  pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) lag FROM pg_stat_replication;
    

    The output should look like this when everything is caught up:

     -[ RECORD 1 ]----+-----
     application_name | subscription1
     lag              | 0
    
  2. Logical replication does not bring over sequence data, like the sequences that back auto-incrementing primary keys, so it has to be copied over as a separate step.

     sequences=$(psql --tuples-only --no-align --quiet --no-psqlrc --port 5432 --dbname feedbin --command "SELECT pg_class.relname FROM pg_class WHERE pg_class.relkind = 'S';")
     while read -r sequence; do
         count=$(psql --tuples-only --no-align --quiet --no-psqlrc --port 5432 --dbname feedbin --command "SELECT last_value FROM ${sequence};")
         echo "SELECT setval('${sequence}', ${count});" >> sequences.sql
     done <<< "${sequences}"
    

    This creates a file, sequences.sql, that contains a setval statement for each sequence defined in the database:

     SELECT setval('entries_id_seq', 2107994090);
    
  3. Import the sequence data into the replica.

     psql --port 5433 --dbname feedbin --file sequences.sql
    

Finally, update your application to point to the new database. That’s it!

The Future of Full Content

by Ben Ubois

The Mercury Parser API, made by Postlight, is shutting down.

Mercury Parser is the service that powers a number of popular features on Feedbin. These include: extracting the full content from partial content feeds, viewing the content of links in Feedbin, and displaying articles that are linked to from tweets.

However, this is actually good news, because Postlight open-sourced Mercury Parser, and it has already improved significantly. Bugs have been fixed, results have become more accurate, and in the case of Feedbin, it is now much faster.

Feedbin has been using the open-source Mercury project since mid-February, but running on different infrastructure. Postlight also open-sourced the AWS Lambda API as a separate project, but I had different requirements for authentication, so I created a small web service to handle authentication and run Mercury Parser.

I was pleasantly surprised to see that running the same service on different hardware boosted performance. This isn’t even dedicated hardware like what Feedbin uses for its primary servers. Just a cluster of Digital Ocean virtual private servers.

Performance of extracting full content, before and after moving off Lambda.

Lambda’s infinite scalability sounds very impressive, but infinite scalability is not a requirement for this use-case. Lambda would probably be cheaper (maybe free?), but I’ll always prefer performance at any cost.

Self-hosting also means that I can open up access to this service to Feedbin customers and app developers.

This service is now being used by Reeder to power its content extraction functionality and I want to offer the same arrangement to anyone that makes an app that supports Feedbin. Please get in touch if you’d like to use it in your app, whether your users are logged in to a Feedbin account or not.

For everyone else, the Feedbin entries API now includes an extracted_content_url key. Visiting this URL will return the extracted content of the entry.
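
If you want to try it, here’s a quick sketch using curl and jq (the credentials are placeholders; the API uses HTTP Basic authentication):

     curl --silent --user "me@example.com:password" "https://api.feedbin.com/v2/entries.json" | jq '.[0].extracted_content_url'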

Thanks to Postlight for running Mercury Parser free-of-charge all these years! If you know of a site that could use improved extracted content results, you can contribute a custom parser to the project. It’s a straightforward process if you have CSS and JavaScript experience.

P.S. Feedbin turned six today. Happy birthday! 🎂

Private by Default

by Ben Ubois

I want Feedbin to be the opposite of Big Social. I think people should have the right not to be tracked on the Internet and Feedbin can help facilitate that.

Since Feedbin is 100% funded by paying customers, I can focus solely on making the best product possible without compromises. Therefore, Feedbin can be private by default.

To me this means eliminating all potential points of leaking user data while using Feedbin.

Since Feedbin displays web content, this isn’t the easiest thing to do. Here are the leaks I’ve identified and eliminated.

iFrames

The biggest visual and functional change is how iFrames work.

Feedbin previously whitelisted a number of iFrame sources like YouTube and Vimeo so you could see embedded content. iFrames embed full web-pages from a 3rd-party source. They’re usually resource intensive to load and they enable cross-site tracking.

Feedbin now replaces all iFrames with a new custom module. The module still includes the poster frame for videos (where available) and fetches the title and other metadata.

Clicking on the module will swap in the original iFrame. For YouTube and Vimeo, clicking will also start playing the video.

I prefer the look of this module to the original iFrame. It loads faster, has a clearer, consistent look with richer meta-data, and uses fewer resources doing it.

Third-party JavaScript

Google Analytics is probably the number-one tracker. It’s ubiquitous on the web. For a long time it was a no-brainer to install on any website because you get a lot of functionality for free.

Feedbin used Google Analytics up until April 2018, and it was useful to see some of the stats it provided. The browser stats gave a sense of when it would be appropriate to drop support for older browsers, and the referrer information showed where customers were coming from.

There are good private alternatives to Google Analytics out there. Matomo is one that I came across. They have a great privacy policy for their hosted product and you can choose to run it yourself for even more control.

I thought about replacing Google Analytics with Matomo, but I came to the conclusion that it didn’t provide anything I need in order to run Feedbin. Better to not collect that data at all.

Twitter & Instagram embeds were another source of third-party JavaScript I identified. I would bet that the second largest contributor to tracking you across the web, comes from sites that embed social widgets. Feedbin previously used the Twitter and Instagram widgets to render embedded tweets and images that appeared in blog posts. This provided a richer experience by showing the full embed as intended by the author.

However, there is an alternative. Both Twitter and Instagram offer public oEmbed endpoints, which can provide much of the data needed to properly render this content. Feedbin takes this a step further by making the oEmbed requests from the server. If your browser made the requests client-side, publishers would have the opportunity to read and set tracking cookies. The end result is that you see pretty much the same content as you did before.
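
For example, a server-side lookup against Twitter’s public oEmbed endpoint looks roughly like this (a sketch; the status URL is a placeholder):

     curl --silent "https://publish.twitter.com/oembed?url=https://twitter.com/feedbin/status/1234567890" | jq '.html'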

JavaScript in blog posts is worth mentioning. RSS uses HTML for rendering content, and HTML allows everything, including <script> tags. Feedbin has always used an HTML sanitizer to strip dangerous content, including scripts, out of posts, since executing them would be the definition of an XSS vulnerability.

Images

Images are another potential source of leaked data. Feedbin has used an image proxy since launch to prevent mixed-content warnings. A side benefit of the image proxy is that your browser only makes requests to the proxy, and the proxy fetches the image data, so your request never reaches the origin.
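
If the proxy follows the common camo-style scheme, the proxied URL embeds an HMAC signature so the proxy can’t be used as an open relay. A sketch of constructing such a URL (the host and key are placeholders, and the exact scheme is an assumption):

     key="shared-secret"
     url="http://example.com/image.jpg"
     # camo-style: hex HMAC-SHA1 of the URL, followed by the hex-encoded URL itself
     digest=$(printf '%s' "$url" | openssl dgst -sha1 -hmac "$key" | awk '{print $NF}')
     hexurl=$(printf '%s' "$url" | xxd -p | tr -d '\n')
     echo "https://images.example.net/${digest}/${hexurl}"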

Fonts

Feedbin has the option to use fonts from Hoefler & Co. This requires a single request to their service, which means they have the opportunity to track you if they wish. To eliminate this source, the default article font is now a system font, and custom fonts are only loaded if they’re chosen.

Exceptions

Stripe is the only third-party exception I can think of. Stripe provides the invaluable functionality of billing and subscriptions. Using Stripe means Feedbin does not have to collect, store or ever see any sensitive payment data. However, since Stripe makes their money from paying customers, I think they are incentivized to be careful with this data. Their privacy policy has more details on how they store and use data.

I think that with these changes in place, the only external requests your browser should ever make, with the exception of Stripe, are ones you initiated.

Favicon Fun

by Ben Ubois

Favicons are an important and often over-looked part of publishing a website. Favicons help to distinguish open tabs in web browsers and make the source of an article instantly recognizable in Feedbin.

Unfortunately, not every website publisher takes the time to create a favicon for their site. In this case, Feedbin uses a generic icon as a placeholder. The problem with this approach is that I feel it de-emphasizes feeds that do not have a custom favicon and provides no way of distinguishing the source.

To improve this situation, Feedbin now generates a unique icon color for every site that uses the default favicon. This gives equal emphasis to all sites, whether they have a favicon or not.

Colors are generated using Color Hash, with the hostname as the seed. This way the color will always be the same for the same domain.

For example:

var colorHash = new ColorHash()
colorHash.hex("feedbin.com")

Produces the color #538EAC.

If you have a website, I’d recommend adding a favicon. They’re easy to implement. Even if you already have one, it’s possible it could be improved. With the advent of retina screens, it’s important to double the size to 32×32 pixels so it does not look blurry.

Subscribe to Your Micro.blog Timeline

by Ben Ubois

Micro.blog offers open RSS/JSON feeds for all the content published on it.

The timeline feed is an aggregation of all posts from just the accounts you follow.

Feedbin now takes advantage of the extra metadata available in the timeline feed to optimize the display of Micro.blog posts.

You can follow anybody’s timeline using the timeline URL:

https://micro.blog/feeds/USERNAME.json
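
The timeline is a standard JSON Feed, so you can inspect it with any HTTP client (a sketch; the username is a placeholder):

     curl --silent "https://micro.blog/feeds/example.json" | jq '.items[0].content_html'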

Currently, this is limited to just the timeline JSON feed, but I’m hoping all the other feeds will eventually get the extra metadata.

Hardware Upgrades

by Ben Ubois

I recently spent some time upgrading Feedbin’s hardware, and wanted to share the results.

Feedbin is a Ruby on Rails app, and for the most part Rails is constrained by the single-threaded performance of a CPU. So for the application servers that means favoring higher-clocked CPUs at the expense of number of cores.

The application servers use Intel Xeon E3-1270 v6 CPUs, a four-core part clocked at 3.8GHz. Each application server is configured with 16GB of DDR4 RAM. The application servers primarily run Unicorn and Sidekiq, and this amount of RAM gives those processes plenty of room to spread out.

One measurement that is CPU constrained is view generation.

This chart shows the 95th percentile performance of all views generated by Feedbin. 95th percentile means that 95% of all views were generated in less time than the line in the chart. Michael Kopp wrote a good overview of how average performance can be misleading and the benefits of measuring in percentiles instead.

Looks like the new hardware gives us a nice 20%+ boost in performance.

The database servers were also upgraded. These servers are configured with dual Intel Xeon Gold 5120 CPUs. Each CPU has fourteen 2.2GHz cores; with hyperthreading, that gives them an embarrassing 56 usable threads. They also have 64GB of DDR4 RAM and use Intel S3710 series SSDs for primary storage. Each server has a secondary SSD dedicated to PostgreSQL’s write-ahead log, so more disk I/O is available for queries.

This upgrade gives us another nice boost. If you look at the graph, you’ll notice a daily spike. These are the times when the database server is running vacuumdb. This is an important maintenance process that is used to reclaim space and optimize query planning.

Before the upgrade, vacuum caused a significant delay. After the upgrade, the performance at its worst is the same or better than pre-upgrade performance at its best.

Overall I’m happy with the upgrades. The hardware is more expensive now, but it’s important to me that Feedbin always performs well. If I can achieve that by just spending a little more money, then it makes that decision easy.

Share to Micro.blog

by Ben Ubois

Feedbin supports posting to Micro.blog directly.

If you’re not familiar with Micro.blog, here’s how its creator, Manton Reece, describes it:

A network of independent microblogs. Short posts like tweets but on your own web site that you control.

Micro.blog is a safe community for microblogs. A timeline to follow friends and discover new posts. Hosting built on open standards.

The experience of using Micro.blog is like the early days of Twitter, in all the best ways.

Micro.blog is good for blogging because it acts as a sort of gateway drug into the habit. Say you start off using it just for Twitter-like microposts, but then realize you have more to say. Micro.blog detects the length of your post and prompts you to add a title, turning it into a full-fledged blog post.

The closest service that I can think of is App.net. However, Micro.blog is different in important ways.

  • Manton has a unique perspective and vision.
  • It is not VC backed, so there’s no pressure to maximize returns. Instead, it can focus on being the best possible product.
  • It has fewer employees. In order for Micro.blog to succeed it doesn’t need to support a whole company, it just needs to support Manton.

You can activate it on Feedbin’s Share & Save settings page. You’ll need an app token from your Micro.blog account page.
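
Micro.blog accepts posts via the Micropub API, so the same app token should also work from the command line (a sketch; the token is a placeholder):

     curl --header "Authorization: Bearer YOUR_APP_TOKEN" --data "h=entry" --data-urlencode "content=Hello from the command line." "https://micro.blog/micropub"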
