baseballbrad3

Getting duplicate articles from a single feed

Same issue here.

I just found that all of my duplicated feeds are hosted on FeedBurner, and whenever duplicated content appears, there are multiple pushes.

For example, http://feeds.feedburner.com/solidot has this issue; the crawler history is attached. When multiple pushes are received at the same time, duplicated content appears. It seems that Inoreader does not de-duplicate when it receives a push?

2016-04-10_215038.png
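
To illustrate the de-duplication being asked about, here is a minimal sketch of a receiver that drops items it has already seen, even when the same push arrives several times at once. It is not Inoreader's actual code; the class name `SeenItems`, the choice of `<guid>` (falling back to `<link>`) as the key, and the assumption that the push body is RSS 2.0 with `<item>` elements are all hypothetical.

```python
import threading
import xml.etree.ElementTree as ET


class SeenItems:
    """Hypothetical helper: remembers item keys so repeated pushes add nothing new."""

    def __init__(self):
        self._seen = set()
        # The lock makes "check, then remember" atomic when pushes arrive concurrently.
        self._lock = threading.Lock()

    def new_items(self, feed_xml):
        """Return the keys of items in this push that were not seen before."""
        root = ET.fromstring(feed_xml)
        fresh = []
        for item in root.iter("item"):
            # Assumption: <guid> identifies the article; fall back to <link>.
            key = item.findtext("guid") or item.findtext("link")
            if key is None:
                continue
            with self._lock:
                if key in self._seen:
                    continue
                self._seen.add(key)
            fresh.append(key)
        return fresh


store = SeenItems()
push = (
    "<rss><channel>"
    "<item><guid>a</guid></item>"
    "<item><guid>b</guid></item>"
    "</channel></rss>"
)
first_result = store.new_items(push)   # both items are new
second_result = store.new_items(push)  # identical push: nothing new
```

With a store like this, a second identical push yields an empty list instead of a second copy of each article.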


I'm also experiencing the problem with duplicate articles on http://habitica.wordpress.com/rss

I hope this diff between two consecutive fetches of the feed helps:

❯ diff -u rss_w*.xml
--- rss_w1.xml	2016-10-27 00:29:05.000000000 +0300
+++ rss_w2.xml	2016-10-27 17:43:15.000000000 +0300
@@ -13,7 +13,7 @@
 	<atom:link href="https://habitica.wordpress.com/feed/" rel="self" type="application/rss+xml" />
 	<link>https://habitica.wordpress.com</link>
 	<description></description>
-	<lastBuildDate>Wed, 26 Oct 2016 21:29:05 +0000</lastBuildDate>
+	<lastBuildDate>Thu, 27 Oct 2016 14:43:15 +0000</lastBuildDate>
 	<language>en</language>
 	<sy:updatePeriod>hourly</sy:updatePeriod>
 	<sy:updateFrequency>1</sy:updateFrequency>
@@ -29,7 +29,7 @@
 	<item>
 		<title>Habitica Playlist: The Habitican&#8217;s Travelogue</title>
 		<link>https://habitica.wordpress.com/2016/10/26/habitica-playlist-the-habiticans-travelogue/</link>
-		<comments>https://habitica.wordpress.com/2016/10/26/habitica-playlist-the-habiticans-travelogue/#respond</comments>
+		<comments>https://habitica.wordpress.com/2016/10/26/habitica-playlist-the-habiticans-travelogue/#comments</comments>
 		<pubDate>Wed, 26 Oct 2016 21:27:59 +0000</pubDate>
 		<dc:creator><![CDATA[S Leslie]]></dc:creator>
 				<category><![CDATA[Behind the Scenes]]></category>
@@ -65,7 +65,7 @@
 <p>&nbsp;</p>
 <p>&nbsp;</p><br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/habitica.wordpress.com/1527/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/habitica.wordpress.com/1527/" /></a> <img alt="" border="0" src="https://pixel.wp.com/b.gif?host=habitica.wordpress.com&#038;blog=93626937&#038;post=1527&#038;subd=habitica&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
 			<wfw:commentRss>https://habitica.wordpress.com/2016/10/26/habitica-playlist-the-habiticans-travelogue/feed/</wfw:commentRss>
-		<slash:comments>0</slash:comments>
+		<slash:comments>1</slash:comments>
 	
 		<media:content url="http://0.gravatar.com/avatar/370ae94da1fd47f0116c9c27e8160505?s=96&#38;d=identicon&#38;r=G" medium="image">
 			<media:title type="html">lemonesstree</media:title>

 

hab.png

16 hours ago, Сергей Трофимов said:

I'm also experiencing the problem with duplicate articles on http://habitica.wordpress.com/rss

There is a known issue with WordPress feeds; this behavior is discussed here.

Ultimately, the problem is that the guid elements for an article differ between push and pull requests (these feeds are real-time).

The simple solution is to subscribe to their http feed, where duplicates do not appear.

38 minutes ago, wesson said:

There is a known issue with WordPress feeds; this behavior is discussed here.

Ultimately, the problem is that the guid elements for an article differ between push and pull requests (these feeds are real-time).

The simple solution is to subscribe to their http feed, where duplicates do not appear.

That doesn't seem to be the case. If you look at the diff I attached, you'll see that the guid has not changed. It's the <comments> and <slash:comments> tags that changed, which I believe should be insignificant to a duplicate-detection algorithm.

Anyway, thanks for the suggestion, but I don't really understand how to subscribe to the "http feed". Could you please explain that part a bit more?
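
The point about <comments> and <slash:comments> can be sketched as follows: fingerprint each item while skipping the fields that change when a comment is posted, so both fetches of the same article produce the same fingerprint. This only illustrates the idea and is not how Inoreader detects duplicates. The function name `item_fingerprint`, the `VOLATILE_TAGS` set, and the sample item (including its guid value and example.invalid URL) are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

# Namespace URI of the RSS "slash" module used for <slash:comments>.
SLASH_NS = "http://purl.org/rss/1.0/modules/slash/"

# Assumption: these children change when a comment is posted, not when the article changes.
VOLATILE_TAGS = {"comments", "{" + SLASH_NS + "}comments"}


def item_fingerprint(item):
    """Hash the tag and text of every child of <item> except the volatile ones."""
    digest = hashlib.sha256()
    for child in item:
        if child.tag in VOLATILE_TAGS:
            continue
        digest.update(child.tag.encode("utf-8"))
        digest.update(b"\0")
        digest.update((child.text or "").strip().encode("utf-8"))
        digest.update(b"\0")
    return digest.hexdigest()


# Hypothetical item modelled on the diff: only the comment fields differ.
ITEM_TEMPLATE = """<item xmlns:slash="http://purl.org/rss/1.0/modules/slash/">
  <title>Habitica Playlist</title>
  <guid isPermaLink="false">example-guid-1527</guid>
  <comments>https://example.invalid/post/#{anchor}</comments>
  <slash:comments>{count}</slash:comments>
</item>"""

first = ET.fromstring(ITEM_TEMPLATE.format(anchor="respond", count=0))
second = ET.fromstring(ITEM_TEMPLATE.format(anchor="comments", count=1))
same_article = item_fingerprint(first) == item_fingerprint(second)
```

A change to the title or body would still alter the fingerprint, so genuinely updated articles would remain distinguishable.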


Please note that all of the example articles have similar titles but different content. That is why you see them: they are updated articles.

I suggest enabling the "filter similar articles" feature under Preferences -> Behavior for such cases.
