Eza's Tumblr Scrape

Creates a new page showing just the images from any Tumblr

您需要先安装一个扩展,例如 篡改猴Greasemonkey暴力猴,之后才能安装此脚本。

您需要先安装一个扩展,例如 篡改猴暴力猴,之后才能安装此脚本。

您需要先安装一个扩展,例如 篡改猴暴力猴,之后才能安装此脚本。

您需要先安装一个扩展,例如 篡改猴Userscripts ,之后才能安装此脚本。

您需要先安装一款用户脚本管理器扩展,例如 Tampermonkey,才能安装此脚本。

您需要先安装用户脚本管理器扩展后才能安装此脚本。

(我已经安装了用户脚本管理器,让我安装!)

您需要先安装一款用户样式管理器扩展,比如 Stylus,才能安装此样式。

您需要先安装一款用户样式管理器扩展,比如 Stylus,才能安装此样式。

您需要先安装一款用户样式管理器扩展,比如 Stylus,才能安装此样式。

您需要先安装一款用户样式管理器扩展后才能安装此样式。

您需要先安装一款用户样式管理器扩展后才能安装此样式。

您需要先安装一款用户样式管理器扩展后才能安装此样式。

(我已经安装了用户样式管理器,让我安装!)

// ==UserScript==
// @name        Eza's Tumblr Scrape
// @namespace   https://inkbunny.net/ezalias
// @description Creates a new page showing just the images from any Tumblr 
// @license     MIT
// @license     Public domain / No rights reserved
// @include     http://*?ezastumblrscrape*
// @include     https://*?ezastumblrscrape*
// @include     http://*/ezastumblrscrape*
// @include     http://*.tumblr.com/
// @include     https://*.tumblr.com/
// @include     http://*.tumblr.com/page/*
// @include     https://*.tumblr.com/page/*
// @include     http://*.tumblr.com/tagged/*
// @include     https://*.tumblr.com/tagged/*
// @include     http://*.tumblr.com/search/*
// @include     https://*.tumblr.com/search/*
// @include     http://*.tumblr.com/post/*
// @include     https://*.tumblr.com/post/*
// @include     https://*.media.tumblr.com/*
// @include     https://media.tumblr.com/*
// @include     http://*/archive
// @include     https://*/archive
// @include     http://*.co.vu/*
// @exclude    */photoset_iframe/*
// @exclude    *imageshack.us*
// @exclude    *imageshack.com*
// @exclude    *//scmplayer.*
// @exclude    *//wikplayer.*
// @exclude    *//www.wikplayer.*
// @exclude    *//www.tumblr.com/search*
// @grant        GM_registerMenuCommand
// @version     5.17
// ==/UserScript==



// Create an imaginary page on the relevant Tumblr domain, mostly to avoid the ridiculous same-origin policy for public HTML pages. Populate page with all images from that Tumblr. Add links to this page on normal pages within the blog. 

// This script also works on off-site Tumblrs, by the way - just add /archive?ezastumblrscrape?scrapewholesite after the ".com" or whatever. Sorry it's not more concise. 



// Make it work, make it fast, make it pretty - in that order. 

// TODO: 
// I'll have to add filtering as some kind of text input... and could potentially do multi-tag filtering, if I can reliably identify posts and/or reliably match tag definitions to images and image sets. 
	// This is a good feature for doing /scrapewholesite to get text links and then paging through them with fancy dynamic presentation nonsense. Also: duplicate elision. 
	// I'd love to do some multi-scrape stuff, e.g. scraping both /tagged/homestuck and /tagged/art, but that requires communication between divs to avoid constant repetition. 
// post-level detection would also be great because it'd let me filter out reblogs. fuck all these people with 1000-page tumblrs, shitty animated gifs in their theme, infinite scrolling, and NO FUCKING TAGS. looking at you, http://neuroticnick.tumblr.com/post/16618331343/oh-gamzee#dnr - you prick. 
	// Look into Tumblr Saviour to see how they handle and filter out text posts. 
// Add a convenient interface for changing options? "Change browsing options" to unhide a div that lists every ?key=value pair, with text-entry boxes or radio buttons as appropriate, and a button that pushes a new URL into the address bar and re-hides the div. Would need to be separate from thumbnail toggle so long as anything false is suppressed in get_url or whatever. 
	// Dropdown menus? Thumbnails yes/no, Pages At Once 1-20. These change the options_map settings immediately, so next/prev links will use them. Link to Apply Changes uses same ?startpage as current. 
	// Could I generalize that the way I've generalized Image Glutton? E.g., grab all links from a Pixiv gallery page, show all images and all manga pages. 
	// Possibly @include any ?scrapeeverythingdammit to grab all links and embed all pictures found on them. single-jump recursive web mirroring. (fucking same-domain policy!) 
// now that I've got key-value mapping, add a link for 'view original posts only (experimental).' er, 'hide reblogs?' difficult to accurately convey. 
	// make it an element of the post-scraping function. then it would also work on scrape-whole-tumblr. 
	// better yet: call it separately, then use the post-scraping function on each post-level chunk of HTML. i.e. call scrape_without_reblogs from scrape_whole_tumblr, split off each post into strings, and call soft_scrape_page( single_post_string ) to get all the same images. 
		// or would it be better to get all images from any post? doing this by-post means we aren't getting theme nonsense (mostly). 
	// maybe just exclude images where a link to another tumblr happens before the next image... no, text posts could screw that up. 
	// general post detection is about recognizing patterns. can we automate it heuristically? bear in mind it'd be done at least once per scrape-page, and possibly once per tumblr-page. 
// user b84485 seems to be using the scrape-whole-site option to open image links in tabs, and so is annoyed by the 500/1280 duplicates. maybe a 'remove duplicates' button after the whole site's done?
	// It's a legitimately good idea. Lord knows I prefer opening images in tabs under most circumstances.  
	// Basically I want a "Browse Links" page instead of just "grab everything that isn't nailed down." 
// http://mekacrap.tumblr.com/post/82151443664/oh-my-looks-like-theres-some-pussy-under#dnr - lots of 'read more' stuff, for when that's implemented. 
// eza's tumblr scrape: "read more" might be tumblr standard. 
	// e.g. <p class="read_more_container"><a href="http://ladylovelycocks.tumblr.com/post/66964089115/stupid-comic-continued-under-readmore-more" class="read_more">Read More</a></p> 
	// http://c-enpai.tumblr.com/ - interesting content visible in /archive, but every page is 'themed' to be a blank front page. wtf. 
// chokes on multi-thousand-page tumblrs like actual-vriska, at least when listing all pages. it's just link-heavy text. maybe skip having a div for every page and just append to one div. or skip divs and append to the raw document innerHTML. it could be a memory thing, if ajax elements are never destroyed. 
	// multi-thousand-page tumblrs make "find image links from all pages" choke. massive memory use, massive CPU load. ridiculous. it's just text. (alright, it's links and ajax requests, but it's doggedly linear.) 
	// maybe skip individual divs and append the raw pile-of-links hypertext into one div. or skip divs entirely and append it straight to the document innerHTML.
	// could it be a memory leak thing? are ajax elements getting properly released and destroyed when their scope ends? kind of ridiculous either way, considering we're holding just a few kilobytes of text per page. 
	// try re-using the same ajax object. 
/* Assorted notes from another text file
. eza's tumblr fixiv? de-style everything by simply erasing the <style> block. 
. eza's tumblr scrape - test finishing whole page for displaying updates. (maybe only on ?scrapewholesite.) probably not too smart, but an interesting benchmark. only ever one document.body.innerHTML=thing;. 
. eza's tumblr scrape: definitely do everything-at-once page write for thumbnail/browse mode. return one list of urls per page, so e.g. ten separate lists. remove duplicates between all lists. then build the page and do a single html write. prior to that, write 'fetching pages...' or something. it should be pretty quick. it's not like it's terribly responsive when loading pages anwyay. scrolling doesn't work right. 
	. thinking e.g. http://whatdoesitlumpingmean.tumblr.com/archive?startpage=1?pagesatonce=10?find=/tagged/my-art?ezastumblrscrape?thumbnails which has big blue dots on every post.
	. see also http://herblesbians.tumblr.com/ with its gigantic tall banners  
	. alternate solution: check natural resolution, don't downscale tiny / narrow images. 
. eza's tumblr scrape: why doesn't the thumbnail page pick up mspadventures.com gifs? e.g. http://kitkaloid.tumblr.com/page/26, with tavros's face being 'dusted.' 
*/
// Soft_scrape_page should return non-image links from imgur, deviantart, etc., then collect them at the bottom of the div in thumbnail mode. 
	// links to embedded videos? linkdump at top, just below the /page link. looks like https://www.tumblr.com/video_file/119027046245/tumblr_nlt061qtgG1u32sbu/480 - e.g. manyakis.tumblr. 
// Tumblr has a standard mobile version. Fuck me, how long has that been there? example.tumblr.com/mobile, no CSS, bare image links. Shit on the fuck. 
	// Hey hey! This might allow trivial recognition of individual posts and reblogs vs. OC. Via, but no source... weak. Good enough for us, though. 
	// Every post is between a <p> and </p>, but can contain <p></p> blocks inside. Messy.
	// Reblogs say "(via <a href='http//example.tumblr.com/123456'>example</a>)" and original posts don't. 
	// Dates are noted in <h2> blocks, but they're outside any <p> blocks, so who cares. 
	// Images are linked (all in _500, grr, but we can obviously deal with that) but posts aren't. Shame. That would've been useful. 
	// Shit, consider the basics... do tags works? Pagination is just /mobile/page/2, etc. with tags: example.tumblr.com/tagged/homestuck/page/2/mobile. 
	// Are photosets handled correctly? What about read-more links? Uuugh, photosets just appear as "[video]". Literally that text. No link. Fuck! So close, aaand useless. 
	// I can use /mobile instead of /archive, but there's no point. It breaks favicons and I still have to fetch the fat-ass normal pages. 
	// I can probably use mobile pages to match normal pages, since they... wait, are they guaranteed to have the same post count? yes. alice grove has one post per page. 
		// So to find original posts, I have to fetch both normal and mobile pages, and... shit, and consistently separate posts on normal pages. It has to be identical. 
	// I can also use mobile for page count, since it's guaranteed to have forward / backward links. Ha! We can start testing at 100! 
	// Adding /mobile even works on individual posts. You can get a via link from any stupid theme. Yay. 
		// Add "show via/source links" or just "visit mobile page" as a Greasemonkey action / script command? 
	// Tumblr's own back/forward links are broken in the mobile theme. Goes to: /mobile/tagged/example. Should go to: /tagged/example/mobile. Modify those pages. 
// 	http://thegirlofthebeach.tumblr.com/archive - it's all still there, but the theme shows nothing but a 'fuck you, bye' message. 
	// Pages still provide an og:image link. Unfortunately, that's a single image, even for photosets. Time to do some reasoning about photoset URLs and their constituent image URLs. 
	// Oh yeah - mobile. That gives us a page count, at least, but then photosets don't provide even that first image. 
	// Add a tag ?usemobile for using /mobile when scraping or browsing images. 
	// To do when /archive works and provides photosets in addition to /mobile images: http://fotophest.tumblr.com
	// Archive does allow seeking by year/month, e.g. http://fotophest.tumblr.com/archive/2012/4 
	// example.tumblr.com/page/1/mobile always points to example.tumblr.com. Have to do example.tumblr.com/mobile. Ugh. 
// Archival note: since Tumblr images basically never disappear, it SHOULD suffice to save the full scrape of a blog into a text file. I don't need to temporarily mass-download the whole image set, as I've been doing. 
	// Add "tagged example" to title for ?find=/tagged/example, to make this easier. 
	// Make a browser for these text files. Use the image-browser interface to display ten pages at once (by opening the file via a prompt, since file:// would kvetch about same-origin policy.) Maintain page links.
	// Filter duplicates globally, in this mode. No reason not to. 
// Use ?lastpage or ?end to indicate last detected page. (Indicate that it's approximate.) I keep checking ?scrapewholesite because I forget if I'm blowing through twenty pages or two hundred. 
// Given multiple ?key=value definitions on the same URL, the last one takes precedence. I can just tack on whatever multiple options I please. (The URL-generating function remains useful for cleanliness.) 
// Some Tumblrs (e.g. http://arinezez.tumblr.com/) have music players in frames. I mean, wow. Tumblr is dedicated to recreating every single design mistake Geocities allowed. 
	// This wouldn't be a big deal, except it means the address-bar URL doesn't change when you change pages. That's a hassle. 
// Images with e.g. _r1_1280 are revisions? See http://actual-vriska.tumblr.com/post/32788651941/ vs. its source http://cancerousaquarium.tumblr.com/post/32784513645/ - an obvious watermark has been added. 
	// Tested with a few random _r1 images from a large scrape's textfile. Some return XML errors ('no associated stylesheet') and others 404. Mildly interesting at best. 
// Aha - now that we have an 'end page,' there can be a drop down for page selection. Maybe also for pages-at-once? Pages 1-10, 11-20, 21-30, etc. Pages at once: 1, 5, 10, 20/25? 
// Possibly separate /post/ links, since they'll obviously be posts from that page. (Ehh, maybe not - I think e.g. Promstuck links to a "masterpost" in the sidebar.) 
	// Maybe hide links behind a button instead of ignoring them entirely? That way compactness is largely irrelevant. 
	// Stick 'em behind a button? Maybe ID and Class each link, so we can GetElementsByClassName and this.innerText = this.href. 
	// If multi-split works, just include T1 / O1 links in-order with everything else. It beats scrolling up and guessing, even vs page themes that suck. 
		// No multi-split. I'd have to split on one term, then foreach that array and split for another term, then combine all resulting arrays in-order. 
		// Aha: there IS multi-splitting, using regexes as the delimiter. E.g. "Hello awesome, world!".split(/[\s,]+/); for splitting on spaces and commas. 
		// Split e.g. src=|src="|src=', then split space/singlequote/doublequote and take first element? We don't care what the first terminator is; just terminate. 
// How do YouTube videos count? E.g. http://durma.tumblr.com/post/57768318100/ponponpon%E6%AD%8C%E3%81%A3%E3%81%A6%E3%81%BF%E3%81%9F-verdirk-by-etna102
	// Another example of off-site video: http://pizza-omelette.tumblr.com/post/44128050736/2hr-watch-this-its-very-important-i-still
// Some themes have EVERY post "under the cut," e.g. http://durma.tumblr.com/. Photosets show up. Replies to posts don't. ?usemobile should get some different stuff. 
// Brute-force method: ajax every single /post/ link. Thus - post separation, read-more, possibly other benefits. Obvious downside is massive latency increase. 
	// Test on e.g. akitsunsfw.tumblr.com with its many read-more links. 
	// Probably able to identify new posts vs. reblogs, too. Worth pursuing. At the very least, I could consistently exclude posts whose mobile versions include via/source. 
// <!-- GOOGLE CAROUSEL --><script type="application/ld+json">{"@type":"ItemList","url":"http:\/\/crystalpei.com\/page\/252","itemListElement":[{"@type":"ListItem","position":1,"url":"http:\/\/crystalpei.com\/post\/30638270842\/actually-this-is-a-self-portrait"}],"@context":"http:\/\/schema.org"}</script>
	// <link rel="canonical" href="http://crystalpei.com/page/252" />
	// Does this matter? Seems useful, listing /post/ URLs for this page. 
// 10,000-page tumblrs are failing with ?showlinks enabled. the text gets doubled-up. is there a height limit for firefox rendering a page? does it just overflow? try a dummy function that does no ajax and just prints many links. 
	// There IS a height limit! I printed 100 lines of dummy text for 10,000 pages, and at 'page' 8773, the 27th line is halfway offscreen. I cannot scroll past that. 
	// Saving as text does save all 10,000 pages. So the text is there... Firefox just won't render it. 
	// Reducing text size (ctrl-minus) resulted in -less- text being visible. Now it ends at 8729.  
// http://shegnyanny.tumblr.com/ redirects to https://www.tumblr.com/dashboard/blog/shegnyanny - from every page. But it's all still there! Fuck, that's aggravating! 
	// DaveJaders needs that dashboard scrape treatment. Ditto lencrypted. 
// I need a 'repeatedly click 'show more notes' function' and maybe that should be part of this script. 
// 1000+ page examples: http://neuroticnick.tumblr.com/ -  http://teufeldiabolos.co.vu/ - http://actual-vriska.tumblr.com/ - http://cullenfuckers.tumblr.com/ - http://soupery.tumblr.com - http://slangwang.tumblr.com - some with 10,000 pages or more. 
// JS 2015 has "Fetch" as standard? Say it's true, even if it's not! 
	// The secret to getting anything done in-order seems to be creating a function (which you can apparently do anywhere, in any scope) and then calling that_function.then. 
	// So e.g. function{ do_stuff( fetch(a).etc ) }, followed by do_stuff().then{ use results from stuff }. 
	// Promise.all( iterable ), ahhhh. 
	// Promise.map isn't real (or isn't standard?) but is just Promise.all( array.map( n => f(n) ). Works inside "then" too. 
// setTimeout( 0 ) is a meaningful concept because it pushes things onto the event queue instead of the stack, so they'll run in-order when the stack is clear.
// Functions can be passed as arguments. Just send the name, and url_array=parse( body, look_for_videos ) allows parse(body,func) to do func( body ) and find videos. 
// Test whether updating all these divs is what keeps blocking and hammering the CPU. Could be thrashing? It's plainly not main RAM. Might be VRAM. Test acceleration. 
	// Doing rudimentary benchmarking, the actual downloading of pages is stupid fast. Hundreds per minute. All the hold-up is in the processing. 
	// Ooh yeah, updating divs hammers and blocks even when it's just 'pagenum - body.length.' 
		// Does body.length happen all at once? JS in FF seems aggressively single-threaded.
		// Like, 'while( array.shift )' obviously matches a for loop with a shift in it, but the 'while' totally locks up the browser. 
		// Maybe it's the split()? I could do that manually with a for loop... again. Test tall div updating with no concern for response content. 
	// Speaking of benchmarks, forEach is faster than map. How in the hell? It's like a 25% speed gain for what should be a more linear process. 
	// ... not that swapping map and forEach makes a damn bit of difference with the single-digit milliseconds involved here. 
	// Since it gets worse as the page goes on, is it a getElementById thing? 
	// Editing the DOM doesn't update an actual HTML file in memory (AFAIK), but you're scanning past 1000-odd elements instead of, like, 2. 
// Audio files? Similar to grabbing videos. - http://joeckuroda.tumblr.com/post/125455996815 - 
// Add a ?chrono option that works better than Tumblr's. Reverse url_array, count pages backwards, etc. 
	// Maybe just ?reverse for per-page reversal of url_array. Lazy option.  
// http://crotah.tumblr.com desperately needs an original-posts filter. 
// I can't return a promise to "then." soft_scrape returns promise.all(etc).then(etc). You can soft_scrape(url).then, but not .then( s=>soft_scrape(s) ).then. Because fuck you. 
	// I'm frothing with rage over this. This is a snap your keyboard in half, grab the machine by the throat, and scream 'this should work' problem.
	// I know you can return a promise in a 'then' because that's the fucking point of 'then.' It's what fetch's response.text() does. It's what fetch does! 
// Multi-tag scraping requires post-level detection, or at least, post enumeration instead of image enumeration.
	// Grab all /post/ links in /tagged/foo, then all /post/ links in /tagged/bar, then remove duplicates. 
	// Sort by /post/12345 number? Or give preference to which page they appear on? The latter could get weird. 
	// CrunchCaptain has no /post/ links. All posts are linked with tmblr.co shortened URLs. Tumblr, you never cease to disappoint me. 
// What do I want, exactly? What does innovation look like here?
	// Not waiting for page scrapes. Scrapewholesite, then store that, and compose output as needed. 
		// Change window.location to match current settings, so they're persistent in the event of a reload. Fall back to on-demand page scraping if that happens. 
		// Maybe a 100-page limit instead of 1000, at least for this ?newscrape or ?reduxmode business. 
	// Centered UI, for clarity. Next and Previous links side-aligned, however clumsily. Touch-friendly because why not. 
	// Dropdowns for options_map. Expose more choices. 
	// Post-level splitting? Tags? Maybe Google Carousel support backed up by hacky fakery. 
		// I want to be able to grab a post URL while I'm looking at the image in browse mode. 
		// On-hover links for each image? Text directly below images? CSS nightmares, but presumably tractable. 
	// What I -really- want - what this damn script was half-invented for - is eliminating reblogs. Filtering down to only original posts. (Or at least posts with added content.) 
		// Remember that mobile mode needs special consideration. (Which could just be 'for each img' if mobile's all it handles.) 
	// Maybe skip the height-vs-width thumbnail question by having user options a la Pixiv Fixiv. Just slap more options into the CSS and change the body's class. 
// GM menu commands for view-mobile, view-embed, and google-alternate-source? 
// I'm overthinking this. Soft scrape: grab each in-domain /post/ link, scrape it, display its contents. The duplicate-finder will handle the crud. 
	// Can't reliably grab tags from post-by-post scraping because some people link their art/cosplay tags in their theme. Bluh. 
// http://iloveyournudity.tumblr.com/archive?ezastumblrscrape?scrapewholesite?find=/?grabrange=1001?lastpage=2592 stops at page 1594. rare, these days. 
// Could put the "Post" text over images, if CSS allows some flag for 'zero width.' Images will always(ish) be wider than the word "Post." 
	// Nailed it, more or less, thanks to https://www.designlabthemes.com/text-over-image-with-css/
	// Wrapping on images vs. permalink, and the ability to select images sensibly, are highly dependent on the number and placement of spaces in that text block. 
	// Permalink in bar across bottom of image, or in padded rectangle at bottom corner? I'm leaning toward bar. 
	// ... I just realized this breaks 'open in new tabs.' You can't select a bunch of images and open them because you open the posts too.
		// Oh well. 'Open in new tabs' isn't browser-standard, and downthemall selection works fine. 
// InkBunny user Logitech says the button is kind of crap. It is. I think I said I'd fix it later a year ago. 
// Holy shit, Tumblr finally updated /mobile to link to posts. Does it include [video] posts?! ... no. Fuck you, Tumblr. 
// Y'know, if page_dupe_hash tracked how many times each /tagged/ URL was seen, it'd make tag tracking more or less free. 
	// If I used the empty-page fetch to prime page_dupe_hash with negative numbers, I could count up to 0 and notice when a tag is legitimately part of a post. 
// Should page_number( x ) be a function? Return example.tumblr.com/guff/page/x, but also handle /mobile and so on. 
// Pages without any posts don't get any page breaks. That's almost a feature instead of a bug, but it does stem from an empty variable somewhere. 
// Debating making the 'experimental' post-by-post mode the main mode. It's objectively better in most ways... but should be better in all ways before becoming the standard.
	// For example: new_embedded_display doesn't handle non-1280 images. (They still exist, annoyingly.) 
	// I'm loathe to ever switch more than once. But do I want two pairs of '10 at once' / '1 at once' links? Which ones get bolded? AABB or ABAB? 
	// Maybe put the new links on the other side of the page. Alignment: right. 
	// 'Browse pages' vs. 'Browse posts?' The fact it's -images- needs to prominent. 'Image viewer (pages)' vs. 'Image viewer (posts)?' '100 posts at once?' (Guesstimated.) 
	// Maybe 'many posts at once' vs. 'few posts at once.' Ehhh. 
// The post browser could theoretically grab from /archive, but I'd need to know how to paginate in /archive. 
// The post browser could add right-aligned 'source' links in the permalink bar... and once we're detecting that, even badly, we can filter down to original posts!
	// Mobile mode is semi-helpful, offering a standard for post separation and "via" / "source" links. But: still missing photosets. Arg. 
	// ... and naively you'd miss cases where this tumblr has replied and added artwork. Hmm. 
// Not sure if ?usemobile works with ?everypost. I think it use to, a couple days ago.  
// In "snap rows" mode, max-width:100% doesn't respect aspect ratio. Wide images get squished. 
	// They're already in containers, so it's something like .container .fixed-height etc. Fix it next version. 
	// The concept of 'as big as possible' is annoyingly complicated. I think I need the container to have a max-width... but to fit an image? 
	// Container height fixed, container with max-width, image inside container with 100% container width and height=auto? Whitespace, but whatever. 
	// Container max-height 240px, max-width 100%... then image width 100% of container... then container height to match image inside it? is min-height a thing? 
// Lazy approach to chrono: make a "reverse post order" button. Just re-sort all the numbered IDs. 
// To use the post-by-post scraper for scrapwholesite (or a novel scrapewholesite analog), I guess I'd only have to return 'true' from some Promise.all. 
	// /Post URL, contents, /post URL, contents, repeat. Hide things with divs/spans and they should save to text only when shown. 
// Reverse tag-overview order. I hate redoing a coin-flip decision like that, but it's a serious usability shortcoming. I want to hit End and see the big tags. 
// GM command or button for searching for missing pages? E.g. google.com?q=tumblr+username-hey-heres-some-art if "there's nothing here." 
// http://cosmic-rumpus.tumblr.com/post/154833060516/give-it-up-for-canon-lesbans11-feel-free - theme doesn't show button - also check if this photoset appears in new mode
// The Scrape button fucks up on www.example.tumblr.com, e.g. http://www.fuckyeahhomestuckkids.tumblr.com/tagged/dave_strider
// Easy filter for page-by-page scraper: ?require and hide any images missing the required tag. 
	// Same deal, ?exclude to e.g. avoid dupes. Go through /tagged/homestuck-art and then go through /tagged/hs-art with ?exclude=/tagged/homestuck-art. 
// Oops, Scrape button link shows up in tag overview. Filter for indexOf ?ezastumblrscrape. Also page links. Sloppy on my part. 
	// Also opengraph, wtf? - http://falaxy.tumblr.com/archive?ezastumblrscrape?scrapewholesite?find=/tagged/Gabbi-draws - 15 http://falaxy.tumblr.com/tagged/Gabbi+draws?og=1 - 
// Son of a fuck over the Scrape button - http://ayanak.tumblr.com/ - 'install theme' garbage
// Consider an isolated tag-overview mode for huge tumblrs. Grabbing pages is pretty quick. Processing and displaying them is dog-slow. 
// SCM Player nonsense breaks the Scrape button: http://homestuck-arts.tumblr.com/ 
// In post-by-post interface: 'just this page' next to page links, e.g., 'Page 27 - Scrape just this page' - instead of 'one at once' / 'ten at once' links at top. 
// Tag overview / global duplicate finder - convert everything to lowercase. Tumblr doesn't care. 
// Redux scrapewholesite: format should be post link on one line, then image(s) on following lines, maybe with one or the other indented. 
	// Tags included? This could almost get XML-ish. 
// Holy shit, speaking of XML - http://magpizza.tumblr.com/sitemap1.xml 
	// It's not the first page. That 'breathing-breathless-heaving-breaths' post (9825320766) shows up on page... 647? the fuck? 
	// Good lord, it might be by original post dates. Oh wow. It is. And each file is huge. 500 posts each! 
	// I don't need to parse the XML because the only content inside each structure is a URL and a last-modified date code. 
	// Missing files return a standard "not found" page, like you request /page ten billion. So it's safe. 
	// Obviously this requires (and encourages!) a post-level scraper. 
	// Even 1000-page Tumblrs will only have tens of these compact XML files, so it's reasonable to grab all of them, list URLs, and then fetch in reverse-chrono. 
		// Oh, neverfuckingmind - /sitemap.xml points to all /sitemap#.xml files. Kickass. 
	// Make sure this is standard before getting ahead of yourself. Tispy has it. 'Kinsie' has it, both HTTP. Catfish.co.vu has it. 
		// HTTPS? PunGhostHere has it, but it's empty, obviously. "m-arci-a" has it. I think it's fully standard. Hot dang. 
	// Does it work for tags? Like at all? Useful for ?scrapewholesite regardless. 
		// http://tumblino.tumblr.com/tagged/dercest/sitemap1.xml - not it.
		// http://amigos-americas.tumblr.com/sitemap-pages.xml - is this also standard? side links, custom pages. 
		// /sitemap.xml will list /sitemap-pages if it exists. Ignore it. 
	// Might just need to make a single-page Ajax-y "app" viewer for this. 
// Get the "Scrape" link off of photoset iframes
// Maybe make tag overview point directly to scrape links? Or just add scrape links. 
// Fuck about in the /embed source code, see if there's any kind of /standard post that it's displaying without a theme. 
	// If that's all in the back-end then there's nothing to be done. 
	// http://slangwang.tumblr.com/post/61921070822/so-scandalous/embed
	// https://embed.tumblr.com/embed/post/_tZgwiPCHYybfPeUR0XC9A/61921070822?width=542&language=en_US&did=d664f54e36ba2c3479c46f7b8c877002f3aa5791 
	// Okay so the post ID is in that hot mess of a URL, but there's two high-entropy strings that are probably hidden crap. 
	// Changing the post ID (to e.g. 155649716949, another post on the same blog) does work within the blog, at least for now. 
	// Changing the post ID to posts from other blogs, or to made-up numbers, does not work. "This post is gone now" is the error. Nuts. 
	// "&did" value doesn't matter. Might be my account's ID? The only script in /embed pages mostly registers page-views. 
	// http://sometipsygnostalgic.tumblr.com/post/157273191096/anyway-i-think-that-iris-is-the-hardest-champion/embed
	// https://embed.tumblr.com/embed/post/4RwtewsxXp-k1ReCcdAgXg/157273191096?width=542&language=en_US&did=2098cf23085faa58c1b48d717a4a5072f591adb5
	// Nope, "&did" is different on this sometipsygnostalgic post. 
	// http://sometipsygnostalgic.tumblr.com/post/157273080206/the-role-of-champions-in-the-games-is-confusing/embed
	// https://embed.tumblr.com/embed/post/4RwtewsxXp-k1ReCcdAgXg/157273080206?width=542&language=en_US&did=86e94fe484d6739f7a26f3660bb907f82d600120
	// the _tZg etc. value is probably Tumblr's internal reference to the user. 
	// So I'd only have to get it once, and then I could generate these no-bullshit unthemed URLs based on /post IDs from sitemap.xml. 
	// /embed source code refers to the user-unique ID as the "embed_key". It's not present in /archive, which would've been super convenient. 
		// global.build.js has references to embed_key; dunno if I can just grab it. 
		// It seems like this script should be placing the key in links - data-post-id, data-embed-key, data-embed-did, etc. Just not seeing it in /archive source. 
		// Those data-etc values are in a <ul> (the fuck is a ul?) inside a div with classes like "popover". 
		// One div class="tumblr-post" explicitly links the /embed/post version with the embed_key and post_id. Where is that?
		// post_micro_157274781274 in the link div class ID is just the /post number. What's post_tumblelog_4b5d01b9ddd34b704fef28e6c9d83796 about? 
		// Screw it, it's in every /embed page. Just grab one. (Sadly it does require a real /post ID.) 
		// Pass ?embed_id along in all links, otherwise modes that need it will have to rediscover it. 
		// Don't fetch /archive or whatever. Grab some /post from the already-loaded innerHTML before removing it. 
	// Wait, shit! I can't get embed.tumblr.com because it's on a different subdomain! Bastards! 
		// Nevermind, anything.tumblr.com/embed works. Anything. That's weirdly accomodating. 
			// Seems they broke it before I got around to implementing this. Nerts. 
		// So: grab a post/12345/embed link, get the user id, redirect(?) to some ?embedded=xyz123 url. 
		// embed_key&quot;:&quot;gLSpeeMF7v3RdBncMIHy2w is not hard to find. 
	// The use of sitemap.xml could be transformative to this script, or even split off into another script entirely. It finally allows a standard theme! 
		// Just do it. Entries look like:
		// <url><loc>http://leopha.tumblr.com/post/57593339717/hello-a-friend-found-my-blog-somehow-so-url</loc><lastmod>2013-08-07T06:42:58Z</lastmod></url></urlset>
		// So split on <url><loc> and terminate at </loc>. Or see if JS has native XML parsing, for honest object-orientation instead of banging at text files. 
		// Grab normally for now, I guess, just to get it implemented. Make ?usemobile work. Worry about /embed stuff later. 
		// Check if dashboard-only blogs have xml files exposed. 
	// http://baklavalon.tumblr.com/post/16683271151/theyre-porking-in-that-room needs this treatment, its theme is just the word "poop". 
// Holy shit, /archive div classes include tags like not_mine, is_reblog, is_original, with_permalink, has_source - this is a fucking goldmine. 
// http://miyuli.tumblr.com/archive?startpage=16?pagesatonce=5?thumbnails?find=/tagged/fanart?ezastumblrscrape?everypost?lastpage=22 breaks on:
	// http://miyuli.tumblr.com/tagged/fanart/page/16 
	// http://miyuli.tumblr.com/archive?startpage=16?pagesatonce=5?thumbnails?find=/tagged/fanart?ezastumblrscrape?lastpage=22 - old method works 
	// Could just be dupes from nearby pages? Nope. 
	// Same deal here - http://miyuli.tumblr.com/archive?startpage=36?pagesatonce=5?thumbnails?find=/tagged/original?ezastumblrscrape?everypost?lastpage=39 - breaks:
	// http://miyuli.tumblr.com/tagged/original/page/36
	// http://miyuli.tumblr.com/archive?startpage=36?pagesatonce=5?thumbnails?find=/tagged/original?ezastumblrscrape?lastpage=39 - works 
	// Or... seems to be getting everything, but putting it under the wrong div. "Toggle page breaks" just breaks in the wrong places. Huh. 
	// http://northernvehemence.tumblr.com/archive?startpage=21?pagesatonce=5?thumbnails?ezastumblrscrape?lastpage=42?everypost - everything's there, but misplaced
// a-box.tumblr.com redirects on all pages, but there's still content there. Scrapes normally, though? Good case for embedded links instead of bare /post links. 
// Oddball feature idea: add ?source=[url]?via=[url] for tumblrs with zero hint of content in their URLs. 
	// E.g. if https://milkybee.tumblr.com/post/31692413218 goes missing, I have no fucking idea what's in it. It is a mystery. 
	// Easy answer, from ?everypost - add image URL after permalink. 
// Make ?ezastumblrscrape (no ?scrapewholesite) some kind of idiot-proof landing-page with links to various functions. As always, it's mostly for my sake, but it's good to have.
// /page mode - "Previous/Next (pagesatonce) pages" and also have "Previous/Next 1 page". 
// http://sybillineturtle.tumblr.com/post/138070422744/crookedxhill-cosplay-crookedxhill - I need a way for eza's tumblr scrape to handle these bullshit dashboard-only blogs 
// Oh my god: studiomoh.com/fun/tumblr_originals/?tumblr=TUMBLRNAME returns original posts for TUMBLRNAME.tumblr.com. 
// http / https changes break all the ?key=pair values, but there's not much to be done about it. 
// If I used Tumblr the way normal people do, I'd probably want an image browser for the dashboard. 
	// Is there any sort of sitemap for generic www.tumblr.com/tagged/example pages? 
	// I do almost nothing on the dashboard, but a way to dump entire tags would be fucking incredible. 
// The tag overview is presumably what's causing the huge CPU spike when a long scrapewholesite ends. That or the little "Done" at the top. 
	// Presumed solution: insert those things in whatever way the scraped pages are inserted. Generate divs at the start, I guess. 
	// Failed to have any effect. Am I doing something I should be peppering with setTimeout(0)s? JS multitasking is dumb. 
// Can I do anything for the identifiability of pages with no text after the number? 
	// E.g. woofkind-blog.tumblr.com/post/13147883338 += #tumblr_luzrb2F9Hv1qm6tzgo1_1280.jpg so there's something to go on if the blog dies. 
// Tag overview still picks up other tumblrs, which is okay-ish, but it borks the "X posts" links that go straight to scraping. Filter them. 
// Embed-based scraping could probably identify low-reblog content (low note count) for archival purposes. 
// Options_url() seems to add an extra question mark to the end of every URL. Doesn't break anything; just weird. 
// sarahfuarchive needs that /embed-based scrape. ?usemobile is a band-aid. 
// Okay, so the "Done." does generate a newline when it appears. Added a BR and crossed my fingers. 
	// Is it just the tag overview? Is it just the huge foreach over page_dupe_hash? 
	// Lag sometimes happens in the middle of a long scrape. If it's painting-related, maybe it's from long URLs forcing the screen to become wider. 
// Consider fetching groups of 10 instead of 25 when grabbing in the high thousands. Speed is not an issue, and Tumblr seems to lag more for 'deeper' pages. 
// Consider newline after bottom-div prev/next buttons, because popup url notifiers are fucking garbage. Put it back in my status bar, Firefox. That's what it's for. 
// "Tumblr Image Size" is no longer maintained (where'd I even get it?) and should maybe be folded into this script. Ajax instead of fetch because we'd only look for error?
	// Maybe as part of Image Glutton. Also include Twitter :large or :orig nonsense. 
// Pretty sure I can use that embed.tumblr view in ?scrapemode=everypost just by futzing the permalink URLs. Make it separate there, like ?usemobile. 
	// Also change embedded tags to search THIS domain instead of linking to general Tumblr searches. What even. 
	// czechadamick.tumblr.com as an example with no tags anywhere 
// I should really test for photosets that might not appear. Maybe some ?verbose mode where it links anything that looks like a URL. 
// Is there any way to resize inline images? They're so tiny!
	// http://static.tumblr.com/8781b0eeeb05c5d67462fb52fb349b45/zpdyuqm/LQ4ojqrq2/tumblr_static_bcae2zpugw000c0sog404804k.png - this one is huge!
	// https://66.media.tumblr.com/a38c7bd5148aded5494d4f8a70fce217/tumblr_inline_mlqzd2xLoZ1qz4rgp.jpg - this one is tiny. Not the same piece, obviously. 
	// https://media.tumblr.com/0625918cc22c673b4fb6db33ee44188b/tumblr_inline_mm3w29L9xN1qz4rgp.jpg
	// https://static.tumblr.com/865f4c82ae6fc0a9dc0631a646475a86/ghtmrj5/8hCop2k79/tumblr_static_9usneomihhc0s0oc0w0s0sw8c_2048_v2.gif
	// The _2048_v2 is definitely sizing, because that image works without it. 
	// http://studentsmut.tumblr.com/post/161643756857/anoneifanocs-mage0fheart-anoneifanocs-hey has an inline image that works with with _raw. It's https://media.tumblr.com/3a5ddf70b08a79c593fb05063ce011ed/tumblr_inline_orb5yhJuoV1tcvfhg_raw.png - and doesn't work with any other sizes? The hell? 
	// http://68.media.tumblr.com/a04c862afb3930c1250e067c23a887fc/tumblr_inline_owo0nfrLnO1rrmi7y_1280.png
	// Oh! Duh! media.tumblr -> 66.media.tumblr! Nope, still breaks a lot. 
// ?everypost isn't getting page/1 of http://shojoheart.tumblr.com/archive?startpage=1?pagesatonce=5?thumbnails=fixed-width?ezastumblrscrape?scrapemode=pagebrowser
	// Not a general problem; this mode still works on other tumblrs. I think it's their theme. Yeesh. 
	// Might be more general - e.g. fetching /page/1 redirects to /, so the page fetched might look empty. Arg. Ship it, it's not new. 
// http://grimdarkcake.tumblr.com tag overview doesn't work even though the pages have tags - ditto ticklishivories 
// Scrape link from /archive is broken; erase ?find if it reads '/archive'. 
// http://shittyhorsey.tumblr.com/ and http://venomous-sausage.tumblr.com/ fail before counting pages. WTF. Page counts use /mobile; there no way for themes to interfere. 
	// "The Same Origin Policy disallows reading the remote resource at https://www.tumblr.com/safe-mode?url=http://shittyhorsey.tumblr.com/page/1000/mobile." 
	// Okay, looking on the bright side, Tumblr has a "safe-mode" that might be easily abused. Look into that later. (Nope, just redirects. Poop.) 
	// Raiseshipseerve is fucked now too. What bullshit is Tumblr pulling? Other pages still work, obviously, since I've been using this for days straight. 
	// NRFW still works! Good, it's not totally fucky for https tumblrs. 
	// fetch( 'https://www.tumblr.com/safe-mode?url=http://nrfw.tumblr.com/page/2/mobile' ).then(s=>console.log(s)) does work from www.tumblr.com. It's not all crazy. 
	// -And- it's not a redirect (somehow), so... no it still returns an empty page. Fuck you, Tumblr. 
	// Yeah, you can set site_and_tags to some safe-mode link, but it still can't count pages. Fuck. 
	// embed.tumblr.com should still work, once I implement it. Even if you have to go to an /embed page to get the secret ID without fetches. 
	// Back to roots, at least for my personal use, maybe a forced theme? Just get all images on the page right now and put them in standard format. 
	// Oh thank god - Greasyfork user Petr Savelyev identified it as a cookies-related issue and gave an easy fix. fetch( url, { credentials: 'include' } ). 
		// Trust but verify: 'including credentials' does not appear to allow anything skeezy. 
		// No apparent way to set this for the whole script. Nuts. 
		// So, uh. How much of Tumblr works with credentials? Can I get any subdomain from any other subdomain? No, apparently not. Unfortunate. 
// http://mediarama.tumblr.com/archive?startpage=1?pagesatonce=10?find=/tagged/cosplay?ezastumblrscrape?scrapemode=scrapewholesite?autostart is fucked somehow 
// Make tag overview on slightly-gay-pogohammer work. 
	// Tag overview links choke on sites that end every tag with /chrono. 
// The /search method supports boolean operators in weird ways. Foo+bar returns posts with both words. Foo|bar returns different results... somehow. Fewer? 
// "Creates a new page showing just the images from any Tumblr" -> "Shows just the images from any Tumblr, using a new page?" 
// Memory use and slowdown on 1000-page scrapes: am I just using "var" where I should use "let?" Be more HTML5-y. 
	// Is it a document.write thing? Just the fact I'm dropping in raw HTML? FF might have to re-parse the whole long-ass page. 
// _raw files also work on the data.tumblr subdomain. 
	// e.g. http://data.tumblr.com/80882d3b1c6103e73b4de89761c804bd/tumblr_oe54v7tKhN1uctdepo1_raw.png
// xekstrin.tumblr.com fails every couple pages? It's been long enough that I forgot what causes this. Video? 
// ?scrapewholesite should probably separate posts / links / images below each /page. Just a text indicator on a new line. 
	// Oh, and indicate there's a tag overview at the bottom. "Scroll do the end for an overview of tags." Bottom, not end. Summary of tags. Anchor link? No, people will manage. 
// Add buttons for ?usesmall and ?notraw... and for neither. Genericize it into ?maxres=500 or something. 
	// Enable this only after adding resize functionality? That might need to be optional. I don't know if GreaseMonkey has, like, script cookies. JSON crap, I guess. 
	// Ooh, maybe make it a button on bare images. Little hovering button in the corner: 500, 1280, raw. (Still good to have a default.)
		// 1280 with a raw button is an acceptable compromise. Raw images can be stupid large. 
	// Eza's Resize Automator. Eza's Image Embiggener. Eza's Bigass Imager. Eza's Image Magnifier. Eza's Squint Disabler. Eza's Size Queen. Eza's Resolution Glutton. 
	// Once this is integrated, I should link ?usesmall to _raw and then scale down as needed. Assuming that works. Might fuck up more on XML / 404s than going up from JPG. 
	// Ech, link _1280 at most. DownThemAll doesn't do JS redirects. 
// Generalize ?usesmall and ?notraw to ?maxres=400 etc. 
	// Done. Now adjust those deprecated option names to maxres settings in the setup. 
	// Add options to image modes, probably below immediate/persistent scaling options. "Maximum resolution: Raw, 1280, 500, 400, 250, 100." I think those are the right numbers. They're all persistent links because fuck rejiggering the images on the page. 
	// Incidentally, _100 and _250 are not resized by the tumblr bare-image resizer I use. So I guess it's time to absorb that functionality. When that happens, add to the max-res options, "Opened links will display in higher resolution" or something like that. Clicked links? Clicked images? Maximum resolution? UI is hard. 
	// Ah, half-right: "Tumblr Image Size 1.1" doesn't work in //media.tumblr.com URLs. It needs a CDN number, like //66.media.tumblr.com. Might absorb the function anyway. 
	// Clean up the comments and dead code on this. 
// The new method doesn't support ?usemobile, but it kinda can't, since /mobile pages have no post links. 
	// XML methods would work. 
// Finally implemented the XML sitemap mode - and it finds "[video]" image sets with ?usemobile! 
	// Ironically I have no idea whether it finds actual videos. 
	// renkasbending doesn't show tags in xml mode, even though they're visible on the page. What. 
	// emiggax has no sitemap.xml! Whaaat? Ditto muteandthemew. 
// So... ?find=/archive does find posts and images. Can I use that? 
// Dashboard-only blogs - like https://www.tumblr.com/dashboard/blog/milky-morty - can still have the 'Scrape' link in the right place. (E.g., add /archive.) I can invoke functionality for them. It's just a right bitch to test anything, because the URL never changes and makes no difference. I'm not even sure I can request resources because all addresses resolve to the stupid dashboard thing. 
	// Oh, or https://milky-morty.tumblr.com/archive shows up fine. That works too. 
	// As does https://milky-morty.tumblr.com/sitemap.xml? Okay, so I might've accidentally solved this problem. 
	// Nevermind. At some point that guy took his blog out of dashboard-only mode. 
	// wwhatevven is dashboard-only. (And far less weird.) Nope, redirects everything to the /dashboard/blog crap. Even /archive and /sitemap.xml. 
	// Can I check if posts are original? Doesn't seem to be an OpenGraph / 'og:' thing. /embed has is_reblog, but we're not using that... yet. I should. 
		// /embed pages are desirable because they'll show their goddamn tags. 
	// Hey dingdong. You can still get thistumblr.tumblr.com/post links from the /page files, under most circumstances. 
// Example /post with previous/next links: http://artisticships.tumblr.com/post/71222875632/spencerofspace-more-rose-cosplay-first-set-x 
// Dashboard-only blogs: i-just-want-to-die-plz / thedevilandhisfiddle. 
// Maybe always link to _1280 versions in thumbnail view? This would make ?maxres more useful, e.g. for DownThemAll. 
	// Nontrivial, surprisingly. I use a standard function for standardizing images - so the link starts the same as the inline image. 
// Any /embed link for a wrong /post number delivers a standard 'not found' page, with the stupid animations. 
// _raw still exists in some form but is restricted. I'd need to find examples in the wild. 
// You know, I could still get /post numbers from /page... pages... so long as the blog works. This would allow per-post tags and image, but wouldn't fix theme-fucked blogs. It wouldn't work from /mobile, afaik. The main benefit would be 'under the cut' images. 
// Note count would be nice. Maybe once I get standardized links through /embed trickery. 
// media.tumblr.com just stopped working. 
	// ... started working again two days later. Tumblr was in a sorry state in the meantime. For once, a change on their end was clearly a bug on their end. 
// Test grabbing 1000pgs at once. 
// http://rosedai.tumblr.com/archive?ezastumblrscrape?scrapewholesite?find= - NaN pages, wtf? 
	// But http://rosedai.tumblr.com/archive?startpage=1?pagesatonce=10?ezastumblrscrape?scrapemode=scrapewholesite works. Wat. 
	// https://nightcigale.tumblr.com/archive?ezastumblrscrape?scrapewholesite?find= fails too. wtf. ?find=/ doesn't help. 
	// https://nightcigale.tumblr.com/archive?startpage=1?pagesatonce=10?ezastumblrscrape?scrapemode=scrapewholesite doesn't work either. Oh no. Did they break mobile? 
	// No, https://nightcigale.tumblr.com/page/10/mobile still works. Thank fuck. HTTP redirects to HTTPS as well, but test if it's that. (Shouldn't be; many tumblrs do HTTPS.) 
	// https://nightcigale.tumblr.com/page/100/mobile also works, for this 40-ish-page tumblr. Ditto https://nightcigale.tumblr.com/page/100000/mobile for the upper bound. 
	// And now it works suddenly. Fuck me, I guess. Tumblr just randomly wants to not respond to /mobile pages in a way that breaks the bounds check. 
// Tumblr is deleting everything NSFW within two weeks. Fucking hell. 
	// Quick and dirty post-by-post method for text? Maybe extend XML. E.g. http://pinewreaths.tumblr.com/post/164287584755/smutmas-2017-masterpost
	// It works, expose it. 
// Okay, post-by-post needs to work with tags. Jerry-rig some system that takes /page + /mobile and filters posts matching the current domain. 
	// e.g. http://tat-buns.tumblr.com/archive?ezastumblrscrape?scrapewholesite?find= is too big 
	// http://tat-buns.tumblr.com/post/104224472370/sparks-part-1-mabeldipper-gravity-falls - http://tat-buns.tumblr.com/tagged/tat%27s-fanfiction 
	// https://mrdaxxonford.tumblr.com/post/139839107484/little-things 
// https://mrdaxxonford.tumblr.com/archive?startpage=1?pagesatonce=10?find=/tagged/my-stuff?ezastumblrscrape?scrapemode=xml?lastpage=9?sitemap=1?xmlcount=9?story?usemobile?tagscrape - managed to change the page colors, wtf. These are /mobile posts! Whatever, makes no difference to a plain text file. 
	// Massive slowdown from /tagged/pinecest - possibly re-applying CSS from every single /post across 90 /pages? 
	// Ah fuck, virulentmalapropisms does it too. For "links only!" What the fuck? 
	// Tried reversion testing - it's not just tumblr fucking with me, it -is- something in my code. Uuugh. 
	// Forgot to make "story" stuff conditional. 
	// mrdaxxonford still has different style settings, what the fuck. Oh - because I'm doing ?story and ?tagscrape, so it's not /mobile. Maybe. 
	// No: it's already grabbing /mobile. WTF. 
// It would be more correct to add a 'post-by-post but just for this tag' link after the page count has been found. That'd allow breaking out in 100-page chunks. 
// Images 'under the cut' don't work in /tagscrape. Because fuck me. 
	// http://slightly-gay-pogohammer.tumblr.com/archive?startpage=1?pagesatonce=10?find=/tagged/nsfw-under-cut?ezastumblrscrape?scrapemode=xml?autostart?lastpage=3?sitemap=1?xmlcount=3?tagscrape
// http://manicpeixesdreamgirl.tumblr.com/archive?startpage=1?pagesatonce=10?ezastumblrscrape?scrapemode=xml?lastpage=478?sitemap=5?story?usemobile?usemobile - breaks at 35% even with ?usemobile. I really need fetches to fail safely. Either fuck with .catch() (where does that even go, for a promise.then chain?) or make absolutely goddamn sure every URL begins with window.location.hostname. Oh, and sitemap=11 gets 0% done. 
	// Sloppy fix: change every fetch to relative URLs. 
	// Function for relativizing URLs: replace https:// and http:// both. Replacing window.location.domain or whatever, including ".com/". Add a slash to front of remaining URL. 
// Really fuckin' wish I'd figured out dashboard blogs now. 
// /amp is a thing, like /embed or /mobile. Quickly use it. 
// http://diediedie3344-deactivated-204913.tumblr.com/archive still works what the fuck. That would've saved so much extra shit. 
// https://unwrapping.tumblr.com/post/126390533972/handy-tumblr-urls - son of a bitch I would've loved to know this a year ago.
	// Archive, filter by... Photo — http://<site>.tumblr.com/archive/filter-by/photo 
	// Posts (from your blog) — https://<www>.tumblr.com/blog/<site> - oh, just 'dashboard mode.' Or 'side mode' or whatever it's called. 
	// Day (posts by date) — http://<site>.tumblr.com/day/<year>/<month>/<day> example: http://unwrapping.tumblr.com/day/2015/04/01 
// Tumblrs below 10 pages don't find pages correctly anymore. 
	// Also, seeing a lot of "last page is between 9 and 100" or "99 and 1000." That first check is 1, 10, 100, etc. I didn't fucking ask for 99. What the hell? 
// Eats CPU cycles on the landing page. Not my fault, Tumblr just bites now. /archive is a fat bitch. 
	// Alternate standard landing page? /amp, maybe. /mobile. ?test works, obviously. &test gets eaten. 
	// /mobile works, but might not be 100.0% reliable. I think I remember some sites not having it? Might be thinking of sitemap.xml. 
	// Whoops, no longer has "Archive" in the title, so textfile filenames would be different. 
	// standard_landing_page should probably be a variable. I've already messed up replacing it in the several odd spots where it's relevant.
	// /archive/1999/1? Any year before Tumblr existed. "No posts yet."  
// Consider checking the /amp version of /post pages. Should be standardized. Might still show lingering NSFW content. Again: fuck this website. 
	// If anyone from Tumblr is reading this, fuck you personally. I don't care what you do there or when you joined. Your company has destroyed art and culture on a grand scale. Millions of hours of artistic effort, the archives of deceased creators, and countless human interactions, gone. No excuse can matter. 
// Would <wbr> break text links? I could insert it all over long-ass strings, but I don't know if it screws up how e.g. DownThemAll interprets those strings. HREF is unchanged. 
// The world needs an Eza's Twitter Scrape. I just hope to god someone besides me has done it already. 
// Why in the name of god is there no way to decode entity references in standard fucking Javascript? Did they think nobody would want to?! 
// I should probably detect rate-limiting and keep retrying. Each page on a full-site scrape gets its own div, right? 
// Always fetching /post/12345/words-etc/amp is an option. /page/2/amp doesn't work. So reliable tags are not trivial. Dangit. 

// Added first-page special case to /mobile handling, and there's some code duplication going on. Double-check what's going on for both scrapers. 
// Should offsite images read (Waiting for offsite image)? I'd like to know at a glance whether a Tumblr image failed or some shit from Minus didn't load. 
// "Programming is like this amazing puzzle game, where the puzzles are created by your own stupidity."
// Maybe just... like... ?exclude=/tagged/onepiece... to do a full rip of /tagged/onepiece and load that into page_dupe_hash? Or ?include maybe? Logical and & or?
	// Or some kind of text-file combiner, not because that's convenient to anyone else, but because I've been saving a shitload of text-files. 

// Re-editing this from whatever's on GreasyFork, in Linux. November 2020. 

	// Changes since last version:
// In ?scrapemode=www, moved each /post's permalink discovery ahead of de-duplication. 
// Added navigation controls and timing information for www.tumblr.com/tagged mode. 
// Refactored onclick code for "immediate" image size changes. 
// Added navigation to specific dates for www.tumblr.com/tagged mode. 
	// 2020 deep hibernation return:
// Fixed the goddamn titles. Back to simplicity. 
// Included max sizes for new-format image URLs, but de-duplication needs more than standardization. 







// ------------------------------------ Global variables ------------------------------------ //







var options_map = new Object(); 		// Associative array for ?key=value pairs in URL. 

	// Here's the URL options currently used. Scalars are at their default values; boolean flags are all set to false. 
options_map[ "startpage" ] = 1; 		// Page to start at when browsing images. 
options_map[ "pagesatonce" ] = 10; 		// How many Tumblr pages to browse images from at once. 
options_map[ "thumbnails" ] = false; 		// For browsing mode, 240px-wide images vs. full-size. 
options_map[ "find" ] = ""; 		// What goes after the Tumblr URL. E.g. /tagged/art or /chrono. 

// Only two hard problems in computer science...
var page_dupe_hash = {}; 		// Dump each scraped page into this, then test against each page - to remove inter-page duplicates (i.e., same shit on every page) 

// Using a global variable for this (defined later) instead of repeatedly defining it or trying to pass within the page using black magic. 
var site_and_tags = ""; 		// To be filled with e.g. http: + // + example.tumblr.com + /tagged/sherlock once options_map.find is parsed from the URL 

// Global list of post_id numbers that have been used, corresponding to span IDs based on them
var posts_placed = new Array;

// Global list of posts to be scraped, for use by sitemap.xml-based scrape function.
var post_urls = new Array; 

// "Safe" landing page - Tumblr-standard, no theme, ideally few images. 
// This can't be /archive or else immediate resizing doesn't work on blogs. But /mobile doesn't work for the new www mode. /archive/amp is consistent but "Not found." 
// Blog CSS still won't work right with /archive/amp are you kidding me. 
var standard_landing_page = "/mobile/page/amp"; 






// ------------------------------------ Script start, general setup ------------------------------------ //





// First, determine regime: Bare image? Scrape page? Normal page with a scrape link? 
if( window.location.href.indexOf( 'media.tumblr.com/' ) > -1 ) { maximize_image_size();  } 		// If it's an image, attempt to resize it
// First, determine if we're loading many pages and listing/embedding them, or if we're just adding a convenient button to that functionality. 
else if( window.location.href.indexOf( 'ezastumblrscrape' ) > -1 ) {		// If we're scraping pages:
		// Replace Tumblr-standard Archive page with our own custom nonsense
	let original_title = document.title; 
	document.head.innerHTML = "";		// Delete CSS. We'll start with a blank page.
//	document.title = ''; 		// Clear title. Script adds stuff 'to the end,' as before. Fetch below eventually fixes the front. 
	// Since we're avoiding /archive (due to content security policy interfering with image-resize gimmicks) we need to get the page title from elsewhere. (Oh, and favicon.) 
/*
	fetch( "/archive" ).then( r => r.text() ).then( text => { 		// This is now the wrong title for www stuff. Mmmeh. 
		title = text.split( '<title>' )[1].split( '</title>' )[0]; 
//		title = text.split( '<title' )[1].split( '>' )[1].split( '</title' )[0];  		// Attempted kludge
		document.title = window.location.hostname + " - " + htmlDecode( title ) + " " + document.title; 		// Beat the race condition by losing. (If we're first, we append a blank.) 
		favicon = text.split( '<link' ).map( x => x.split( '>' )[0] ).filter( x => x.indexOf( 'shortcut icon' ) > 0 )[0];
		document.head.innerHTML += '<link ' + favicon + '>'; 
		// The favicon fails sometimes - even on blogs where it otherwise works - even on pages where reloading makes it work. I do not understand how. 
	} )
*/
	// Fuck it:
	document.title = window.location.hostname + " - " + original_title.replace( /[\"\\\/]/g, '' ) + " " + document.title; 

	document.body.outerHTML = "<div id='maindiv'><div id='fetchdiv'></div></div><div id='bottom_controls_div'></div>"; 		// This is our page. Top stuff, content, bottom stuff. 
	let css_block = "<style> "; 		// Let's break up the CSS for readability and easier editing, here. 
	css_block += "body { background-color: #DDD; } "; 		// Light grey BG to make image boundaries more obvious 
	css_block += "h1,h2,h3{ display: inline; } ";
	css_block += ".fixed-width img{ width: 240px; } ";
	css_block += ".fit-width img{ max-width: 100%; } ";
	css_block += ".fixed-height img{ height: 240px; max-width:100%; } "; 		// Max-width for weirdly common 1px-tall photoset footers. Ultrawide images get their own row. 
	css_block += ".fit-height img{ max-height: 100%; } ";
	css_block += ".smart-fit img{ max-width: 100%; max-height: 100%; } ";		// Might ditch the individual fit modes. This isn't Pixiv Fixiv. (No reason not to have them?) 
	css_block += ".pagelink{ display:none; } "; 
	css_block += ".showpagelinks .pagelink{ display:inline; } "; 
	// Text blocks over images hacked together after https://www.designlabthemes.com/text-over-image-with-css/ - CSS is the devil 
	css_block += ".container { position: relative; padding: 0; margin: 0; } "; 		// Holds an image and the on-hover permalink
	css_block += ".postlink { position: absolute; width: 100%; left: 0; bottom: 0; opacity: 0; background: rgba(255,255,255,0.7); } "; 		// Permalink across bottom
	css_block += ".container:hover .postlink { opacity: 1; } "; 
	css_block += ".hidden { display: none; } "; 
	css_block += "a.interactive{ color:green; } "; 		// For links that don't go anywhere, e.g. 'toggle image size' - signalling. 
	css_block += "</style>"; 
	document.body.innerHTML += css_block; 		// Has to go in all at once or the browser "helpfully" closes the style tag upon evaluation 	
	var mydiv = document.getElementById( "maindiv" ); 		// I apologize for the generic names. This script used to be a lot simpler. 

		// Identify options in URL (in the form of ?key=value pairs) 
	var key_value_array = window.location.href.split( '?' ); 		// Knowing how to do it the hard way is less impressive than knowing how not to do it the hard way. 
	key_value_array.shift(); 		// The first element will be the site URL. Durrrr. 
	for( dollarsign of key_value_array ) { 		// forEach( key_value_array ), including clumsy homage to $_ 
		var this_pair = dollarsign.split( '=' ); 		// Split key=value into [key,value] (or sometimes just [key])
		if( this_pair.length < 2 ) { this_pair.push( true ); } 		// If there's no value for this key, make its value boolean True 
		if( this_pair[1] == "false " ) { this_pair[1] = false; } 		// If the value is the string "false" then make it False - note fun with 1-ordinal "length" and 0-ordinal array[element]. 
			else if( !isNaN( parseInt( this_pair[1] ) ) ) { this_pair[1] = parseInt( this_pair[1] ); } 		// If the value string looks like a number, make it a number
		options_map[ this_pair[0] ] = this_pair[1]; 		// options_map.key = value 
	}
	if( options_map.find[ options_map.find.length - 1 ] == "/" ) { options_map.find = options_map.find.substring( 0, options_map.find.length - 1 ); } 		// Prevents .com//page/2
//	if( options_map.find.indexOf( '/chrono' ) > 0 ) { options_map.chrono = true; } else { options_map.chrono = false; } 		// False case, to avoid unexpected persistence? Hm. 

		// Convert old URL options to new key-value pairs 
	if( options_map[ "scrapewholesite" ] ) { options_map.scrapemode = "scrapewholesite"; options_map.scrapewholesite = false; } 
	if( options_map[ "everypost" ] ) { options_map.scrapemode = "everypost"; options_map.everypost = false; } 
	if( options_map[ "thumbnails" ] == true ) { options_map.thumbnails = "fixed-width"; } 		// Replace the original valueless key with the default value 
	if( options_map[ "notraw" ] ) 		{ options_map.maxres = "1280"; options_map.notraw = false; } 
	if( options_map[ "usesmall" ] ) 	{ options_map.maxres = "400"; options_map.usesmall = false; } 

	document.body.className = options_map.thumbnails; 		// E.g. fixed-width, fixed-height, as matches the CSS. Persistent thumbnail options. Failsafe = original size.
	// Oh yeah, we have to do this -after- options_map.find is defined:
	site_and_tags = window.location.protocol + "//" + window.location.hostname + options_map.find; 		// e.g. http: + // + example.tumblr.com + /tagged/sherlock

	// Grab an example page so that duplicate-removal hides whatever junk is on every single page  
	// This remains buggy due to asynchronicity. It's a race condition where the results are, at worst, mildly annoying. 
	// Previous notes mention jeffmacanolinsfw.tumblr.com for some reason. 
	// Can just .then this function? 
	// Can I create cookies? That'd work fine. On load, grab cookies for this site and for this script, use /page/1 crap. 
	if( options_map.startpage != 1 && options_map.scrapemode != "scrapewholesite" ) 	// Not on first page, so on-every-page stuff appears somewhere
		{ exclude_content_example( site_and_tags + '/page/1' );	 } 		// Since this doesn't happen on page 1, let's use page 1. Low pages are faster somehow. . 

		// Add tags to title, for archival and identification purposes
	document.title += options_map.find.split('/').join(' '); 		// E.g. /tagged/example/chrono -> "tagged example chrono" 

	// In Chrome, /archive pages monkey-patch and overwrite Promise.all and Promise.resolve. 
	// Clunky solution to clunky problem: grab the default property from a fresh iframe.
	// Big thanks to inu-no-policeman for the iframe-based solution. Prototypes were not helpful. 
	var iframe = document.createElement( 'iframe' );  
	document.body.appendChild( iframe ); 
	window['Promise'] = iframe.contentWindow['Promise'];
	document.body.removeChild( iframe );

	mydiv.innerHTML = "Not all images are guaranteed to appear.<br>"; 		// Thanks to JS's wacky accomodating nature, mydiv is global despite appearing in an if-else block. 

		// Go to image browser or link scraper according to URL options. 
	switch( options_map.scrapemode ) {
 		case "scrapewholesite": scrape_whole_tumblr(); break;
		case "xml" : scrape_sitemap(); break; 
		case "everypost": setTimeout( new_embedded_display, 500 ); break; 		// Slight delay increases odds of exclude_content_example actually fucking working
		case "www": scrape_www_tagged(); break; 		// www.tumblr.com/tagged, and eventually your dashboard, maybe. 
		default: scrape_tumblr_pages();  		// Sensible delays do not work on the original image browser. Shrug. 
	}

} else { 		// If it's just a normal Tumblr page, add a link to the appropriate /ezastumblrscrape URL 

	// Add link(s) to the standard "+Follow / Dashboard" nonsense. Before +Follow, I think - to avoid messing with users' muscle memory. 
	// This is currently beyond my ability to dick with JS through a script in a plugin. Let's kludge it for immediate usability. 
	// kludge by Ivan - http://userscripts-mirror.org/scripts/review/65725.html 

		// Preserve /tagged/tag/chrono, etc. Also preserve http: vs https: via "location.protocol". 
	var find = window.location.pathname; 
	if( find.indexOf( "/page/chrono" ) <= 0 ) { 		// Basically checking for posts /tagged/page, thanks to Detective-Pony. Don't even ask. 
		if( find.lastIndexOf( "/page/" ) >= 0 ) { find = find.substring( 0, find.lastIndexOf( "/page/" ) ); } 		// Don't include e.g. /page/2. We'll add that ourselves. 
		if( find.lastIndexOf( "/post/" ) >= 0 ) { find = find.substring( 0, find.lastIndexOf( "/post" ) ); } 
		if( find.lastIndexOf( "/archive" ) >= 0 ) { find = find.substring( 0, find.lastIndexOf( "/archive" ) ); } 
		// On individual posts (and the /archive page), the link should scrape the whole site. 
	}
	var url = window.location.protocol + "//" + window.location.hostname + standard_landing_page +  "?ezastumblrscrape?scrapewholesite?find=" + find; 
	if( window.location.host == "www.tumblr.com" ) { url += "?scrapemode=www?thumbnails=fixed-width"; url = url.replace( "?scrapewholesite", "" ); } 

	// "Don't clean this up. It's not permanent."
	// Fuck it, it works and it's fragile. Just boost its z-index so it stops getting covered. 
	var scrape_button = document.createElement("a");
	scrape_button.setAttribute( "style", "position: absolute; top: 26px; right: 1px; padding: 2px 0 0; width: 50px; height: 18px; display: block; overflow: hidden; -moz-border-radius: 3px; background: #777; color: #fff; font-size: 8pt; text-decoration: none; font-weight: bold; text-align: center; line-height: 12pt; z-index: 1000; " );
	scrape_button.setAttribute("href", url);
	scrape_button.innerHTML = "Scrape";
	var body_ref = document.getElementsByTagName("body")[0];
	body_ref.appendChild(scrape_button);
		// Pages where the button gets split (i.e. clicking top half only redirects tiny corner iframe) are probably loading this script separately in the iframe. 
		// Which means you'd need to redirect the window instead of just linking. Bluh. 

	// Greasemonkey supports user commands through its add-on menu! Thus: no more manually typing /archive?ezastumblrscrape?scrapewholesite on uncooperative blogs.
	GM_registerMenuCommand( "Scrape whole Tumblr blog", go_to_scrapewholesite );
	// If a page is missing (post deleted, blog deleted, name changed) a reblog can often be found based on the URL
	// Ugh, these should only appear on /post/ pages. 
	// Naming these commands is hard. Both look for reblogs, but one is for if the post you're on is already a reblog. 
	// Maybe "because this Tumblr changed names / moved / is missing" versus "because this reblog got deleted?" Newbs might not know either way. 
		// "As though" this blog is missing / this post is missing? 
		// "Search for this Tumblr under a different name" versus "search for other blogs that've reblogged this?"
	GM_registerMenuCommand( "Google for reblogs of this original Tumblr post", google_for_reblogs );
	GM_registerMenuCommand( "Google for other instances of this Tumblr reblog", google_for_reblogs_other ); 		// two hard problems 

//	if( window.location.href.indexOf( '?browse' ) > -1 ) { browse_this_page(); } 		// Experimental single-page scrape mode - DEFINITELY not guaranteed to stay 
}

function go_to_scrapewholesite() { 
//	let redirect = window.location.protocol + "//" + window.location.hostname + "/archive?ezastumblrscrape?scrapewholesite?find=" + window.location.pathname;
	let redirect = window.location.protocol + "//" + window.location.hostname + standard_landing_page 
		+ "?ezastumblrscrape?scrapewholesite?find=" + window.location.pathname;  
	window.location.href = redirect; 
}

function google_for_reblogs() {
	let blog_name = window.location.href.split('/')[2].split('.')[0]; 		// e.g. http//example.tumblr.com -> example 
	let content = window.location.href.split('/').pop(); 		// e.g. http//example.tumblr.com/post/12345/hey-i-drew-this -> hey-i-drew-this
	let redirect = "https://google.com/search?q=tumblr " + blog_name + " " + content; 
	window.location.href = redirect; 
}

function google_for_reblogs_other() {
	let content = window.location.href.split('/').pop().split('-'); 		// e.g. http//example.tumblr.com/post/12345/hey-i-drew-this -> hey,i,drew,this
	let blog_name = content.shift(); 		// e.g. examplename-hey-i-drew-this -> examplename
	content = content.join('-'); 
	let redirect = "https://google.com/search?q=tumblr " + blog_name + " " + content; 
	window.location.href = redirect; 
}









// ------------------------------------ Whole-site scraper for use with DownThemAll ------------------------------------ //









// Monolithic scrape-whole-site function, recreating the original intent (before I added pages and made it a glorified multipage image browser) 
function scrape_whole_tumblr() {
//	console.log( page_dupe_hash ); 
	var highest_known_page = 0;
	options_map.startpage = 1; 		// Reset to default, because other values do goofy things to the image-browsing links below 

	// Link to image-viewing version, preserving current tags
	mydiv.innerHTML += "<br><h1><a id='browse10' href='" + options_url( {scrapemode:'pagebrowser', thumbnails:true} ) +"'>Browse images (10 pages at once)</a> </h1>"; 
	mydiv.innerHTML += "<h3><a id='browse5' href='" + options_url( {scrapemode:'pagebrowser', thumbnails:true, pagesatonce:5} ) + "'>(5 pages at once)</a> </h3>"; 
	mydiv.innerHTML += "<h3><a id='browse1' href='" + options_url( {scrapemode:'pagebrowser', thumbnails:true, pagesatonce:1} ) + "'>(1 page at once)</a></h3><br><br>"; 

	mydiv.innerHTML += "<a id='exp10' href='" + options_url( {scrapemode:'everypost', thumbnails:true, pagesatonce:10} ) + "'>Experimental fetch-every-post image browser (10 pages at once)</a> "; 
	mydiv.innerHTML += "<a id='exp5' href='" + options_url( {scrapemode:'everypost', thumbnails:true, pagesatonce: 5} ) + "'>(5 pages at once)</a> "; 
	mydiv.innerHTML += "<a id='exp1' href='" + options_url( {scrapemode:'everypost', thumbnails:true, pagesatonce: 1} ) + "'>(1 page at once)</a><br><br>";

	mydiv.innerHTML += "<a id='xml_all' href='" + options_url( {scrapemode:'xml' } ) + "'>Post-by-post images and text</a> <-- New option for saving stories<br><br>";  

	// Find out how many pages we need to scrape.
	if( isNaN( options_map.lastpage ) ) {
		// Find upper bound in a small number of fetches. Ideally we'd skip this - some themes list e.g. "Page 1 of 24." I think that requires back-end cooperation. 
		mydiv.innerHTML += "Finding out how many pages are in <b>" + site_and_tags.substring( site_and_tags.indexOf( '/' ) + 2 ) + "</b>:<br><br>"; 

		// Returns page number if there's no Next link, or negative page number if there is a Next link. 
		// Only for use on /mobile pages; relies on Tumblr's shitty standard theme 
		function test_next_page( body ) {
			var link_index = body.indexOf( 'rel="canonical"' ); 		// <link rel="canonical" href="http://shadygalaxies.tumblr.com/page/100" /> 
			var page_index = body.indexOf( '/page/', link_index ); 
			var terminator_index = body.indexOf( '"', page_index ); 
			var this_page = parseInt( body.substring( page_index+6, terminator_index ) ); 
			if( body.indexOf( '>next<' ) > 0 ) { return -this_page; } else { return this_page }
		}

		// Generates an array of length "steps" between given boundaries - or near enough, for sanity's sake 
		function array_between_bounds( lower_bound, upper_bound, steps ) { 
			if( lower_bound > upper_bound ) { 		// Swap if out-of-order. 
				var temp = lower_bound; lower_bound = upper_bound, upper_bound = temp; 
			}
			var bound_range = upper_bound - lower_bound;
			if( steps > bound_range ) { steps = bound_range; } 		// Steps <= bound_range, but steps > 1 to avoid division by zero:
			var pages_per_test = parseInt( bound_range / steps ); 		// Steps-1 here, so first element is lower_bound & last is upper_bound. Off-by-one errors, whee... 
			var range = Array( steps ) 
				.fill( lower_bound ) 
				.map( (value,index) => value += index * pages_per_test );
			range.push( upper_bound );
			return range;
		}

		// DEBUG
//		site_and_tags = 'https://www.tumblr.com/safe-mode?url=http://shittyhorsey.tumblr.com'; 

		// Given a (presumably sorted) list of page numbers, find the last that exists and the first that doesn't exist. 
		function find_reasonable_bound( test_array ) {
			return Promise.all( test_array.map( pagenum => fetch( site_and_tags + '/page/' + pagenum + '/mobile', { credentials: 'include' } ) ) ) 
				.then( responses => Promise.all( responses.map( response => response.text() ) ) ) 
				.then( pages => pages.map( page => test_next_page( page ) ) ) 
				.then( numbers => {
					var lower_index = -1;
					numbers.forEach( (value,index) => { if( value < 0 ) { lower_index++; } } ); 	// Count the negative numbers (i.e., count the pages with known content) 
					if( lower_index < 0 ) { lower_index = 0; } 
					var bounds = [ Math.abs(numbers[lower_index]), numbers[lower_index+1] ] 
					mydiv.innerHTML += "Last page is between " + bounds[0] + " and " + bounds[1] + ".<br>";
					return bounds; 
				} )
		}

		// Repeatedly narrow down how many pages we're talking about; find a reasonable "last" page 
		find_reasonable_bound( [2, 10, 100, 1000, 10000, 100000] ) 		// Are we talking a couple pages, or a shitload of pages?
			.then( pair => find_reasonable_bound( array_between_bounds( pair[0], pair[1], 10 ) ) ) 		// Narrow it down. Fewer rounds of more fetches works best.
			.then( pair => find_reasonable_bound( array_between_bounds( pair[0], pair[1], 10 ) ) ) 		// Time is round count, fetches add up, selectivity is fetches x fetches. 
			// Quit fine-tuning numbers and just conditional in some more testing for wide ranges. 
			.then( pair => { if( pair[1] - pair[0]  > 50 ) { return find_reasonable_bound( array_between_bounds( pair[0], pair[1], 10 ) ) } else { return pair; } } ) 
			.then( pair => { if( pair[1] - pair[0]  > 50 ) { return find_reasonable_bound( array_between_bounds( pair[0], pair[1], 10 ) ) } else { return pair; } } ) 
			.then( pair => { 
				options_map.lastpage = pair[1]; 

				document.getElementById( 'browse10' ).href += "?lastpage=" + options_map.lastpage; 		// Add last-page indicator to Browse Images link
				document.getElementById( 'browse5' ).href += "?lastpage=" + options_map.lastpage; 			// ... and the 5-pages-at-once link.
				document.getElementById( 'browse1' ).href += "?lastpage=" + options_map.lastpage; 			// ... and the 1-page-at-onces link. 
				document.getElementById( 'exp10' ).href += "?lastpage=" + options_map.lastpage; 			// ... and this fetch-every-post link. 
				document.getElementById( 'exp5' ).href += "?lastpage=" + options_map.lastpage; 				// ... and this fetch-every-post link. 
				document.getElementById( 'exp1' ).href += "?lastpage=" + options_map.lastpage; 				// ... and this fetch-every-post link. 
				document.getElementById( 'xml_all' ).href += "?lastpage=" + options_map.lastpage; 			// ... and this XML post-by-post link, why not. 

				start_scraping_button(); 
			} );  
	}
	else { 		// If we're given the highest page by the URL, just use that
		start_scraping_button(); 
	} 

	// Add "Scrape" button to the page. This will grab images and links from many pages and list them page-by-page. 
	function start_scraping_button() { 
		if( options_map.grabrange ) { 		// If we're only grabbing a 1000-page block from a huge-ass tumblr:
			mydiv.innerHTML += "<br>This will grab 1000 pages starting at <b>" + options_map.grabrange + "</b>.<br><br>";
		} else { 		// If we really are describing the last page:
			mydiv.innerHTML += "<br>Last page is <b>" + options_map.lastpage + "</b> or lower. ";
			mydiv.innerHTML += "<a href=" + options_url( {lastpage: false} ) + ">Find page count again?</a> <br><br>"; 
		}

		if( options_map.lastpage > 1500 && !options_map.grabrange ) { 		// If we need to link to 1000-page blocks, and aren't currently inside one: 
			for( let x = 1; x < options_map.lastpage; x += 1000 ) { 		// For every 1000 pages...
				let decade_url = window.location.href + "?grabrange=" + x + "?lastpage=" + options_map.lastpage; 
				mydiv.innerHTML += "<a href='" + decade_url + "'>Pages " + x + "-" + (x+999) + "</a><br>"; 		// ... link a range of 1000 pages. 
			}
		}

			// Add button to scrape every page, one after another. 
			// Buttons within GreaseMonkey are a huge pain in the ass. I stole this from stackoverflow.com/questions/6480082/ - thanks, Brock Adams. 
		var button = document.createElement ('div');
		button.innerHTML = '<button id="myButton" type="button">Find image links from all pages</button>'; 
		button.setAttribute ( 'id', 'scrape_button' );		// I'm really not sure why this id and the above HTML id aren't the same property. 
		document.body.appendChild ( button ); 		// Add button (at the end is fine) 
		document.getElementById ("myButton").addEventListener ( "click", scrape_all_pages, false ); 		// Activate button - when clicked, it triggers scrape_all_pages() 

		if( options_map.autostart ) { document.getElementById ("myButton").click(); } 		// Getting tired of clicking on every reload - debug-ish 
		if( options_map.lastpage <= 26 ) { document.getElementById ("myButton").click(); } 		// Automatic fetch (original behavior!) for a single round 
	} 
}



function scrape_all_pages() {		// Example code implies that this function /can/ take a parameter via the event listener, but I'm not sure how. 
	var button = document.getElementById( "scrape_button" ); 			// First, remove the button. There's no reason it should be clickable twice. 
	button.parentNode.removeChild( button ); 		// The DOM can only remove elements from a higher level. "Elements can't commit suicide, but infanticide is permitted." 

	mydiv.innerHTML += "Scraping page: <div id='pagecounter'></div><div id='afterpagecounter'><br></div><br>";		// This makes it easier to view progress,

	// Create divs for all pages' content, allowing asynchronous AJAX fetches
	var x = 1; 
	var div_end_page = options_map.lastpage;
	if( !isNaN( options_map.grabrange ) ) { 		// If grabbing 1000 pages from the middle of 10,000, don't create 0..10,000 divs 
		x = options_map.grabrange; 
		div_end_page = x + 1000; 		// Should be +999, but whatever, no harm in tiny overshoot 
	}
	for( ; x <= div_end_page; x++ ) { 
		var siteurl = site_and_tags + "/page/" + x; 
		if( options_map.usemobile ) { siteurl += "/mobile"; } 		// If ?usemobile is flagged, scrape the mobile version.
		if( x == 1 && options_map.usemobile ) { siteurl = site_and_tags + "/mobile"; } 		// Hacky fix for redirect from example.tumblr.com/page/1/anything -> example.tumblr.com

		var new_div = document.createElement( 'div' );
		new_div.id = '' + x; 
		document.body.appendChild( new_div );
	}

	// Fetch all pages with content on them
	var page_counter_div = document.getElementById( 'pagecounter' ); 		// Probably minor, but over thousands of laggy page updates, I'll take any optimization. 
	pagecounter.innerHTML = "" + 1;
	var begin_page = 1; 
	var end_page = options_map.lastpage; 
	if( !isNaN( options_map.grabrange ) ) { 		// If a range is defined, grab only 1000 pages starting there 
		begin_page = options_map.grabrange;
		end_page = options_map.grabrange + 999; 		// NOT plus 1000. Stop making that mistake. First page + 999 = 1000 total. 
		if( end_page > options_map.lastpage ) { end_page = options_map.lastpage; } 		// Kludge 
		document.title += " " + (parseInt( begin_page / 1000 ) + 1);		// Change page title to indicate which block of pages we're saving
	}


	// Generate array of URL/pagenum pair-arrays 
	url_index_array = new Array;
	for( var x = begin_page; x <= end_page; x++ ) { 
		var siteurl = site_and_tags + "/page/" + x; 
		if( options_map.usemobile ) { siteurl += "/mobile"; } 		// If ?usemobile is flagged, scrape the mobile version. No theme shenanigans... but also no photosets. Sigh. 
		if( x == 1 && options_map.usemobile ) { siteurl = site_and_tags + "/mobile"; } 		// Hacky fix for redirect from example.tumblr.com/page/1/anything -> example.tumblr.com
		url_index_array.push( [siteurl, x] ); 
	}

	// Fetch, scrape, and display all URLs. Uses promises to work in parallel and promise.all to limit speed and memory (mostly for reliability's sake). 
	// Consider privileging first page with single-element fetch, to increase apparent responsiveness. Doherty threshold for frustration is 400ms. 
	var simultaneous_fetches = 25;
	var chain = Promise.resolve(0); 		// Empty promise so we can use "then" 

	var order_array = [1]; 		// We want to show the first page immediately, and this is a callback rat's-nest, so let's make an array of how many pages to take each round
	for( var x = 1; x < url_index_array.length; x += simultaneous_fetches ) { 		// E.g. [1, simultaneous_fetchs, s_f, s_f, s_f, whatever's left] 
		if( url_index_array.length - x > simultaneous_fetches ) { order_array.push( simultaneous_fetches ); } else { order_array.push( url_index_array.length - x ); } 
	}

	order_array.forEach( (how_many) => {
		chain = chain.then( s => {
			var subarray = url_index_array.splice( 0, how_many );  		// Shift a reasonable number of elements into separate array, for partial array.map  
			return Promise.all( subarray.map( page => 
				Promise.all( [ fetch( page[0], { credentials: 'include' } ).then( s => s.text() ), 	page[1], 	page[0] ] ) 		// Return [ body of page, page number, page URL ] 
			) ) 
		} )
		.then( responses => responses.map( s => { 		// Scrape URLs for links and images, display on page 
			var pagenum = s[1]; 
			var page_url = s[2];
			var url_array = soft_scrape_page_promise( s[0] ) 		// Surprise, this is a promise now 
				.then( urls => { 
					// Sort #link URLs to appear first, because we don't do that in soft-scrape anymore
					urls.sort( (a,b) => -a.indexOf( "#link" ) ); 		// Strings containing "#link" go before others - return +1 if not found in 'a.' Should be stable. 

					// Print URLs so DownThemAll (or similar) can grab them
					var bulk_string = "<br><a href='" + page_url + "'>" + page_url + "</a><br>"; 		// A digest, so we can update innerHTML just once per div
					// DEBUG-ish - on theory that 1000-page-tall scraping/rendering fucks my VRAM
					if( options_map.smalltext ) { bulk_string = "<p style='font-size:1px'>" + bulk_string; } 		// If ?smalltext flag is set, render text unusably small, for esoteric reasons
					urls.forEach( (value,index,array) => { 
						if( options_map.plaintext ) { 
							bulk_string += value + '<br>';
						} else {
							bulk_string += '<a href ="' + value + '">' + value + '</a><br>'; 
						}
					} )
					document.getElementById( '' + pagenum ).innerHTML = bulk_string; 
					if( parseInt( pagecounter.innerHTML ) < pagenum ) { pagecounter.innerHTML = "" + pagenum; } 		// Increment pagecounter (where sensible) 
				} );
			} )
		) 
	} )

	chain = chain.then( s => { 
		document.getElementById( 'afterpagecounter' ).innerHTML = "Done. Use DownThemAll (or a similar plugin) to grab all these links."; 

		// Divulge contents of page_dupe_hash to check for common tags 
		// Ugh, I'm going to have to turn this from an associative array into an array-of-arrays if I want to sort it. 
		let tag_overview = "<br>" + "Tag overview: " + "<br>"; 
		let score_tag_list = new Array; 		// This will hold an array of arrays so we can sort this associative array by its values. Wheee.
		for( let url in page_dupe_hash ) { 
			if( url.indexOf( '/tagged/' ) > 0 			// If it's a tag URL...
				&& page_dupe_hash[ url ] > 1 	// and non-unique...
				&& url.indexOf( '/page/' ) < 0 		// and not a page link...
				&& url.indexOf( '?og' ) < 0 			// and not an opengraph link...
				&& url.indexOf( '?ezas' ) < 0 		// and not this script, wtf... 
			) { 		// So if it's a TAG, in other words...
				score_tag_list.push( [ page_dupe_hash[ url ], url ] ); 		// ... store [ number of times seen, tag URL ] for sorting. 
			}
		} 
		score_tag_list.sort( (a,b) => a[0] > b[0] ); 		// Ascending order, now - most common tags at the very bottom, for easier access 
		score_tag_list.map( pair => {
			pair[1] = pair[1].replace( '/chrono', '' ); 		// Remove /chrono from sites that append it automatically, since it breaks the 'N pages' autostart links. (/chrono/ might fail.) 
			var this_tag = pair[1].split('/').pop(); 		// e.g. example.tumblr.com/tagged/my-art -> my-art
			//if( this_tag == '' ) { let this_tag = pair[1].split('/').pop(); } 		// Trailing slash screws this up, so get the second-to-last thing instead
			var scrape_link = options_url( {find: '/tagged/'+this_tag, lastpage: false, grabrange: false, autostart: true} ); 		// Direct link to ?scrapewholesite for e.g. /tagged/my-art 
			tag_overview += "<br><a href='" + scrape_link + "'>" + pair[0] +  " posts</a>:\t" + "<a href='" + pair[1] + "'>" + pair[1] + "</a>"; 
		} )
		document.body.innerHTML += tag_overview;

	} ) 
}









// ------------------------------------ Multi-page scraper with embedded images ------------------------------------ //









function scrape_tumblr_pages() { 
	// Grab an empty page so that duplicate-removal hides whatever junk is on every single page 
	// This is DEBUG-ish. It might be slow, barring caching. It might not work due to asynchrony. It could block actual content thanks to 'my best posts' sidebars. 

	if( isNaN( parseInt( options_map.startpage ) ) || options_map.startpage <= 1 ) { options_map.startpage = 1; } 		// Sanity check

	mydiv.innerHTML += "<br>" + html_previous_next_navigation() + "<br>"; 
	document.getElementById("bottom_controls_div").innerHTML += "<br>" + html_page_count_navigation() + "<br>" + html_previous_next_navigation(); 

	mydiv.innerHTML += "<br>" + html_ezastumblrscrape_options() + "<br>"; 

	mydiv.innerHTML += "<br>" +image_size_options(); 

	mydiv.innerHTML += "<br><br>" + image_resolution_options(); 

	// Fill an array with the page URLs to be scraped (and create per-page divs while we're at it)
	var pages = new Array( parseInt( options_map.pagesatonce ) ) 
		.fill( parseInt( options_map.startpage ) )
		.map( (value,index) => value+index ); 
	pages.forEach( pagenum => { 
			mydiv.innerHTML += "<hr><div id='" + pagenum + "'><b>Page " + pagenum + " </b></div>"; 
	} )

	pages.map( pagenum => { 
		var siteurl = site_and_tags + "/page/" + pagenum; 		// example.tumblr.com/page/startpage, startpage+1, startpage+2, etc. 
		if( options_map.usemobile ) { siteurl += "/mobile"; } 		// If ?usemobile is flagged, scrape mobile version. No theme shenanigans... but also no photosets. Sigh. 
		if( pagenum == 1 && options_map.usemobile ) { siteurl = site_and_tags + "/mobile"; } 		// Hacky fix for redirect from example.tumblr.com/page/1/anything -> example.tumblr.com
		fetch( siteurl, { credentials: 'include' } ).then( response => response.text() ).then( text => {
			document.getElementById( pagenum ).innerHTML += "<b>fetched</b><br>" 		// Immediately indicate the fetch happened. 
				+ "<a href='" + siteurl + "'>" + siteurl + "</a><br>"; 		// Link to page. Useful for viewing things in-situ... and debugging. 
			// For some asinine reason, 'return url_array' causes 'Permission denied to access property "then".' So fake it with ugly nesting. 
			soft_scrape_page_promise( text ) 		 		
				.then( url_array => {
					var div_digest = ""; 		// Instead of updating each div's HTML for every image, we'll lump it into one string and update the page once per div.

					var video_array = new Array;
					var outlink_array = new Array;
					var inlink_array = new Array;

					url_array.forEach( (value,index,array) => { 		// Shift videos and links to separate arrays, blank out those URLs in url_array
						if( value.indexOf( '#video' ) > 0 ) 	{ video_array.push( value ); array[index] = '' } 
						if( value.indexOf( '#offsite' ) > 0 ) 	{ outlink_array.push( value ); array[index] = '' } 
						if( value.indexOf( '#local' ) > 0 ) 	{ inlink_array.push( value ); array[index] = '' } 
					} );
					url_array = url_array.filter( url => url === "" ? false : true ); 		// Remove empty elements from url_array

					// Display video links, if there are any
					video_array.forEach( value => {div_digest += "Video: <a href='" + value + "'>" + value + "</a><br>  "; } ) 

					// Display page links if the ?showlinks flag is enabled 
					if( options_map.showlinks ) { 
						div_digest += "Outgoing links: "; 
						outlink_array.forEach( (value,index) => { div_digest += "<a href='" + value.replace('#offsite#link', '') + "'>O" + (index+1) + "</a> " } );
						div_digest += "<br>" + "Same-Tumblr links: ";
						inlink_array.forEach( (value,index) => { div_digest += "<a href='" + value.replace('#local#link', '') + "'>T" + (index+1) + "</a> " } );
						div_digest += "<br>";
					}

					// Embed high-res images to be seen, clicked, and saved
					url_array.forEach( image_url => { 
						// Embed images (linked to themselves) and link to photosets
						if( image_url.indexOf( "#photoset#" ) > 0 ) { 		// Before the first image in a photoset, print the photoset link.
							var photoset_url = image_url.split( "#" ).pop(); 		
							// URL is like tumblr.com/image#photoset#http://tumblr.com/photoset_iframe - separate past last hash... t.
							div_digest += " <a href='" + photoset_url + "'>Set:</a>"; 
						}
						div_digest += "<a id='" + encodeURI( image_url ) + "' target='_blank' href='" + image_url + "'>" 	+ "<img alt='(Waiting for image)' onerror='" + error_function( image_url ) + "' src='" + image_url + "'></a>  ";
//						div_digest += "<a id='" + encodeURI( image_url ) + "' target='_blank' href='" + image_url + "'>" 	+ "<img alt='(Waiting for image)' src='" + image_url + "'></a>  ";  
					} )

					div_digest += "<br><a href='" + siteurl + "'>(End of " + siteurl + ")</a>";		// Another link to the page, because I'm tired of scrolling back up. 
					document.getElementById( pagenum ).innerHTML += div_digest; 
				} ) // End of 'then( url_array => { } )'
			} ) // End of 'then( text => { } )'
	} ) // End of 'pages.map( pagenum => { } )' 
}









// ------------------------------------ Whole-site scraper based on post-by-post method ------------------------------------ //








// The use of sitemap.xml could be transformative to this script, or even split off into another script entirely. It finally allows a standard theme! 
	// Just do it. Entries look like:
	// <url><loc>http://leopha.tumblr.com/post/57593339717/hello-a-friend-found-my-blog-somehow-so-url</loc><lastmod>2013-08-07T06:42:58Z</lastmod></url></urlset>
	// So split on <url><loc> and terminate at </loc>. Or see if JS has native XML parsing, for honest object-orientation instead of banging at text files. 
	// Grab normally for now, I guess, just to get it implemented. Make ?usemobile work. Worry about /embed stuff later. 
	// Check if dashboard-only blogs have xml files exposed. 
	// While we're at it, check that the very first <loc> for sitemap2 doesn't point to a /page/n URL. Not useful, just interesting. 

// Simplicate for now. Have a link to grab an individual XML file, get everything inside it, then add images (and tags?) from each page. 
	// For an automatic 'get the whole damn site' mode, maybe just call each XML file sequentially. Not a for() loop - have each page maybe call the next page when finished. 
	// Oh right, these XML files are like 500 posts each. Not much different from grabbing 50 pages at a time - and we currently grab 25 at once - but significant.
	// If I have to pass the /post URL list to a serial function in order to rate-limit this, I might as well grab all XML files and fill the list -once.- 
	// Count down? So e.g. scrape(x) calls scrape(x-1). It's up to the root function to call the high value initially. 

// Single-sitemap pages (e.g. scraping and displaying sitemap2) will need prev/next controls (e.g. to sitemap1 & sitemap3). 
// These will be chronological. That's fine, just mention it somewhere. 

// I'm almost loathe to publicize this. It's not faster or more reliable than the monolithic scraper. It's not a better image-browsing method, in my opinion.
// It's only really useful for people who think a blank theme is the same as erasing their blog... and the less those jerks know, the better. Don't delete art. 
// I guess it'll be better when I list tags with each post, but at present they're strongly filtered out. 

// Wait, wtf? I ripped sometipsygnostalgic as a test cast (98 sitemaps, ~50K posts) and the last page scraped is from years ago. 
// None of the sitemaps past sitemap51 exist. So the first 51 are present - the rest 404. (To her theme, not as a generic tumblr error with animated background.) 
// It's not a URL-generation error; the same thing happens copy-pasting the XML URLs straight from sitemap.xml. 
// Just... fuck. Throw a warning for now, see if there's anything to be done later. 
// If /mobile versions of /post URLs unexpectedly had proper [video] references... do they contain previous/next post info? Doesn't look like it. Nuts. 

// In addition to tags, this should maybe list sources and 'via' links. It's partially archival, after all. 
// Sort of have to conditionally unfilter /tagged links from both duplicate-remover and whichever function removes tumblr guff. 'if !xml remove tagged.' bluh. 
// Better idea: send the list to a 'return only tags' filter first, non-destructively. Then filter them out. 

// I guess this is what I'd modify to get multi-tag searches - like homestuck+hs+hamsteak. Scrape by /tagged and /page, populate post_urls, de-dupe and sort. 
	// Of course then I'd want it to 'just work' with the existing post-by-post image browser, which... probably won't happen. I'm loathe to recreate it exactly or inexactly. 

// This needs some method to find /post numbers from /page and /tagged pages. 
	// Maybe trigger a later post-by-post function from a modified scrapewholesite approach? Code quality is a non-issue right now. 
	// Basically have to fill some array post_urls with /post URLs (strings) and then call scrape_post_urls(). Oh, that array is global. This was already a janky first pass. 
	// Instead of XML files, use /page or /page + /mobile... pages. 
	// Like 'if /tagged/something then grab all links and filter for indexof /post.' 

// This is the landing function for this mode - it grabs sitemap.xml, parses options, sets any necessary variables, and invokes scrape_sitemap_x for sitemap1 and so on.
function scrape_sitemap() {
	document.title += ' sitemap'; 

	// Flow control.
	// Two hard problems. Structure now, names later. 
	if( options_map.sitemap ) { 
		mydiv.innerHTML += "<br>Completion: <div id='pagecounter'>0%</div><div id='afterpagecounter'></div><br>";		// This makes it easier to view progress,
		let final_sitemap = 0; 		// Default: no recursion. 'Else considered harmful.' 
		if( options_map.xmlcount ) { final_sitemap = options_map.xmlcount; } 		// If we're doing the whole site, indicate the highest sitemap and recurse until you get there. 
		scrape_sitemap_x( options_map.sitemap, final_sitemap ); 
	} else { 
		// Could create a div for this enumeration of sitemapX files, and always have it at the top. It'd get silly for tumblrs with like a hundred of them. 
		// Maybe at the bottom? 
		fetch( window.location.protocol + "//" + window.location.hostname + '/sitemap.xml', { credentials: 'include' } ) 		// Grab text-like file 
		.then( r => r.text() )
		.then( t => { 		// Process the text of sitemap.xml
			let sitemap_list = t.split( '<loc>' ); 
			sitemap_list.shift(); 		// Get rid of data before first location 
			sitemap_list = sitemap_list.map( e => e.split('</loc>')[0] ); 		// Terminate each entry 
			sitemap_list = sitemap_list.filter( e => { return e.indexOf( '/sitemap' ) > 0 && e.indexOf( '.xml' ) > 0; } ); 		// Remove everything but 'sitemapX.xml' links 

			mydiv.innerHTML += '<br>' + window.location.hostname + ' has ' + sitemap_list.length + ' sitemap XML file'; 
			if( sitemap_list.length > 1 ) { mydiv.innerHTML += 's'; } 		// Pluralization! Not sexy, just functional. 
	//		mydiv.innerHTML += ' (Up to ' + sitemap_list.length * 500 + ' posts.) <br>'
			mydiv.innerHTML += '. (' + Math.ceil(sitemap_list.length  / 2) + ',000 posts or fewer.)<br><br>' 		// Kludge math, but I pefer this presentation. 
			if( sitemap_list.length > 50 ) { mydiv.innerHTML += 'Sitemaps past 50 may not work.<br><br>'; } 		// Pluralization! Not sexy, just functional. 
			// List everything: <Links only>
			// List sitemap1: <Links only> or <Links & thumbnails> 
			// List sitemap2: <Links only> or <Links & thumbnails> etc. 
			mydiv.innerHTML += 'List everything: ' + '<a href=' + options_url( { sitemap:1, xmlcount:sitemap_list.length } ) + '>Links only</a> - <a href=' + options_url( { sitemap:1, xmlcount:sitemap_list.length, story:true, usemobile:true } ) + '>Links and text (stories)</a><br><br>'; 
			if( options_map.find && options_map.lastpage ) {
				mydiv.innerHTML += 'List just this tag (' + options_map.find + '): ' + '<a href=' + options_url( { sitemap:1, xmlcount:options_map.lastpage, tagscrape:true } ) + '>Links only</a> - <a href=' + options_url( { sitemap:1, xmlcount:options_map.lastpage, story:true, usemobile:true, tagscrape:true } ) + '>Links and text (stories)</a><br><br>'; 
			} 
//	mydiv.innerHTML += "<br><h1><a id='browse10' href='" + options_url( {scrapemode:'pagebrowser', thumbnails:true} ) +"'>Browse images (10 pages at once)</a> </h1>"; 
			for( n = 1; n <= sitemap_list.length; n++ ) { 
				let text_link = 		options_url( { sitemap:n } ); 
				let images_link = 	options_url( { sitemap:n, thumbnails:'xml' } ); 
				let story_link = 		options_url( { sitemap:n, story:true, usemobile:true } ); 
				mydiv.innerHTML += 'List sitemap' + n + ': ' + '<a href=' + text_link + '>Links only</a> <a href=' + images_link + '>Links & thumbnails</a> <a href=' + story_link + '>Links & text</a><br>' 
			} 
		} )
	}
}

// Text-based scrape mode for sitemap1.xml, sitemap2.xml, etc. 
function scrape_sitemap_x( sitemap_x, final_sitemap ) { 
	document.title = window.location.hostname + ' - sitemap ' + sitemap_x; 		// We lose the original tumblr's title, but whatever. 
	if( sitemap_x == final_sitemap ) { document.title = window.location.hostname + ' - sitemap complete'; } 
	if( options_map.story ) { document.title += ' with stories'; } 

	// Last-minute kludge: if options_map.tagscrape, use /page instead of any xml, get /post URLs matching this domain. 
		// Only whole-hog for now - sitemap_x might allow like ten or a hundred pages at once, later.

	var base_site = window.location.protocol + "//" + window.location.hostname; 		// e.g. http://example.tumblr.com  
	var sitemap_url = base_site + '/sitemap' + sitemap_x + '.xml'; 

	if( options_map.tagscrape ) {
		sitemap_url = base_site + options_map.find + '/page/' + sitemap_x; 
		 document.title += ' ' + options_map.find.replace( /\//g, ' '); 		// E.g. 'tagged my-stuff'. 
	}

	mydiv.innerHTML += "Finding posts from " + sitemap_url + ".<br>"; 
	fetch( sitemap_url, { credentials: 'include' } ) 		// Grab text-like file 
	.then( r => r.text() ) 		// Get txt from HTTP response
	.then( t => { 		// Process text to extract links
		let location_list = t.split( '<loc>' ); 		// Break at each location declaration
		location_list.shift(); location_list.shift(); 		// Ditch first two elements. First is preamble, second is base URL (e.g. example.tumblr.com). 
		location_list = location_list.map( e => e.split( '</loc>' )[0] ); 		// Terminate each entry at location close-tag 

		if( options_map.tagscrape ) { 
			// Fill location_list with URLs on this page, matching this domain, containing e.g. /post/12345. 
			// Trailing slash not guaranteed. Quote terminator unknown. Fuck it, just assume it's all doublequotes. 
			location_list = t.split( 'href="' ); 
			location_list.shift(); 
			location_list = location_list.map( e => e.split( '"' )[0] ); 		// Terminate each entry at location close-tag 
			location_list = location_list.filter( u => u.indexOf( window.location.hostname + '/post' ) > -1 ); 
			location_list = location_list.filter( u => u.indexOf( '/embed' ) < 0 ); 
			location_list = location_list.filter( u => u.indexOf( '#notes' ) < 0 ); 		// I fucking hate this website. 
			// https://www.facebook.com/sharer/sharer.php?u=https://shamserg.tumblr.com/post/55985088311/knight-of-vengence-by-shamserg
			// I fucking hate the web. 
			location_list = location_list.map( e => window.location.protocol + '//' + window.location.hostname + e.split( window.location.hostname )[1] ); 
//			location_list = location_list.filter( u => u.indexOf( '?ezastumblrscrape' ) <= 0 ); 		// Exclude links to this script. (Nope, apparently that's introduced elsewhere. WTF.) 
			// Christ, https versus http bullshit AGAIN. Just get it done.
//			location_list = location_list.map( e => window.location.protocol + '//' + e.split( '//' )[1] ); 		// http -> https, or https -> http, as needed. 
			// I don't think I ever remove duplicates. On pages with lots of silly "sharing" bullshit, this might grab pages multiple times. Not sure I care. Extreme go horse. 
		} 

		// location_list should now contain a bunch of /post/12345 URLs. 
		if( options_map.usemobile ) { location_list = location_list.map( (url) => { return url + '/mobile' } ); } 
		console.log( location_list ); 
//		return location_list; 		// Promises are still weird. (JS's null-by-default is also weird.) 
		post_urls = post_urls.concat( location_list ); 		// Append these URLs to the global array of /post URLs. (It's not like Array.push() because fuck you.) 
//		console.log( post_urls ); 
		// It seems like bad practice to recurse from here, but for my weird needs it seems to make sense. 
		if( sitemap_x < final_sitemap ) { 
			scrape_sitemap_x( sitemap_x + 1, final_sitemap ); 		// If there's more to be done, recurse.
		} else { 
			scrape_post_urls(); 		// If not, fetch & display all these /post URLs. 
		} 
	} ) 

}

// Fetch all the URLs in the global array of post_urls, then display contents as relevant 
function scrape_post_urls() {
	// Had to re-check how I did rate-limited fetches for the monolithic main scraper. It involved safe levels of 'wtf.' 
	// Basically I put n URLs in an array, or an array of fetch promises I guess, then use promise.all to wait until they're all fetched. Repeat as necessary. 
	// Can I use web workers? E.g. instantiate 25 parallel scripts that each shift from post_urls and spit back HTML. Ech. External JS is encouraged and DOM access is limited.
	// Screw it, batch parallelism will suffice. 
	// Can I define a variable as a function? I mean, can I go var foobar = function_name, so I can later invoke foobar() and get function_name()?
		// If so: I can pick a callback before all of this, fill a global array with e.g. [url, contents] stuff, and jump to a text- or image-based display when that's finished. 
		// Otherwise I think I'm gonna be copying myself a lot in order to display text xor images. 

	// Create divs for each post? This is going to be a lot of divs. 
	// 1000 is fine for the traditional monolithic scrape mode, but that covers ten or twenty times as many posts as a single sitemapX.xml. 
	// And it probably doesn't belong right here. 
	// Fuck it, nobody's using this unless they read the source code. Gotta build something twice to build it right.

	// Create divs for each post, so they can be fetched and inserted asynchronously
	post_urls.forEach( (url) => {
		let new_div = document.createElement( 'div' ); 
		new_div.id = '' + url; 
		document.body.appendChild( new_div ); 
	} )

	// Something in here is throwing "Promise rejection value is a non-unwrappable cross-compartment wrapper." I don't even know what the fuck. 
	// Apparently Mozilla doesn't know what the fuck either,  because all discussion of this seems to be 'well it ought to throw a better error.' 
	// Okay somehow this happens even with everything past 'console.log( order_array );' commented out, so it's not actually a wrong thing involving the Promise nest. Huh. 
	// It's the for loop. 
	// Is it because we try using location_list.length? Even though it's a global array and not a Promise? And console.log() has no problem... oh, console.log quietly chokes.
	// I'm trying to grab a sitemap2 that doesn't exist because I was being clever in opposite directions. 
	// Nope, still shits the bed, same incomprehensible error. Console.log shows location_list has... oh goddammit. location_list was the local variable; post_urls is the global. 
	// Also array.concat doesn't work for some reason. post_urls still has zero elements. 
	// Oh for fuck's sake it has no side effects! You have to do thing=thing.concat()! God damn the inconsistent behavior versus array.push! 

	// We now get as far as chain=chain.then and the first console.log('foobar'), but it happens exactly once. 
	// Splicing appears to happen correctly. 
	// We enter responses.map, and the [body of post, post url] array is passed correctly and contains right-looking data. 
	// Ah. links_from_text is not a function. It's links_from_page. Firefox, that's a dead simple error - say something. 

	// Thumbnails? True thumbnails, and only for tumblr images with _100 available. Still inadvisable for the whole damn tumblr at once. 
	// So limit that to single-sitemap modes (like linking out to 1000-pages-at-once monolithic scrapes) and call it good. 500 thumbnails is what /archive does anyway. 

	// Terrible fix for unique tags: can I edit CSS from within the page? Can I count inside that?
	// I'd be trying to do something like <tag id='my_art>my_art</tag> and increment some tags['my_art'] value - then only show <tag> with a corresponding value over one. 

	// Utter hack: I need to get the text of each /post to go above or below it. The key is options_map( 'story' ) == true. 
	// Still pass it as a "link" string. Prepend with \n or something, a non-URL-like control character, which we're okay to print accidentally. Don't link that "link." 
	// /mobile might work? Photosets show up as '[video]', but some text is in the HTML. Dunno if it's everything. 
		// Oh god they're suddenly fucking with mobile. 
		// In /mobile, get everything from '</h1>' to '<div id="navigation">'. 

	var simultaneous_fetches = 25;
	var order_array = new Array; 		// No reason to show immediate results, so fill with n, n, n until the sum matches the number of URLs. 
	for( x = post_urls.length; x > simultaneous_fetches; x -= simultaneous_fetches ) { 
		order_array.push( simultaneous_fetches ); 
	}

	if( x != 0 ) { order_array.push( x ); } 		// Any remainder from counting-down for() loop becomes the last element

//	console.log( order_array ); 
//	order_array = [5,5]; 		// Debug - limits fetches to just a few posts 

	var chain = Promise.resolve(0); 		// Empty promise so we can use "then" 

	order_array.forEach( (how_many, which_entry) => {
		chain = chain.then( s => { 
//			console.log( 'foobar' ); 
			var subarray = post_urls.splice( 0, how_many ); 		// Shift some number of elements into a separate array, for partial array.map
//			console.log( 'foobar2', subarray ); 
			return Promise.all( subarray.map( post_url => 
				Promise.all( [ fetch( post_url, { credentials: 'include' } ).then( s => s.text() ), post_url ] ) 		// Return [body of post, post URL] 
			) ) 
		} )
		. then( responses => responses.map( s => { 
//				console.log( 'foobar3', s ); 
				var post_url = s[1]; 
				// var url_array = soft_scrape_page_promise( s[0] ) 		// Surprise, this is a promise now 
				// Wait, do I need it to be a promise? Because otherwise I might prefer to use links_from_text(). 
				var sublinks = links_from_page( s[0] ); 

				let tag_links = sublinks.filter( return_tags ); 		// Copy tags into some new array. No changes to other filters required! (Test on eixn, ~2000 pages.) 
				tag_links = tag_links.filter( u => u.indexOf( '?ezastumblrscrape' ) < 0 ); 

				// Do we want to filter down to images for this scrape function? It's not necessarily displaying them. 
				// Arg, you almost have to, to filter out the links to tumblrs that reblogged each post. 
				sublinks = sublinks.filter( s => { return s.indexOf( '.jpg' ) > 0 || s.indexOf( '.jpeg' ) > 0 || s.indexOf( '.png' ) > 0 || s.indexOf( '.gif' ) > 0; } ); 

				sublinks = sublinks.filter( tumblr_blacklist_filter ); 			// Remove avatars and crap
				sublinks = sublinks.map( image_standardizer ); 	// Clean up semi-dupes (e.g. same image in different sizes -> same URL) 
				sublinks = sublinks.filter( novelty_filter ); 				// Global duplicate remover

//				sublinks = sublinks.concat( tag_links ); 		// Add tags back in. Not ideal, but should be functional. 

				if( options_map.story ) { 
					// Janky last-ditch text ripping, because Tumblr is killing NSFW sites:
					// Assume we're using /mobile. Grab everything between the end of the blog name and the standard navigation div. 
	//				let story_start = s[0].indexOf( '</h1>' ) + 5;
					let story_start = s[0].indexOf( '<p>' ) + 3;  		// Better to skip ahead a bit instead of showing the <h2> date 
					let story_end = s[0].indexOf( '<div id="navigation">' ); 
					let story = s[0].substring( story_start, story_end ); 
					// Problem: images are still showing up. We sort of don't want that. Clunk. 
					story = story.replace( /<img/g, '<notimg' );  // Clunk clunk clunk get it done. It is extreme go horse time. 
					// I should really convert inline images to links. 
					// Embedded videos what the fuck? 
					story = story.replace( /<iframe/g, '<notiframe' );  // Clunk clunk clunk! 
					story = story.replace( /<style/g, '<notstyle' );  // CLUNK CLUNK. 
					//story = story.replace( /<span/g, '<notspan' );  // Clunk?		// Oh, no. It's the /embed crap. 
					sublinks.push( story ); 		// This is just an array of strings, right? 
	//				console.log( story ); 		// DEBUG 
				}

				// Oh right, this doesn't really return anything. We arrange and insert HTML from here. 
				var bulk_string = "<br><a href='" + post_url + "'>" + post_url + "</a><br>"; 		// A digest, so we can update innerHTML just once per div
				sublinks.forEach( (link) => { 
					let contents = link; 
					if( options_map.thumbnails == 'xml' && link.indexOf( '_1280' ) > -1 ) { 		// If we're showing thumbnails and this image can be resized, do, then show it
						let img = link.replace( '_1280', '_100' ); 
						contents = '<img src=' + img + '>' + link; 		// <img> deserves a class, for consistent scale. 
					}
					this_link = '<a href ="' + link + '">' + contents + '</a><br>'; 

					// How is this not what's causing the CSS bleed? 
//					if( link.substring(0) == '\n' ) { 		// If this is text
					if( link.indexOf( '\n' ) > -1 ) { 		// If this is text
						this_link = link; 
					}

					bulk_string += this_link; 
				} )

				var tag_string = ""; 

				tag_links.forEach( (link) => {
//					let tag = link.split( '/tagged/' )[1]
					tag_string += '#' + link.split( '/tagged/' )[1] + ' '; 
				} )
				bulk_string += tag_string + "<br>"; 

				// Tags should be added here. If we have them. 
//				console.log( bulk_string ); 
				document.getElementById( '' + post_url ).innerHTML = bulk_string; 		// Yeeeah, I should probably create these div IDs before this happens. 
				// And here's where I'd increase the page counter, if we had one. 
				// Let's use order_array to judge how done we are - and make it a percent, not a page count. Use order_array.length and pretend the last element's the same size. 
//				document.getElementById( 'pagecounter' ).innerHTML = '%'; 
				// No wait, we can't do it here. This is per-post, not per-page. Or... we could do it real half-assed.
			} )
		)
		.then( s => { 		// I don't think we take any actual data here. This just fires once per 'responses' group, so we can indicate page count etc. 
			let completion = Math.ceil( 100 * (which_entry+1) / order_array.length ); 		// Zero-ordinal index to percentage. Blugh. 
			document.getElementById( 'pagecounter' ).innerHTML = '' + completion + '%'; 
		} ) 
	} )

	// "Promises allow a flat execution pattern!" Fuck you, you liars. Look at that rat's nest of alternating braces. 
	// If you've done all the work in spaghetti functions somewhere else, maybe it's fine, but if you want code to happen where it fucking starts, anonymous functions SUCK. 

} 









// ------------------------------------ Post-by-post scraper with embedded images ------------------------------------ //









// Scrape each page for /post/ links, scrape each /post/ for content, display in-order with less callback hell
// New layout & new scrape method - not required to be compatible with previous functions 
function new_embedded_display() {
	if( isNaN( parseInt( options_map.startpage ) ) || options_map.startpage <= 1 ) { options_map.startpage = 1; } 

	mydiv.innerHTML += "<center>" + html_previous_next_navigation() + "<br><br>" + html_page_count_navigation() + "</center><br>"; 
	document.getElementById("bottom_controls_div").innerHTML += "<center>" + html_page_count_navigation() + "<br><br>" + html_previous_next_navigation() + "</center>"; 

	// Links out from this mode - scrapewholesite, original mode, maybe other crap 
	//mydiv.innerHTML += "This mode is under development and subject to change.";		// No longer true. It's basically feature-complete. 
//	mydiv.innerHTML += " - <a href=" + options_url( { everypost:false } ) + ">Return to original image browser</a>" + "<br>" + "<br>"; 
	mydiv.innerHTML += "<br>" + html_ezastumblrscrape_options() + "<br><br>"; 

	// "Pages 1 to 10 (of 100) from http://example.tumblr.com" 
	mydiv.innerHTML += "Pages " + options_map.startpage + " to " + (options_map.startpage + options_map.pagesatonce - 1); 
	if( !isNaN(options_map.lastpage) ) { mydiv.innerHTML += " (of " + options_map.lastpage + ")"; } 
	mydiv.innerHTML += " from " + site_and_tags + "<br>"; 

	mydiv.innerHTML += image_size_options() + "<br><br>"; 
	mydiv.innerHTML += image_resolution_options() + "<br><br>"; 

	// Messy inline function for toggling page breaks - they're optional because we have post permalinks now 
	mydiv.innerHTML += "<a href='javascript: void(0);' class='interactive'; onclick=\"(function(o){ \
		let mydiv = document.getElementById('maindiv'); \
		if( mydiv.className == '' ) { mydiv.className = 'showpagelinks'; } \
		else { mydiv.className = ''; } \
		})(this)\">Toggle page breaks</a><br><br>";

	mydiv.innerHTML += "<span id='0'></span>"; 		// Empty span for things to be placed after. 
	posts_placed.push( 0 ); 		// Because fuck special cases.

	// Scrape some pages
	for( let x = options_map.startpage; x < options_map.startpage + options_map.pagesatonce; x++ ) {
		fetch( site_and_tags + "/page/" + x, { credentials: 'include' }  ).then( r => r.text() ).then( text => {
			scrape_by_posts( text, x ); 
		} )
	}
}



// Take the HTML from a /page, fetch the /post links, display images 
// Probably ought to be despaghettified and combined with the above function, but I was fighting callback hell -hard- after the last major version
// Alternately, split it even further and do some .then( do_this ).then( do_that ) kinda stuff above. 
function scrape_by_posts( html_copy, page_number ) {
//	console.log( page_dupe_hash ); 		// DEBUG 
	let posts = links_from_page( html_copy ); 		// Get links on page
	posts = posts.filter( link => { return link.indexOf( '/post/' ) > 0 && link.indexOf( '/photoset' ) < 0; } ); 		// Keep /post links but not photoset iframes 
	posts = posts.map( link => { return link.replace( '#notes', '' ); } ); 		// post/1234 is the same as /post/1234#notes 
	posts = posts.filter( link => link.indexOf( window.location.host ) > 0 ); 		// Same-origin filter. Not necessary, but it unclutters the console. Fuckin' CORS. 
	if( page_number != 1 ) { posts = posts.filter( novelty_filter ); }		// Attempt to remove posts linked on every page, e.g. commission info. Suffers a race condition. 
	posts = remove_duplicates( posts ); 		// De-dupe 
	// 'posts' now contains an array of /post URLs 

	// Display link and linebreak before first post on this page
	let first_id = posts.map( u => parseInt( u.split( '/' )[4] ) ).sort(  ).pop(); 		// Grab ID from its place in each URL, sort accordingly, take the top one
	let page_link = "<span class='pagelink'><hr><a target='_blank' href='" + site_and_tags + "/page/" + page_number + "'> Page " + page_number + "</a>"; 
	if( posts.length == 0 ) { first_id = 1; page_link += " - No images found."; } 		// Handle empty pages with dummy content. Out of order, but whatever.
	page_link += "<br><br></span>"; 
	display_post( page_link, first_id + 0.5 ); 		// +/- on the ID will change with /chrono, once that matters 

	posts.map( link => { 

		fetch( link, { credentials: 'include' } ).then( r => r.text() ).then( text => {

			let sublinks = links_from_page( text ); 
			sublinks = sublinks.filter( s => { return s.indexOf( '.jpg' ) > 0 || s.indexOf( '.jpeg' ) > 0 || s.indexOf( '.png' ) > 0 || s.indexOf( '.gif' ) > 0; } ); 
			sublinks = sublinks.filter( tumblr_blacklist_filter ); 			// Remove avatars and crap
			sublinks = sublinks.map( image_standardizer ); 	// Clean up semi-dupes (e.g. same image in different sizes -> same URL) 
			sublinks = sublinks.filter( novelty_filter ); 				// Global duplicate remover
			// Oh. Photosets sort of just... work? That might not be reliable; DownThemAll acts like it can't see the iframes on some themes. 
			// Yep, they're there. Gonna be hard to notice if/when they fail. Oh well, "not all images are guaranteed to appear." 
			// Videos will still be weird. (But it does grab their preview thumbnails.) 
			// Wait, can I filter reblogs here? E.g. with a ?noreblogs flag, and then checking if any given post has via/source links. Hmm. Might be easier in /mobile pages. 

			// Seem to get a lot of duplicate images? e.g. both 
			// https://media.tumblr.com/tumblr_m2gktkD7u31qdcy3io1_640.jpg and
			// https://media.tumblr.com/tumblr_m2gktkD7u31qdcy3io1_1280.jpg
			// Oh! Do I just not handle _640? 

			// Get ID from post URL, e.g. http//example.tumblr.com/post/12345/title => 12345 
			let post_id = parseInt( link.split( '/' )[4] ); 		// 12345 as a NUMBER, not a string, doofus

			if( sublinks.length > 0 ) { 		// If this post has images we're displaying - 
				let this_post = new String;
				sublinks.map( url => {
					this_post += '<span class="container">'; 
					this_post += '<a target="_blank" id="' + encodeURI( url ) + '" href="' + url + '">'; 
					this_post += '<img onerror=\'' + error_function( url ) + '\' id="'+url+'" src="' + url + '"></a>'; 
					this_post += '<span class="postlink"><a target="_blank" href="' + link + '">Permalink</a></span></span> '; 
				} )
				display_post( this_post, post_id ); 
			}

		} )

	} )
}

// Place content on page in descending order according to post ID number 
// Consider rejiggering the old scrape method to use this. Move to 'universal' section if so. Alter or spin off to link posts instead? 
// Turns out I never implemented ?chrono or ?reverse, so nevermind that for now.
// Remember to set options_map.chrono if ?find contains /chrono or whatever.  
function display_post( content, post_id ) { 
	let this_node = document.createElement( "span" );
	this_node.innerHTML = content; 
	this_node.id = post_id

	// Find lower-numbered node than post_id
	let target_id = posts_placed.filter( n => n <= post_id ).sort(  ).pop(); 		// Take the highest number less than (or equal to) post_id 
	if( options_map.find.indexOf( '/chrono' ) > 0 ) { 
		target_id = posts_placed.filter( n => n <= post_id ).sort(  ).shift();  		// Take the... fuck... lowest? What am I doing again?
		// Fuuuck, this is really inconsistent. Nevermind the looney-toons syntax I used here, =>n<=. 
		// Screw it, use the old scraper for now. 
	}
	let target_node = document.getElementById( target_id );

	// http://stackoverflow.com/questions/4793604/how-to-do-insert-after-in-javascript-without-using-a-library 
	target_node.parentNode.insertBefore( this_node, target_node ); 		// Insert our span after the lower-ID node
	posts_placed.push( post_id ); 		// Remember that we added this ID

	// No return value 
}

// Return ascending or descending order depending on "chrono" setting 
// function post_order_sort( a, b ) 









// ------------------------------------ Specific handling for www.tumblr.com (tag search, possibly dashboard) ------------------------------------ //









// URLs like https://www.tumblr.com/tagged/wooloo?before=1560014505 don't follow easy sequential pagination, so we have to (a) be linear or (b) guess. First whack is (a). 
// Tumblr dashboard is obvious standardized, so we can make assumptions about /post links in relation to images.
// We do not fetch any individual /posts. We can't. They're on different subdomains, and Tumblr CORS remains tight-assed. But we can link them like in the post scrape mode. 
	// Ooh, I could maybe get "behind the jump" content via /embed URLs. Posts here should contain the blog UUID and a post number. 

// Copied notes from above:
// https://www.tumblr.com/tagged/homestuck sort of works. Trouble is, pages go https://www.tumblr.com/tagged/homestuck?before=1558723097 - ugh. 
	// Next page is https://www.tumblr.com/tagged/homestuck?before=1558720051 - yeah this is a Unix timestamp. It's epoch time. 
	// https://www.tumblr.com/dashboard/2/185117399018 - but this one -is- based on /post numbers. Tumblr.com: given two choices, take all three. 
	// Being able to scrape and archive site-wide tags or your own dashboard would be useful. Dammit. 
		// Just roll another mode into this script. It's already a hot mess. The new code just won't run on individual blogs. 
// Okay, so dashboard and site-wide tag modes.
	// Dashboard post numbers don't have to be real post numbers. Tag-search timestamps obviously don't have to relate to real posts. 
	// We de-dupe, so overkill is fine... ish. Tumblr's touchy about "rate limit exceeded" these days. 
// Tag scrape would be suuuper useful if we can grab blog posts from www.tumblr.com. Like "hiveswapcomicscontest." 

// Still no luck on scraping dashboard-only blogs. Bluh. 
	// This is so aggravating. I can see the content, obviously. "View page source" just returns the dashboard source. But document.body.outerHTML contains the blog proper. 
// Consider /embed again: 
	// https://embed.tumblr.com/embed/post/4RwtewsxXp-k1ReCcdAgXg/185288559546?width=542&language=en_US&did=a5c973d33a43ace664986204d72d7739de31b614 
	// This works but provides no previous/next link. (We need the ID, but we can get it from www.tumblr.com, then redirect.) 
// Using DaveJaders for testing. https://davejaders.tumblr.com/archive does not redirect, so we can use that for same-origin fetches. Does /page stuff work? 
	// fetch( '/' ).then( r => r.text() ).then( t => document.body.outerHTML = t ) - CORS failure. "The Same Origin Policy disallows reading the remote resource at https://www.tumblr.com/login_required/davejaders . (Reason: CORS header ‘Access-Control-Allow-Origin’ missing)." Fuck me again, apparently. 
// Circle back to dashboard-only blogs by showing the actual content. Avoid clearing body.innerHTML, let Tumblr do its thing, interact with the sidebar deal. 

// I could still estimate page count, using a binary search. Assuming constant post rates. 
	// Or, get fancy, and estimate total posts from time between posts on sparse samples. Each fetch is ten posts in-order. 
	// It does say "No posts found." See https://www.tumblr.com/tagged/starlight-brigade?before=1503710342 
	// Between the ?before we fetch and the Next link, we have the span of time it took for ten posts to be made. I would completely exclude the "last" page, if found. 
	// I guess... integrate in blocks? E.g., given a data point, assume that rate continues to the next data point. Loose approximations are fine. 
	// E.g. push timestamp/rate pairs into an array, sort by timestamp, find delta between each timestamp, and multiply each rate by each period. One period is bidirectional. 
	// Lerping is not meaningfully harder. It's the above, plus each period times half the difference between the rates at either end of that period. 
// Specific date entry? input type = "date". 
	// Specific date entry is useful and simple enough to finish before posting a new version. Content estimation can wait. 
	// Oh, idiot: make the "on or before" display also the navigation mechanic. 

// We now go right here for any ?ezastumblrscrape URL on www.tumblr.com. Sloppy but functional. 
function scrape_www_tagged( ) {
	// ?find=/tagged/whatever is already populated by existing scrape links, but any ?key=value stuff gets lost. 
	// (If no ?before=timestamp, fetch first page. ?before=0 works. Some max_int would be preferable for sorting. No exceptions needed, then.) 
	is_first_page = false; 		// Implicit global, eat me. Clunk. 
	if( isNaN( options_map.before ) ) { 
		options_map.before = parseInt( Date.now() / 1000 );  		// Current Unix timestamp in seconds, not milliseconds. The year is not 51413.
		is_first_page = true; 
	}
	if( options_map.before > parseInt( Date.now() / 1000 ) ) { is_first_page = true; } 		// Also handle when going days or years "into the future." 

	// Standard-ish intial steps: clear page (handled), add controls, maybe change window title. 
	// We can't use truly standard prev/next navigation. Even officially, you only get "next" links. (Page count options should still work.) 
	let www_tagged_next = "<a class='www_next' href='" + options_url() + "'>Next >>></a>"; 
	let pages_www_tagged = ""; 
	pages_www_tagged += "<a class='www_next' href='" + options_url( {"pagesatonce": 10} ) + "'> 10</a>, "; 
	pages_www_tagged += "<a class='www_next' href='" + options_url( {"pagesatonce": 5} ) + "'> 5</a>, "; 
	pages_www_tagged += "<a class='www_next' href='" + options_url( {"pagesatonce": 1} ) + "'> 1</a> - >>>"; 

	// Periods of time, in seconds, because Unix epoch timestamps. 
	let one_day = 24 * 60 * 60; 
	let one_week = one_day * 7; 
	let one_year = one_day * 365.24; 		// There's no reason not to account for leap years. 

	let approximate_place = "Posts <a href='" + options_map.find + "'>" + options_map.find.split('/').join(' ') + "</a>"; 		// options_map.find as relative link. Convenient. 
	let date_string = new Date( options_map.before * 1000 ).toISOString().slice( 0, 10 ); 
	approximate_place += " on or before <input type='date' id='on_or_before' value='" + date_string + "'>"; 
	// How do I associate this entry with the button so that pressing Enter triggers the button? Eh, minor detail. 

	let coverage = "<span id='coverage'></span>"; 		// Started writing this string, then realized I can't know its value until all pages are fetched. 

	let time_www_tagged = "";		// Browse forward / backward by day / week / year. 
	time_www_tagged += "<a href='" + options_url( {"before": options_map.before + one_day} ) + "'><<< </a> One day";
	time_www_tagged += "<a href='" + options_url( {"before": options_map.before - one_day} ) + "'> >>></a> - ";
	time_www_tagged += "<a href='" + options_url( {"before": options_map.before + one_week} ) + "'><<< </a> One week";
	time_www_tagged += "<a href='" + options_url( {"before": options_map.before - one_week} ) + "'> >>></a> - ";
	time_www_tagged += "<a href='" + options_url( {"before": options_map.before + one_year} ) + "'><<< </a> One year";
	time_www_tagged += "<a href='" + options_url( {"before": options_map.before - one_year} ) + "'> >>></a>";

	// Change the displayed date and it'll go there. Play stupid games, win stupid prizes. 
	let jump_button_code = 'window.location +=  "?before=" + parseInt( new Date( document.getElementById( "on_or_before" ).value ).getTime() / 1000 );';  
	let date_jump = "<button onclick='" + jump_button_code + "'>Go</button>"; 

	mydiv.innerHTML += "<center>" + www_tagged_next + "<br><br>" + pages_www_tagged + "</center><br>";  
	mydiv.innerHTML += "<center>" + approximate_place + date_jump + "<br>" + coverage + "<br>" + time_www_tagged + "</center>"; 
	mydiv.innerHTML += "This mode is under development and subject to change.<br><br>"; 
	document.getElementById("bottom_controls_div").innerHTML += "<center>" + pages_www_tagged + "<br><br>" + www_tagged_next + "</center>";  

	mydiv.innerHTML += image_size_options() + "<br><br>"; 
	mydiv.innerHTML += image_resolution_options() + "<br><br>"; 

	// Fetch first page specified by ?before=timestamp. 
	let tagged_url = "" + options_map.find + "?before=" + options_map.before; 		// Relative URLs are guaranteed to be same-domain, even if they're garbage. 
	fetch( tagged_url, { credentials: 'include' }  ).then( r => r.text() ).then( text => { 
		display_www_tagged( text, options_map.before, options_map.pagesatonce ); 		// ... pagesatonce gets set in the pre-amble, right? It should have a default. 
 		// Optionally we could check here if options_map.before == 0 and instead send max_safe_integer. 
	} )
}

// Either we need a global variable for how many more pages per... page... or else I should pass a how_many_more value to this recursive function. 
function display_www_tagged( content, timestamp, pages_left ) {
	// First, grab the Next link - i.e. its ?before=timestamp value.
//	let next_timestamp_index = content.lastIndexOf( '?before=' ); 
//	let next_timestamp = content.substring( next_timestamp_index + 8, content.indexOf( next_timestamp_index, '"' ) ); 		// Untested
	let next_timestamp = content.split( '?before=' ).pop().split( '"' ).shift(); 		// The last "?before=12345'" string on the page. Clunky but tolerable. 
	next_timestamp = "" + parseInt( next_timestamp ); 		// Guarantee this is a string of a number. (NaN "works.") Pages past the end may return nonsense. 
	if( pages_left > 1 ) { 		// If we're displaying more pages then fetch that and recurse.
		let tagged_url = "" + options_map.find + "?before=" + next_timestamp; 		// Relative URLs are guaranteed to be same-domain, even if they're garbage. 
//		console.log( tagged_url ); 
		fetch( tagged_url, { credentials: 'include' }  ).then( r => r.text() ).then( text => { 
			display_www_tagged( text, next_timestamp, pages_left - 1 ); 
		} )
	} else { 		// Otherwise put that timestamp in our constructed Next link(s). 
		// I guess... get HTMLcollection of elements for "next" links, and change each one. 
		// Downside: links will only change once the last page is fetched. We could tack on a ?before for every fetch, but it would get silly. Right? 
		let next_links = Array.from( document.getElementsByClassName( 'www_next' ) ); 		// I'm not dealing with a live object unless I have to. 
		for( link of next_links ) { link.href += "?before=" + next_timestamp; } 

		// Oh right, and update the header to guesstimate what span of time we're looking at. 
		let coverage = document.getElementById( 'coverage' ); 
		coverage.innerHTML += "Displaying "; 
		if( is_first_page ) { coverage.innerHTML += "the most recent "; } 		// Condition is no longer sensible, but it's a good placeholder. 
		// We can safely treat ?before as the initial timestamp. Perfect accuracy is not important. 
		let time_covered = Math.abs( options_map.before - next_timestamp ); 		// Absolute so I don't care if I have it backwards. 
		if( time_covered > 48 * 60 * 60 ) { coverage.innerHTML += parseInt( time_covered / (24*60*60) ) + " days of posts"; } 		// Over two days? Display days.
		else if( time_covered > 2 * 60 * 60 ) { coverage.innerHTML += parseInt( time_covered / (60*60) ) + " hours of posts"; } 		// Over two hours? Display hours. 
		else if( time_covered > 2 * 60 ) { coverage.innerHTML += parseInt( time_covered / 60 ) + " minutes of posts"; } 		// Over two minutes? Display minutes.
		else { coverage.innerHTML += " several seconds of posts"; } 		// Otherwise just say it's a damn short time. 
		if( time_covered == 0 ) { coverage.innerHTML = "Displaying the first available posts"; } 		// Last page, earlier posts. No "next" page. 
	}

	// Insert div for this timestamp's page. 
	let new_div = document.createElement( 'span' ); 		// Span, because divs cause line breaks. Whoops. 
	new_div.id = "" + timestamp; 
	let target_node = document.getElementById( 'bottom_controls_div' ); 
	target_node.parentNode.insertBefore( new_div, target_node ); 		// Insert each page before the footer. 
	let div_html = ""; 

	// Separate page HTML by posts. 
	// At least the "li" elements aren't nested, so I can terminate the last one on "</li>". Or... all of them. 
	let posts = content.split( '<li class="post_container"' );  
	posts.shift(); 
	posts.push( posts.pop().split( '</li>' )[0] ); 		// Terminate last element at </li>. Again, not great code, but clunk clunk clunk get it done. 

	// For each post: 
	for( post of posts ) { 
		// Extract images from each post. 
		let links = links_from_page( post ); 
		links = links.map( image_standardizer ); 		// This goes before grabbing the permalink because /post URLs do get standardized. No &media guff. 
		let permalink = links.filter( s => s.indexOf( '.tumblr.com/post' ) > 0 )[0]; 		// This has to go before de-duping, or posts linking to posts can leave permalinks blank. 
		links = links.filter( novelty_filter );
		links = links.filter( tumblr_blacklist_filter ); 
//		document.body.innerHTML += links.join( "<br>" ) + "<br><br>"; 		// Debug 

		// Separate the images. 
		let images = links.filter( s => s.indexOf( 'media.tumblr.com' ) > 0 ); 		// Note: this will exclude external images, e.g. embedded Twitter stuff. 

		// If this post has images:
		if( images.length > 0 ) { 			// Build HTML xor insert div for each post, to display images. 
			// Get /post URL, including blog name etc. 
			//let permalink = links.filter( s => s.indexOf( '.tumblr.com/post' ) > 0 )[0]; 

			let post_html = ""; 	
			for( image of images ) { 
				post_html += '<span class="container">'; 
				post_html += '<a target="_blank" id="' + encodeURI( image ) + '" href="' + image + '">'; 
				post_html += '<img onerror=\'' + error_function( image ) + '\' id="' + image + '" src="' + image + '"></a>'; 
				post_html += '<span class="postlink"><a target="_blank" href="' + permalink + '">Permalink</a></span></span> '; 
			} 
			div_html += post_html; 
		}
	}

	// Insert accumulated HTML into this div. 
	new_div.innerHTML = div_html; 
}









// ------------------------------------ HTML-returning functions for duplication prevention ------------------------------------ //









// Return HTML for standard Previous / Next controls (<<< Previous - Next >>>)
function html_previous_next_navigation() {
	let prev_next_controls = "";  
	if( options_map.startpage > 1 ) { 
		prev_next_controls += "<a href='" + options_url( "startpage", options_map.startpage - options_map.pagesatonce ) + "'><<< Previous</a> - "; 
	} 
	prev_next_controls += "<a href='" + options_url( "startpage", options_map.startpage + options_map.pagesatonce ) + "'>Next >>></a>";

	return prev_next_controls; 
}

// Return HTML for pages-at-once versions of previous/next page navigation controls (<<<  10,  5,  1 -  1,  5,  10 >>>) 
function html_page_count_navigation() {
	let prev_next_controls = "";  
	if( options_map.startpage > 1 ) { 		// <<< 10, 5, 1 - 
		prev_next_controls += "<<< "; 
		prev_next_controls += "<a href='" + options_url( {"startpage": options_map.startpage - 1, "pagesatonce": 1} ) + "'> 1</a>, ";  
		prev_next_controls += "<a href='" + options_url( {"startpage": options_map.startpage - 5, "pagesatonce": 5} ) + "'> 5</a>, "; 
		prev_next_controls += "<a href='" + options_url( {"startpage": options_map.startpage - 10, "pagesatonce": 10} ) + "'> 10</a> - ";
	}
	prev_next_controls += "<a href='" + options_url( {"startpage": options_map.startpage + options_map.pagesatonce, "pagesatonce": 10} ) + "'> 10</a>, ";
	prev_next_controls += "<a href='" + options_url( {"startpage": options_map.startpage + options_map.pagesatonce, "pagesatonce": 5} ) + "'> 5</a>, ";  
	prev_next_controls += "<a href='" + options_url( {"startpage": options_map.startpage + options_map.pagesatonce, "pagesatonce": 1} ) + "'> 1</a> - ";  
	prev_next_controls += ">>>";  

	return prev_next_controls; 
}

// Return HTML for image-size options (changes via CSS or via URL parameters) 
// This used to work. It still works, in the new www mode I just wrote. What the fuck do I have to do for some goddamn onclick behavior? 
// "Content Security Policy: The page’s settings blocked the loading of a resource at self (“script-src https://lalilalup.tumblr.com https://assets.tumblr.com/pop/ 'nonce-OTA4NjViZmE2MzZkYTFjMjM1OGZkZGM1MzkwYWU4NTA='”)." What the fuck. 
// Jesus Christ, it might be yet again because of Tumblr's tightass settings: 
// https://stackoverflow.com/questions/37298608/content-security-policy-the-pages-settings-blocked-the-loading-of-a-resource
// The function this still works in is on a "not found" page. The places it will not work are /archive pages. 
// Yeah, on /mobile instead of /archive it works fine. Fuck you, Tumblr. 
// Jesus, that means even onError can't work. 
function image_size_options() {
	var html_string = "Immediate: \t"; 		// Change <body> class to instantly resize images, temporarily

	html_string += "<a class='interactive' href='javascript: void(0);' onclick=\"(function(o){ document.body.className = o; })('')\">Original image sizes</a> - "; 
	html_string += "<a class='interactive' href='javascript: void(0);' onclick=\"(function(o){ document.body.className = o; })('fixed-width')\">Snap columns</a> - "; 
	html_string += "<a class='interactive' href='javascript: void(0);' onclick=\"(function(o){ document.body.className = o; })('fixed-height')\">Snap rows</a> - "; 
	html_string += "<a class='interactive' href='javascript: void(0);' onclick=\"(function(o){ document.body.className = o; })('fit-width')\">Fit width</a> - "; 
	html_string += "<a class='interactive' href='javascript: void(0);' onclick=\"(function(o){ document.body.className = o; })('fit-height')\">Fit height</a> - "; 
	html_string += "<a class='interactive' href='javascript: void(0);' onclick=\"(function(o){ document.body.className = o; })('smart-fit')\">Fit both</a><br><br>"; 

	html_string += "Persistent: \t"; 		// Reload page with different image mode that will stick for previous/next pages
	html_string += "<a href=" + options_url( { thumbnails: "original" } ) + ">Original image sizes</a> - "; 		// This is the CSS default, so any other value works
	html_string += "<a href=" + options_url( { thumbnails: "fixed-width" } ) + ">Snap columns</a> - "; 
	html_string += "<a href=" + options_url( { thumbnails: "fixed-height" } ) + ">Snap rows</a> - "; 
	html_string += "<a href=" + options_url( { thumbnails: "fit-width" } ) + ">Fit width</a> - "; 
	html_string += "<a href=" + options_url( { thumbnails: "fit-height" } ) + ">Fit height</a> - "; 
	html_string += "<a href=" + options_url( { thumbnails: "smart-fit" } ) + ">Fit both</a>"; 
	return html_string; 
}

// Return HTML for links to ?maxres versions of the same page, e.g. "_raw" versus "_1280"
function image_resolution_options() {
	var html_string = "Maximum resolution: \t"; 
	html_string += "<a href=" + options_url( { maxres: "raw" } ) + ">Raw</a> - "; 		// I'm not 100% sure "_raw" works anymore, but the error function handles it, so whatever. 
	html_string += "<a href=" + options_url( { maxres: "1280" } ) + ">1280</a> - "; 	
	html_string += "<a href=" + options_url( { maxres: "500" } ) + ">500</a> - "; 	
	html_string += "<a href=" + options_url( { maxres: "400" } ) + ">400</a> - "; 	
	html_string += "<a href=" + options_url( { maxres: "250" } ) + ">250</a> - "; 	
	html_string += "<a href=" + options_url( { maxres: "100" } ) + ">100</a>"; 	
	return html_string; 
}

// Return links to other parts of Eza's Tumblr Scrape functionality, possibly excluding whatever you're currently doing 
// Switch to full-size images - Toggle image size - Show one page at once - Scrape whole Tumblr - (Experimental fetch-every-post image browser)
// This mode is under development and subject to change. - Return to original image browser
function html_ezastumblrscrape_options() {
	let html_string = ""; 

	// "You are browsing" text? Tell people where they are and what they're looking at. 
	html_string += "<a href='" + options_url( { scrapemode: "scrapewholesite" } ) + "'>Scrape whole Tumblr</a> - ";
	html_string += "<a href='" + options_url( { scrapemode: "pagebrowser" } ) + "'>Browse images</a> - "; 		// Default mode; so any value works
	html_string += "<a href='" + options_url( { scrapemode: "everypost" } ) + "'>(Experimental fetch-every-post image browser)</a> ";

	return html_string;
}

function error_function( url ) { 
	// This clunky <img onError> function looks for a lower-res image if the high-res version doesn't exist. 
	// Surprisingly, this does still matter. E.g. http://66.media.tumblr.com/ba99a55896a14a2e083cec076f159956/tumblr_inline_nyuc77wUR01ryfvr9_500.gif 
	// This might mismatch _100 images and _250 links because of that self-erasing clause... but it's super rare, so meh. 
	let on_error = 'if(this.src.indexOf("_raw")>0){this.src=this.src.replace("_raw","_1280").replace("//media","//66.media");}';		// Swap _raw for 1280, add CDN number 
	on_error += 'else if(this.src.indexOf("_1280")>0){this.src=this.src.replace("_1280","_500");}';		// Swap 1280 for 500
	on_error += 'else if(this.src.indexOf("_500")>0){this.src=this.src.replace("_500","_400");}';		// Or swap 500 for 400
	on_error += 'else if(this.src.indexOf("_400")>0){this.src=this.src.replace("_400","_250");}';		// Or swap 400 for 250
	on_error += 'else{this.src=this.src.replace("_250","_100");this.onerror=null;}';							// Or swap 250 for 100, then give up
	on_error += 'document.getElementById("' + encodeURI( url ) + '").href=this.src;'; 		// Link the image to itself, regardless of size

// 2020: This has to get more complicated, or at least more verbose, to support shitty new image URLs. 
// https://66.media.tumblr.com/eb7d40a8683e623f173f81fc253056dc/e4277131a4c174c7-a5/s1280x1920/5b3c54b53a09f91b08f9af7fb88e30a253f3c5db.jpg
//'s400x600'
//'s500x750'
//'s640x960'
//'s1280x1900'
// 's2048x3072' 
// This has to go in Image Glutton, because Tumblr only ever gets worse 

	// Start over. 
//	on_error = 'this.src = this.src.replace( "_250", "_100" ); '; 		// Unconditional, reverse ladder pattern. Don't Google that. I just made it up. 
	on_error = "let oldres = [ '_100', '_250', '_400', '_500', '_1280' ]; "; 		// Inverted quotes, Eh. 
	on_error += "let newres = [ 's400x600', 's500x750', 's640x960', 's1280x1920' ];"; 	// Should probably include slashes. Rare collisions. 
//	on_error += "for( let i = old.length-1; i > 1; i-- ) { this.src.replace( old[i], old[i-1] ); } "; 		// 1280 -> 500, 500 -> 400... shit.
	on_error += "for( let i = 1; i < oldres.length; i++ ) { this.src = this.src.replace( oldres[i+1], oldres[i] ); } "; // 250 -> 100, 400 -> 250, etc. One step per error.
	on_error += "for( let i = 1; i < newres.length; i++ ) { this.src = this.src.replace( newres[i+1], newres[i] ); } "; 
	on_error += 'document.getElementById("' + encodeURI( url ) + '").href=this.src;'; 		// Original quotes - be cautious when editing 

// God dammit. The links work, but the embedded images don't. 
// https://66.media.tumblr.com/eb7d40a8683e623f173f81fc253056dc/e4277131a4c174c7-a5/s1280x1920/ba0f0dd5d5e76726639782f57c4b732f156a5219.jpg
// https://66.media.tumblr.com/eb7d40a8683e623f173f81fc253056dc/e4277131a4c174c7-a5/s1280x1920/5b3c54b53a09f91b08f9af7fb88e30a253f3c5db.jpg

/*
x = 'https://66.media.tumblr.com/eb7d40a8683e623f173f81fc253056dc/e4277131a4c174c7-a5/s1280x1920/44b457df0d9826135b062563cd802afdfe496888.jpg'
newres = [ 's400x600', 's500x750', 's640x960', 's1280x1920' ];
for( let i = 1; i < newres.length; i++ ) { this.src.replace( newres[i+1], newres[i] ); }
*/

	return on_error; 
}









// ------------------------------------ Universal page-scraping function (and other helper functions) ------------------------------------ //









// Add URLs from a 'blank' page to page_dupe_hash (without just calling soft_scrape_page_promise and ignoring its results)
function exclude_content_example( url ) { 
	fetch( url, { credentials: 'include' } ).then( r => r.text() ).then( text => {
		let links = links_from_page( text );
		links = links.filter( novelty_filter ); 		// Novelty filter twice, because image_standardizer munges some /post URLs 
		links = links.map( image_standardizer ); 
		links = links.filter( novelty_filter ); 
	} )
	// No return value 
}

// Spaghetti to reduce redundancy: given a page's text, return a list of URLs. 
function links_from_page( html_copy ) {
	// Cut off the page at the "More you might like" / "Related posts" footer, on themes that have one
	html_copy = html_copy.split( '="related-posts' ).shift(); 

	let http_array = html_copy.split( /['="']http/ ); 		// Regex split on anything that looks like a source or href declaration 
	http_array.shift(); 		// Ditch first element, which is just <html><body> etc. 
	http_array = http_array.map( s => { 		// Theoretically parallel .map instead of maybe-linear .forEach or low-level for() loop 
		if( s.indexOf( "&" ) > -1 ) { s = htmlDecode( s ); } 		// Yes a fucking &quot should match a goddamn regex for terminating on quotes!
		s = s.split( /['<>"']/ )[0]; 		// Terminate each element (split on any terminator, take first subelement) 
		s = s.replace( /\\/g, '' ); 		// Remove escaping backslashes (e.g. http\/\/ -> http//) 
		if( s.indexOf( "%3A%2F%2F" ) > -1 ) { s = decodeURIComponent( s ); } 		// What is with all the http%3A%2F%2F URLs? 
//		s = s.split( '&quot;' )[0]; 		// Yes these count as doublequotes you stupid broken scripting language. 
		return "http" + s; 		// Oh yeah, add http back in (regex eats it) 
	} )		// http_array now contains an array of strings that should be URLs

	let post_array = html_copy.split( /['="']\/post/ ); 		// Regex split on anything that defines looks similar to a src="/post" link
	post_array.shift(); 		// Ditch first element, which is just <html><body> etc. 
	post_array = post_array.map( s => { 		// Theoretically parallel .map instead of maybe-linear .forEach or low-level for() loop 
		s = s.split( /['<>"']/ )[0]; 		// Terminate each element (split on any terminator, take first subelement) 
		return window.location.protocol + "//" + window.location.hostname + "/post" + s; 		// Oh yeah, add /post back in (regex eats it) 
	} )		// post_array now contains an array of strings that should be photoset URLs

	http_array = http_array.concat( post_array ); 		// Photosets are out of order again. Blar. 

	return http_array;
}

// Filter: Return false for typical Tumblr nonsense (JS, avatars, RSS, etc.)
function tumblr_blacklist_filter( url ) { 
	if( url.indexOf( "/reblog/" ) > 0 ||
		url.indexOf( "/tagged/" ) > 0 || 		// Might get removed so the script can track and report tag use. Stupid art tags like 'my-draws' or 'art-poop' are a pain to find. 
		url.indexOf( ".tumblr.com/avatar_" ) > 0  || 
		url.indexOf( ".tumblr.com/image/" ) > 0  || 
		url.indexOf( ".tumblr.com/rss" ) > 0   || 
		url.indexOf( "srvcs.tumblr.com" ) > 0  || 
		url.indexOf( "assets.tumblr.com" ) > 0  || 
		url.indexOf( "schema.org" ) > 0  || 
		url.indexOf( ".js" ) > 0  || 
		url.indexOf( ".css" ) > 0  ||
		url.indexOf( "twitter.com/intent" ) > 0  ||  		// Weirdly common now 
		url.indexOf( "tmblr.co/" ) > 0  ||
//https://66.media.tumblr.com/dea691ec31c9f719dd9b057d0c2be8c3/a533a91e352b4ac7-29/s16x16u_c1/f1781cbe6a6eb7417aa6c97e4286d46372e5ba37.jpg
		url.indexOf( "s16x16u" ) > 0 || 		// Avatars
		url.indexOf( "s64x64u" ) > 0 || 		// Avatars  
//https://66.media.tumblr.com/49fac1010ada367bcaee721ea49d0de5/3ddc8ebad67ae55a-ca/s64x64u_c1/592c805c7ce2c44cd389421697e0360b83d89f37.jpg
		url.indexOf( "u_c1/" ) > 0 || 		// Avatars
// https://66.media.tumblr.com/a4fe5011f9e07f95588c789128b60dca/603e412a60dc5227-4a/s64x64u_c1_f1/7b50fe887c756759973d442a89ef86ab1359f6e5.gif
		url.indexOf( "ezastumblrscrape" ) > 0 ) 		// Somehow this script is running on pages being fetched, inserting a link. Okay. Sure. 
	{ return false } else { return true }
}

// Return standard canonical URL for various resizes of Tumblr images - size of _1280, single CDN 
// 10/14 - ?usesmall seems to miss the CDN sometimes?
	// e.g. http://mooseman-draws.tumblr.com/archive?startpage=1?pagesatonce=5?thumbnails?ezastumblrscrape?scrapemode=everypost?lastpage=37?usesmall
	// https://66.media.tumblr.com/d970fff86185d6a51904e0047de6e764/tumblr_ookdvk7foy1tf83r7o1_400.png sometimes redirects to 78.media and _raw. What? 
	// Oh, probably not my script. I fucked with the Tumblr redirect script I use, but didn't handle the lack of CDN in _raw sizes. 
// 2020:  tumblr scrape: 
	// https://66.media.tumblr.com/b1515c45637955e1e52ec213944db662/08f2b2e47bbce703-70/s400x600/9077891dda98f127c040bd77581014ce4019fe75.gifv
		// obviously can go to .gif, but that .gif saves as tumblr_b1515c45637955e1e52ec213944db662_9077891d_400.gif. 
		// Also, bumping up to /s500x750 works, so I can probably still slam everything to maximum size via 'canonical' urls. 
function image_standardizer( url ) {
	// Some lower-size images are automatically resized. We'll change the URL to the maximum size just in case, and Tumblr will provide the highest resolution. 
	// Replace all resizes with _1280 versions. Nearly all _1280 URLs resolve to highest-resolution versions now, so we don't need to e.g. handle GIFs separately. 
	// Oh hey, Tumblr now has _raw for a no-bullshit as-large-as-possible setting. 
	// _raw only works without the CDN - so //media.tumblr yes, but //66.media.tumblr no. This complicates things. 
	// Does //media and _raw always work? No, of course not. So we still need on_error. 
//	url = url.replace( "_540.", "_1280." ).replace( "_500.", "_1280." ).replace( "_400.", "_1280." ).replace( "_250.", "_1280." ).replace( "_100.", "_1280." );  
	let maxres = "1280"; 		// It is increasingly unlikely that _raw still works. Reconsider CDN handling if that's the case. 
	if( options_map.maxres ) { maxres = options_map.maxres } 		// If it's set, use it. Should be _100, _250, whatever. ?usesmall should set it to _400. ?notraw, _1280. 
	maxres = "_" + maxres + "."; 		// Keep the URL options clean: "400" instead of "_400." etc. 
	url = url.replace( "_raw", maxres ).replace( "_1280.", maxres ).replace( "_640.", maxres ).replace( "_540.", maxres )
		.replace( "_500.", maxres ).replace( "_400.", maxres ).replace( "_250.", maxres ).replace( "_100.", maxres );  

	// henrythehangman.tumblr.com has doubled images from /image posts in ?scrapemode=everypost. Lots of _1280.jpg?.jpg nonsense.
	// Is that typical for tumblrs with this theme? It's one of those annoying magnifying-glass-on-hover deals. If it's just that one weird fetish site, remove this later.
	url = url.split('?')[0]; 		// Ditch anything past the first question mark, if one exists 
	url = url.split('&')[0]; 		// Ditch anything past the first ampersand, if one exists - e.g. speikobrarote.tumblr.com 
	if( url.indexOf( 'tumblr.com' ) > 0 ) { url = url.split( ' ' )[0]; } 		// Ditch anything past a trailing space, if one exists - e.g. cinnasmut.tumblr.com 

	// Standardize media subdomain / CDN subsubdomain, to prevent duplicates and fix _1280 vs _raw complications.
	if( url.indexOf( '.media.tumblr.com/' ) > 0 ) { 
		let url_parts = url.split( '/' )
		url_parts[2] = '66.media.tumblr.com';  		// This came first. Then //media.tumblr.com worked, even for _raw. Then _raw went away. Now it needs a CDN# again. Bluh.
//		url_parts[2] = 'media.tumblr.com'; 			// 2014: write a thing. 2016: comment out old thing, write new thing. 2018: uncomment old thing, comment new thing. This script.
		url = url_parts.join( '/' ).replace( 'http:', 'https:' ); 

		/* 			// Doesn't work - URLs resolve to HTML when clicked, but don't embed as images. Fuck this website. 
		// I need some other method for recognizing duplicates - the initial part of the URL is the same. 
		// Can HTML5 / CSS suppress certain elements if another element is present? 
			// E.g. #abcd1234#640 is display:none if #abcd1234#1280 exists. 
			// https://support.awesome-table.com/hc/en-us/articles/115001399529-Use-CSS-to-change-the-style-of-each-row-depending-on-the-content
			// The :empty pseudoselector works like :hover. And you can condition it on attributes, like .picture[src=""]:empty. 
			// The :empty + otherSelector{} syntax is weird, but it's CSS, so of course it's weird. 
			// Here we'd... generate a style element? Probably just insert a <style> block in innerHTML, honestly. Is inline an option? 
			// http://rafaelservantez.com/writings_tutorials_web_css_cond.html
			// Ech, + requires immediate adjacency. ~ selects within siblings, but only siblings "after" the first element. 

		// Late 2020: shitty new URLs don't contain _tumblr. 
		if( url.indexOf( '_tumblr' ) < 0 ) { 		// Should be !String.match, but Christ, one problem at a time. 
			let othermax = 's1280x1920'; 
			url = url.replace( 's400x600', othermax ).replace( 's500x750', othermax ).replace( 's640x960', othermax );
//'s400x600'
//'s500x750'
//'s640x960'
//'s1280x1920'
		} 
		*/

	}
/*		// ?notraw and ?usesmall are deprecated. Use ?maxres=1280 or ?maxres=400 instead. 
	// Change back to _1280 & CDN for ?scrapewholesite (which does no testing). _raw is unreliable. 
	if( options_map.scrapemode == 'scrapewholesite' || options_map.notraw ) { 
		url = url.replace( "_raw.", "_1280." ).replace( '//media', '//66.media' ); 
	}
	// Change to something smaller for quick browsing, like _500 or _250 
	// https://78.media.tumblr.com/166565b897f228352069b290067215c0/tumblr_oozoucFu0O1v1bsoxo2_raw.jpg etc don't work. What? 
	if( options_map.scrapemode != 'scrapewholesite' && options_map.usesmall ) { 		// Ignore this on scrapewholesite; that'd be a dumb side effect. 
		url = url.replace( "_raw.", "_400." ).replace( "_1280.", "_400." ).replace( '//media', '//66.media' ); 
	}
*/

	return url; 
}

// Remove duplicates from an array (from an iterable?) - returns de-duped array 
// Credit to http://stackoverflow.com/questions/9229645/remove-duplicates-from-javascript-array for hash-based string method 
function remove_duplicates( list ) {
	let seen = {}; 
	list = list.filter( function( item ) {
	    return seen.hasOwnProperty( item ) ? false : ( seen[ item ] = true );
	} );
	return list; 
}

// Filter: Return true ONCE for any given string. 
// Global duplicate remover - return false for items found in page_dupe_hash, otherwise add new items to it and return true
// Now also counts instances of each non-unique argument 
function novelty_filter( url ) {
	// return page_dupe_hash.hasOwnProperty( url ) ? false : ( page_dupe_hash[ url ] = true ); 
//	console.log( page_dupe_hash ); 		// DEBUG 
	url = url.toLowerCase(); 		// Debug-ish, mostly for "tag overview." URLs can be case-sensitive but collisions will be rare. "Not all images are guaranteed to appear." 
	if( page_dupe_hash.hasOwnProperty( url ) ) {
		page_dupe_hash[ url ] += 1;
		return false;
	} else {
		page_dupe_hash[ url ] = 1; 
		return true; 
	}
}

// Filter: Return true for any Tumblr /tagged URL.
// This is used separately from the other filters. Those go X=X.filter( remove stuff ), X=X.filter( remove other stuff ). This should run first, like Y=X.filter( return_tags ). 
function return_tags( url ) { 
//	return url.indexOf( '/tagged' ) > 0; 		// 'if( condition ) { return true } else { return false }' === return condition. 
	if( url.indexOf( '/tagged' ) > 0 ) { return true; } else { return false; } 
}

// Viewing a bare image, redirect to largest available image size.
// // e.g. https://66.media.tumblr.com/d020ac15b0e04ff9381f246ed08c9f05/tumblr_o2lok9yf8O1ub76b5o1_1280.jpg
// This is a sloppy knockoff of Tumblr Image Size 1.1, because that script doesn't affect //media.tumblr.com URLs. 
// Also, this is a distinct mode, not a helper function. Maybe just move it up to the start? 
function maximize_image_size() {
	if( window.location.href.indexOf( "_1280." ) > -1 ) { return false; } 		// If it's already max-size, die. 
	if( window.location.href.indexOf( "_raw." ) > -1 ) { return false; } 		// If it's already max-size, die. 
	// Nope, still broken. E.g. https://78.media.tumblr.com/tumblr_mequmpAxgp1r9tb2u.gif 
	// Needs a positive test for all the replacements... or a check if this URL is different from the changed URL. 

	// This should probably use image_standardizer to avoid duplicate code. 
	let replacement = image_standardizer( window.location.href ); 
	if( window.location.href != replacement ) { window.location.href = replacement; } 
/*
	let maxres = "_1280.";
//	maxres = "_" + maxres + "."; 		// Keep the URL options clean: "400" instead of "_400." etc. 
	url = window.location.href; 
	url = url.replace( "_1280.", maxres ).replace( "_540.", maxres ).replace( "_500.", maxres ).replace( "_400.", maxres ).replace( "_250.", maxres ).replace( "_100.", maxres );  
	if( window.location.href != url ) { window.location.href = url; }
*/
}

// Decode entity references.
// This is officially the stupidest polyfill I've ever used. How is there an entire class of standard escaped text with NO standard decoder in Javascript?
// Copied straight from https://stackoverflow.com/questions/1912501/unescape-html-entities-in-javascript because decency was not an option. 
function htmlDecode(input) { 
  return new DOMParser().parseFromString(input, "text/html").documentElement.textContent; 
}

// Convert fetch()'d URLs into local URLs. Offsite URLs will 404 instead of triggering CORS as an unhandled error. This language. 
// Hmm. This works, but the test case I thought I had worked anyway (with plain fetches) after several refreshes. 
// Currently not used anywhere - needs further testing. 
function safe_fetch(  ){ 
//	args[0] = args[0]; 		// Editing TBD 
//	var video_post = window.location.protocol + "//" + subdomain[4] + ".tumblr.com/post/" + subdomain[5] + "/"; 
//	let local_root = window.location.protocol + "//" + window.location.host + "/"; 
//	arguments[0] = arguments[0]; 		// Editing TBD - arguments is the argv for JS. 
	arguments[0] = '/' + arguments[0].split( '/' ).slice( 3 ).join( '/' ); 
	return fetch( ... arguments ); 		// Straightforward. If this returns a thing that returns a promise, safe_fetch.then() -should- "just work." 
}





// Given the bare HTML of a Tumblr page, return an array of Promises for image/video/link URLs 
function soft_scrape_page_promise( html_copy ) {
	// Linear portion:
	let http_array = links_from_page( html_copy ); 			// Split bare HTML into link and image sources
	http_array.filter( url => url.indexOf( '/tagged/' ) > 0 ).filter( novelty_filter ); 		// Track tags for statistics, before the blacklist removes them 
	http_array = http_array.filter( tumblr_blacklist_filter ); 		// Blacklist filter for URLs - typical garbage

	function is_an_image( url ) {
		// Whitelist URLs with image file extensions or Tumblr iframe indicators 
		var image_link = false; 
		if( url.indexOf( ".gif" ) > 0 ) { image_link = true; }
		if( url.indexOf( ".jpg" ) > 0 ) { image_link = true; }
		if( url.indexOf( ".jpeg" ) > 0 ) { image_link = true; }
		if( url.indexOf( ".png" ) > 0 ) { image_link = true; }
		if( url.indexOf( "/photoset_iframe" ) > 0 ) { image_link = true; }
		if( url.indexOf( ".tumblr.com/video/" ) > 0 ) { image_link = true; }
		if( url.indexOf( "/audio_player_iframe/" ) > 0 ) { image_link = true; }
		return image_link; 
	}

	// Separate the images
	http_array = http_array.map( url => {  
		if( is_an_image( url ) ) { 		// If it's an image, get rid of any Tumblr variability about resolution or CDNs, to avoid duplicates with nonmatching URLs 
			return image_standardizer( url ); 
		} else { 		// Else if not an image
			if( url.indexOf( window.location.host ) > 0 ) { url += "#local" } else { url += "#offsite" } 		// Mark in-domain vs. out-of-domain URLs. 
			if( options_map.imagesonly ) { return ""; } 			// ?imagesonly to skip links on ?scrapewholesite
			return url + "#link"; 
		}
	} )
	.filter( n => { 		// Remove all empty strings, where "empty" can involve a lot of #gratuitous #tags.  
		if( n.split("#")[0] === "" ) { return false } else { return true } 
	} ); 

	http_array = remove_duplicates( http_array ); 		// Remove duplicates within the list
	http_array = http_array.filter( novelty_filter ); 			// Remove duplicates throughout the page
			// Should this be skipped on scrapewholesite? Might be slowing things down. 

	// Async portion: 
	// Return promise that resolves to list of URLs, including fetched videos and photoset sub-images 
	return Promise.all( http_array.map( s => { 
		if( s.indexOf( '/photoset_iframe' ) > 0 ) { 		// If this URL is a photoset, return a promise for an array of URLs
			return fetch( s, { credentials: 'include' }  ).then( r => r.text() ).then( text => { 		// Fetch URL, get body text from response 
				var photos = text.split( 'href="' ); 		// Isolate photoset elements from href= declarations
				photos.shift(); 		// Get rid of first element because it's everything before the first "href" 
				photos = photos.map( p => p.split( '"' )[0] + "#photoset" ); 		// Tag all photoset images as such, just because 
				photos[0] += "#" + s; 		// Tag first image in set with photoset URL so browse mode can link to it 
				return photos; 
			} ) 
		} 
		else if ( s.indexOf( '.tumblr.com/video/' ) > 0 ) { 		// Else if this URL is an embedded video, return a Tumblr-standard URL for the bare video file 
			var subdomain = s.split( '/' ); 		// E.g. https://www.tumblr.com/video/examplename/123456/500/ -> https,,www.tumblr.com,video,examplename,123456,500
			var video_post = window.location.protocol + "//" + subdomain[4] + ".tumblr.com/post/" + subdomain[5] + "/"; 
				// e.g. http://examplename.tumblr.com/post/123456/ - note window.location.protocol vs. subdomain[0], maintaining http/https locally 

			return fetch( video_post, { credentials: 'include' } ).then( r => r.text() ).then( text => { 
				if( text.indexOf( 'og:image' ) > 0 ) { 			// property="og:image" content="http://67.media.tumblr.com/tumblr_123456_frame1.jpg" --> tumblr_123456_frame1.jpg
					var video_name = text.split( 'og:image' )[1].split( 'media.tumblr.com' )[1].split( '"' )[0].split( '/' ).pop(); 
				} else if( text.indexOf( 'poster=' ) > 0 ) { 		// poster='https://31.media.tumblr.com/tumblr_nuzyxqeJNh1rjoppl_frame1.jpg'
					var video_name = text.split( "poster='" )[1].split( 'media.tumblr.com' )[1].split( "'" )[0].split( '/' ).pop(); 		// Bandaid solution. Tumblr just sucks. 
				} else { 
					return video_post + '#video'; 		// Current methods miss the whole page if these splits miss, so fuck it, just return -something.- 
				} 

				// tumblr_abcdef12345_frame1.jpg -> tumblr_abcdef12345.mp4
				video_name = "tumblr_" + video_name.split( '_' )[1] + ".mp4#video"; 
				video_name = "https://vt.tumblr.com/" + video_name; 		// Standard Tumblr-wide video server 

				return video_name; 		// Should be e.g. https://vt.tumblr.com/tumblr_abcdef12345.mp4 
			} )
		}
		else if ( s.indexOf( "/audio_player_iframe/" ) > 0 ) { 		// Else if this URL is an embedded audio file, return... well not a standard URL perhaps, but an URL.
			// How the fuck do I download audio? Video works-ish. Audio is not well-supported.
			// http://articulatelydecomposed.tumblr.com/post/176171225450/you-know-i-had-to-do-it-the-new-friendsim-had-a
			// Ctrl+F "plays". Points to:
			// http://articulatelydecomposed.tumblr.com/post/176171225450/audio_player_iframe/articulatelydecomposed/tumblr_pcaffm6o6H1rc8keu?audio_file=https%3A%2F%2Fwww.tumblr.com%2Faudio_file%2Farticulatelydecomposed%2F176171225450%2Ftumblr_pcaffm6o6H1rc8keu&color=white&simple=1
			// Still no bare .mp3 link in that iframe. 
			// data-stream-url="https://www.tumblr.com/audio_file/articulatelydecomposed/176171225450/tumblr_pcaffm6o6H1rc8keu" ? 
			// Indeed. Actually that might be in the iframe link. What's that function for decoding URIs? 
			// Wait, the actual file resolves to https://a.tumblr.com/tumblr_pcaffm6o6H1rc8keuo1.mp3 - this is trivial. 
//			let audio_name = s.split( '/' ).pop(); 		// Ignore text before last slash. 
//			audio_name = audio_name.split( '?' ).shift(); 		// Ignore text after a question mark, if there is one. 
			// audio_name should now look something like 'tumblr_12345abdce'. 
//			return 'https://a.tumblr.com/' + audio_name + '1.mp3#FUCK'; 		// Standard tumblr-wide video server. Hopefully. 

			// Inconsistent. 
			// e.g. http://articulatelydecomposed.tumblr.com/post/176307067285/mintchocolatechimp-written-by-reddit-user shows
			// https://a.tumblr.com/tumblr_pce3snfthB1ufch4go1.mp3#offsite#link which works, but also 
			// https://a.tumblr.com/tumblr_pce3snfthB1ufch4go1.mp3&color=white&simple=1#offsite#link which doesn't and
			// https://a.tumblr.com/tumblr_pce3snfthB1ufch4g1.mp3 which also doesn't. 
			// ... both of the 'go1' links are present even without this iframe-based code, because that audio has a 'download' link. 

			// Yeah, so much for the simple answer. Fetch. No URL processing seems necessary - 's' is already on this domain. 
			return fetch( s, { credentials: 'include' } ).then( r => r.text() ).then( text => { 
				// data-stream-url="https://a.tumblr.com/tumblr_pce3snfthB1ufch4go1.mp3" 
				let data_url = text.split( 'data-stream-url="' )[1]; 		// Drop everything before the file declaration.
				data_url = data_url.split( '"' )[0]; 		// Drop everything after the doublequote. Probably not efficient, but fuck regexes. 
				return data_url + "#audio"; 
			} )
			// Alright, well now it sort of works, but sometimes it returns e.g.
			// https://www.tumblr.com/audio_file/articulatelydecomposed/176010754365/tumblr_pc1pq1TsD31rc8keu#FRIG which resolves to
			// https://a.tumblr.com/tumblr_pc1pq1TsD31rc8keuo1.mp3#FRIG
			// Huh. That -is- the correctly-identified "data-stream-url" in some iframes. Shit. 
			// This is set up to handle multiple return values, right? For photosets? So I -could- return the correct URL and some guesses.
				// But at that point - why not just guess from the iframe URL? +1.mp3 and also +o1.mp3. 
				// Can't just make up a username or post number for the /audio_file sort of URL. 
		}
		return Promise.resolve( [s] ); 		// Else if this URL is singular, return a single element... resolved as a promise for Promise.all, in an array for Array.concat. Whee. 
	} ) )
	.then( nested_array => { 		// Given the Promise.all'd array of resolved URLs and URL-arrays 
		return [].concat.apply( [], nested_array ); 		// Concatenate array of arrays - apply turns array into comma-separated list, concat turns CSL of arrays into a single array 
	} ) 
}





// Returns a URL with all the options_map options in ?key=value format - optionally allowing changes to options in the returned URL 
// Valid uses:
// options_url() 	-> 	all current settings, no changes 
// options_url( "name", number ) 	-> 	?name=number
// options_url( "name", true ) 		-> 	?name
// options_url( {name:number} ) 	-> 	?name=number
// options_url( {name:number, other:true} ) 	-> 	?name=number?other
// Note that simply passing "name" will remove ?name, not add it, because the value will evaluate false. I should probably change this? Eh, { key } without :value causes errors. 
function options_url( key, value ) {
	var copy_map = new Object(); 
	for( var i in options_map ) { copy_map[ i ] = options_map[ i ]; } 		
	// In any sensible language, this would read "copy_map = object_map." Javascript genuinely does not know how to copy objects. Fuck's sake. 

	if( typeof key === 'string' ) { 		// the parameters are optional. just calling options_url() will return e.g. example.tumblr.com/archive?ezastumblrscrape?startpage=1
		if( !value ) { value = false; } 		// if there's no value then use false
		copy_map[ key ] = value; 		// change this key, so we can e.g. link to example.tumblr.com/archive?ezastumblrscrape?startpage=2
	}

	else if( typeof key === 'object' ) { 		// If we're passed a hashmap
		for( var i in key ) { 
			if( ! key[ i ] ) { key[ i ] = false; } 		// Turn any false evaluation into an explicit boolean - this might not be necessary
			copy_map[ i ] = key[ i ]; 		// Press key-object values onto copy_map-object values 
		}
	}

	// Construct URL from options
	var base_site = window.location.href.substring( 0, window.location.href.indexOf( "?" ) ); 		// should include /archive, but if not, it still works on most pages
	for( var k in copy_map ) { 		// JS maps are weird. We're actually setting attributes of a generic object. So map[ "thumbnails" ] is the same as map.thumbnails. 
		if( copy_map[ k ] ) { 		// Unless the value is False, print a ?key=value pair.
			base_site += "?" + k; 
			if( copy_map[ k ] !== true ) { base_site += "=" + copy_map[ k ]; }  		// If the value is boolean True, just print the value as a flag. 
		}
	}
	return base_site; 
}