coc Posted September 2, 2020
Just out of curiosity: the WSJ is speculating that a US company could buy TikTok without its algorithm and "rebuild it using TikTok's user data".
"Withholding the core algorithm wouldn’t necessarily spell the end of TikTok’s U.S. operations, as software engineers could still rebuild it using TikTok’s user data as a current training data set, according to Mr. Wei. But that process might take time and could lead the app to lose users to rivals, he said."
As a neophyte, how sane or insane is this plan?
Xaston Posted September 2, 2020
Sounds pretty sane to me.
Source: I have done a fairly significant amount of supervised machine learning myself.
Liberty Posted September 2, 2020
It would still be valuable, but A LOT less. A LOT. TikTok basically is its algo. It's not based nearly as much on a social graph or an interest graph as other social media companies. The algo does all the heavy lifting.
Xaston Posted September 2, 2020
I feel like it's also the way it's packaged (e.g. very short videos, plus some other stuff). Making algorithms that figure out what content you will like, based on what other content you have liked and what other users like you have liked, is very doable by companies not named ByteDance. I guess you could argue it's just that much better than the algorithms determining what to put in front of people on YouTube, Netflix, Spotify, Facebook, etc., but I really don't think that is the case.
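The idea of recommending from "what other users like you have liked" is commonly called collaborative filtering. A minimal sketch of a user-based version follows, assuming a toy set of likes; the user names, clip IDs, and the `cosine` and `recommend` helpers are all hypothetical and say nothing about how TikTok or any other service actually does it.

```python
from math import sqrt

# Hypothetical toy data: user -> set of clip IDs that user liked.
likes = {
    "ann": {"cat1", "cat2", "dog1"},
    "bob": {"cat1", "cat2", "cook1"},
    "cy": {"car1", "car2"},
}

def cosine(a, b):
    """Cosine similarity between two sets of liked clips."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def recommend(user, likes, k=3):
    """Rank clips the user has not liked by the summed similarity
    of the other users who did like them."""
    mine = likes[user]
    scores = {}
    for other, theirs in likes.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        if sim == 0.0:
            continue  # users with no overlap contribute nothing
        for clip in theirs - mine:
            scores[clip] = scores.get(clip, 0.0) + sim
    # Highest score first; ties broken by clip ID for stable output.
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]

print(recommend("ann", likes))  # ['cook1']
```

In this toy data "ann" overlaps only with "bob", so she is offered the one clip he liked that she has not seen. A user with no overlap with anyone ("cy") gets nothing, which is the cold-start problem in miniature.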
Jurgis Posted September 2, 2020
I don't use TikTok. Perhaps I should. CCP needs more info about me.
I don't use Spotify.
Netflix recommendations are mediocre.
YouTube recommendations are so-so. If you watch 2-3 videos from a single author/person, recommendations get stuck on tons of other videos from the same source. If you watch a variety of music videos, it does a bit better IMO at suggesting related ones.
I mostly gave up on the Facebook algorithm. When I go to FB, I just go to the friends & family pages and go through their posts. I only go to my "timeline" or whatever it's called last and don't really spend time on it. IMO it's quite mediocre too.
thowed Posted September 2, 2020
I'm afraid I can't remember the source, but I read that the algo is the WHOLE value of TikTok.
Basically, think how terrible prediction engines are (e.g. Netflix). Getting it right is the holy grail. TikTok really is supposed to be streets ahead, so it could be used for all sorts of other things. Of course, you are also betting that the engineers can keep ahead of the competition on this, which is more of an unknown.
So I believe that this is what the buyers are interested in. Otherwise you end up with a 'hot' app for teenagers that could easily be the next Friendster/MySpace in 10 years' time.
winjitsu Posted September 3, 2020
I'll disagree and say that the algorithms aren't that important. Personalization (which is the correct word for this field, btw) is mostly Bayesian methods plus machine learning/deep learning, usually some ensemble model with XGBoost etc. These are well known in the field; there really is no "secret algorithm."
What's proprietary is the user data and the size of the user base. That should come with the sale.
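As one example of the well-known Bayesian building blocks mentioned above, here is a minimal sketch of a Beta-Bernoulli estimate of a clip's like rate. The function name, the prior strength, and the counts are assumptions made for illustration; nothing here is taken from TikTok or any other service.

```python
def smoothed_like_rate(likes, views, prior_likes=5.0, prior_views=10.0):
    """Posterior mean of a like rate under a Beta prior.

    The prior Beta(prior_likes, prior_views - prior_likes) acts like
    `prior_views` imaginary views, `prior_likes` of which were likes.
    """
    return (likes + prior_likes) / (views + prior_views)

# Hypothetical counts: (likes, views) per clip.
clips = {"new_clip": (3, 3), "proven_clip": (30, 40)}

ranked = sorted(clips, key=lambda c: smoothed_like_rate(*clips[c]), reverse=True)
print(ranked)  # ['proven_clip', 'new_clip']
```

The raw like rate would put the 3-for-3 clip first; the prior pulls small samples toward 50%, so the clip with more evidence ranks higher. In a real system an estimate like this would typically be one feature among many fed to a larger model.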
Xaston Posted September 3, 2020
I also want to point out the difference between getting the data from a Netflix user and trying to figure out what movies or shows to recommend, compared to getting the data from a TikTok user and trying to figure out what clip to recommend.
If I watch a 15 second video of a cute animal, you can pretty confidently extrapolate that if you put other videos of cute animals in front of me, I'll like them. And it took you 15 seconds of data collection to figure that out. And when you guess wrong, you're costing me very little time, making me feel less critical of your prediction algorithm.
If I watch (and thumbs up) Gilmore Girls on Netflix, what does it recommend next? The possible reasons that a sprawling piece of art like a show or movie resonates with a viewer are necessarily much more numerous, complex, and unpredictable than those for 15 second videos. And it took Netflix 110 hours to collect that data. Then, if you really want to try to solve the problem, probably the best thing to recommend to someone for whom your only data point is that they like Gilmore Girls is Bunheads or The Marvelous Mrs. Maisel, the other two shows created by the same creator; a problem here is that neither one is available in Netflix's library. And when you guess wrong with a TV show or movie, you're costing me a lot of time, making me feel much more critical of your prediction algorithm.
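To put the two figures above side by side, here is a small worked calculation. It assumes, purely for illustration, that every clip is watched to the end and clips are viewed back to back; the variable names are hypothetical.

```python
CLIP_SECONDS = 15   # length of one short clip, from the post above
SHOW_HOURS = 110    # time to watch the whole show, from the post above

# Number of 15-second clips that fit into 110 hours of viewing.
clips_per_show = SHOW_HOURS * 3600 // CLIP_SECONDS
print(clips_per_show)  # 26400
```

Under those assumptions, the time that yields a single "liked this show" signal would yield tens of thousands of clip-level signals.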
lnofeisone Posted September 3, 2020
This is a very astute observation. In terms of data vs. algo, building an algo from data is VERY hard. This is especially the case if TikTok hands over the raw data and doesn't disclose how the data was transformed, what metrics are being used to train the models, etc. Think of it this way: GPT-3 has about 175 billion parameters (learned weights, which are not the same thing as the feature columns of an Excel sheet, but the number gives a sense of scale). I also doubt that there is "one algorithm that rules them all"; there are probably 1000s of micro-algos.
Keep in mind that while the algo is being built, the data will probably evolve. Long way of saying: feasible, but expensive and difficult.
coc (Author) Posted September 3, 2020
Very interesting observations. I asked because it gets to the heart of what makes these things valuable in the first place.
Pelagic Posted September 3, 2020
Something to think about with Netflix and other streamers of longer content: their data point is watching the content. But let's face it, even good shows have bad episodes. We watch them in the hope they're as good as past episodes, or so that we keep up with the show's plot, and usually we only pass on a show once we're done with it, whether we completed it or got bored of it.
On something like TikTok, Instagram's clone Reels, or Snapchat, you can just swipe past content you don't like, since it's relatively short and not related to a broader plot. Swiping past content you dislike is itself a data point. Not only is TikTok receiving far more positive data points based on the volume of shorter content, they're also receiving a lot more negative data points to train their algorithm with.
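The point about swipes acting as negative data points can be sketched as a labeling rule over watch events. This is a minimal illustration only: the thresholds, the `label_view` name, and the sample views are assumptions, not a description of any real service's pipeline.

```python
def label_view(watched_seconds, clip_seconds, skip_below=0.2, like_above=0.8):
    """Turn one view into an implicit label.

    Returns 1 (positive), 0 (negative, swiped past early),
    or None when the signal is too ambiguous to use.
    """
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    # Fraction of the clip that was watched, capped at 1.0.
    fraction = min(watched_seconds / clip_seconds, 1.0)
    if fraction >= like_above:
        return 1
    if fraction < skip_below:
        return 0
    return None

# Hypothetical views: (clip_id, watched_seconds, clip_seconds).
views = [("cat1", 15, 15), ("car1", 2, 15), ("cook1", 8, 15)]
labels = {clip: label_view(watched, length) for clip, watched, length in views}
print(labels)  # {'cat1': 1, 'car1': 0, 'cook1': None}
```

Every view produces a label or an explicit "don't know", so a feed of short clips generates both positive and negative training examples without the user ever pressing a rating button.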
winjitsu Posted September 3, 2020
Netflix knows when you stop watching a video too.
But Netflix isn't a good example to use. In addition to the different content length, Netflix has an issue where their catalog is shrinking as third-party content makers pull their libraries to start competing services (https://www.businessinsider.com/netflix-movie-catalog-size-has-gone-down-since-2010-2018-2). Your dissatisfaction with their recommendations/personalization may have more to do with how little they have to recommend, versus say YouTube, Instagram, or Spotify, which all have millions of content creators.
rkbabang Posted September 4, 2020
True, but you need to watch a good portion of a movie before you know if you are going to like it or not. And once you are an hour in, you might as well watch the rest and hope for a good ending. You then watched the entire thing and hated it. How does Netflix know? I haven't thumbed up or down anything on Netflix for years. I just never think to do it. So all it has is what I've watched, but it has no idea if I thought the movies I've watched were great, so-so, or horrible.
lnofeisone Posted September 5, 2020
Microsegmentation (user profile building), external data, and recommendation engines are how Netflix can infer how much of a movie a person watched and whether they liked it. It would look like this.
There are a few questions here:
1) Have you watched the whole movie?
2) Did you like it?
Easy cases:
1) Lnofeisone (ln(e) = 1) is a rater, and he recently watched an episode of Cobra Kai and rated it "like."
2) Lnofeisone is a rater, and he recently watched an episode of Cobra Kai and rated it "dislike."
Harder case: Lnofeisone watched Cobra Kai recently and didn't rate it. Also, our "Are you still watching" message popped up and stayed on the screen for 10 hours (and this was validated by IP traffic etc.). Lnofeisone probably didn't watch a few episodes of Cobra Kai. It was 2am at his location, so Lnofeisone probably fell asleep.
Do this for millions of users and a decent amount of content, and a lot can be inferred, generally correctly. Throw in a recommendation engine (https://www2.seas.gwu.edu/~simhaweb/champalg/cf/papers/wroberts.pdf; this one isn't particularly impressive and requires a lot of engineering, but was novel), external data (ratings), etc., and it becomes possible to at least recommend something reasonably close.
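The easy and harder cases above can be written down as a small rule-based sketch. The `Session` fields, the one-hour idle cutoff, and the night-time window are all assumptions chosen for illustration; the post does not say how Netflix actually implements this.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Session:
    rating: Optional[str]     # "like", "dislike", or None if unrated
    prompt_idle_hours: float  # how long "Are you still watching" sat unanswered
    local_hour: int           # viewer's local hour (0-23) when the prompt appeared

def infer(session: Session) -> Tuple[str, str]:
    """Return (watched_status, detail) for one viewing session."""
    # Easy cases: an explicit rating answers both questions.
    if session.rating in ("like", "dislike"):
        return ("watched", session.rating)
    # Harder case: a long-unanswered prompt suggests nobody was watching.
    if session.prompt_idle_hours >= 1.0:
        night = session.local_hour >= 23 or session.local_hour < 6
        return ("probably_not_watched", "asleep" if night else "away")
    # No rating and no idle prompt: watched, opinion unknown.
    return ("watched", "unknown")

print(infer(Session(rating="like", prompt_idle_hours=0.0, local_hour=20)))
print(infer(Session(rating=None, prompt_idle_hours=10.0, local_hour=2)))
```

The second call mirrors the Cobra Kai example: no rating, a prompt left up for 10 hours at 2am, so the session is treated as probably not watched because the viewer likely fell asleep. At scale, hand-written rules like these would more plausibly be replaced by a model trained on many such signals.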