Closed Bug 844407 Opened 11 years ago Closed 11 years ago

Make Tabs threadsafe

Categories

(Firefox for Android Graveyard :: General, defect)

ARM
Android
defect
Not set
normal

Tracking

(firefox20 fixed, firefox21 fixed, firefox22 fixed)

RESOLVED FIXED
Firefox 22
Tracking Status
firefox20 --- fixed
firefox21 --- fixed
firefox22 --- fixed

People

(Reporter: bnicholson, Assigned: bnicholson)

References

Details

Attachments

(1 file, 7 obsolete files)

Tabs is not at all threadsafe. closeTab(), addTab(), selectTab(), etc. are called on both the Gecko thread (in handleMessage) and the UI thread. I imagine this may have caused the ConcurrentModificationException from bug 828349
Attached patch Make Tabs threadsafe (obsolete) — Splinter Review
Attachment #717458 - Flags: review?(rnewman)
Comment on attachment 717458 [details] [diff] [review]
Make Tabs threadsafe

Review of attachment 717458 [details] [diff] [review]:
-----------------------------------------------------------------

Overview comment: this is, unfortunately, a good example of why it's hard to add thread-safety after the fact. We're converging on code that's increasingly hard to change safely. This patch has some edge cases, and some future concurrency footguns. A better (or more obviously safe) solution probably exists, at the cost of touching more lines.

This patch is incrementally better than the existing code, and once the volatile issues are fixed I believe (but am not 100% confident) that this code is thread-safe, so r+.

However, the thread safety rests on some assumptions that are neither explicitly called out nor are formally guarded (such as the solitary modification of mSelectedTab), and so I would be enthusiastic about a more thorough refactoring that produced a more obviously correct (and probably more performant) implementation. For example, the use of COWAL for mOrder is probably no longer ideal, because we require Tabs itself to be thread-safe.

I would be happy to work on such if you wish, or review a follow-up with a larger scope.

Note that addressing some of my review comments will start to converge on the whitewash Java concurrency solution of "add synchronized to every method definition", because you're essentially trying to control access to every member from every method. That's usually a code smell.

::: mobile/android/base/Tabs.java
@@ +32,5 @@
>  public class Tabs implements GeckoEventListener {
>      private static final String LOGTAG = "GeckoTabs";
>  
> +    private volatile Tab mSelectedTab;
> +    private final Map<Integer, Tab> mTabs = Collections.synchronizedMap(new HashMap<Integer, Tab>());

You're already using synchronized(mTabs) throughout the code; I would suggest skipping the synchronizedMap part, and just explicitly synchronize everywhere.

Also, it's worth noting in a comment that all of these fields are synchronized on mTabs. And at that point you might as well synchronize on `this`…

@@ +101,5 @@
>          }
>      }
>  
> +    public int getDisplayCount() {
> +        boolean getPrivate = mSelectedTab != null && mSelectedTab.isPrivate();

Marking mSelectedTab as volatile is necessary but not sufficient to make this work! You have two accesses to the field, and you are only saved by the fact that mSelectedTab cannot transition from non-null back to null.

You should either use AtomicReference<Tab>, and .get() the reference once to get a local snapshot, or find a different way to do this -- for example, splitting Tabs into private and non-private tabs, and having a filtered variant of getDisplayCount.

Or just move all of this code into the synchronized block.

@@ +105,5 @@
> +        boolean getPrivate = mSelectedTab != null && mSelectedTab.isPrivate();
> +        int count = 0;
> +        synchronized (mTabs) {
> +            for (Tab tab : mTabs.values()) {
> +                if (tab.isPrivate() == getPrivate) {

Note that by the time this code runs, the following could have occurred:

1. getDisplayCount is called. mSelectedTab is retrieved, and is not private.
2. getPrivate = false.
3. selectTab is called with a private tab. mTabs is used for synchronization.
4. This method blocks until the lock is released.
5. selectTab finishes. mSelectedTab is now set to a private tab.
6. This block gets to run, and the returned count is now the count of non-private tabs, despite the currently selected tab being private.

@@ +128,5 @@
>              BrowserDB.registerBookmarkObserver(getContentResolver(), mContentObserver);
>          }
>      }
>  
>      private Tab addTab(int id, String url, boolean external, int parentId, String title, boolean isPrivate) {

addTab modifies both mTabs and mOrder, but they aren't synchronized together. That might be safe, but I doubt it's intentional.

@@ +155,4 @@
>          }
>      }
>  
>      public Tab selectTab(int id) {

It's worth noting in a comment, both here and at the definition for mSelectedTab, that this method is the *only* place that can mutate mSelectedTab. That fact is the cornerstone for the correctness of this patch; the moment someone accidentally violates it, everything will break.

@@ +169,3 @@
>  
> +            mSelectedTab = tab;
> +            mActivity.runOnUiThread(new Runnable() { 

Nit: trailing whitespace.

@@ +169,5 @@
>  
> +            mSelectedTab = tab;
> +            mActivity.runOnUiThread(new Runnable() { 
> +                public void run() {
> +                    if (isSelectedTab(tab)) {

It's worth noting that this will not occur inside the enclosing synchronized block: it will occur on a different thread. mSelectedTab can mutate between the enclosing block and the execution of isSelectedTab. That's OK in this case (it'll just result in no notification happening for one of the two resultant runnables), but it's conceptually scary.

@@ +170,5 @@
> +            mSelectedTab = tab;
> +            mActivity.runOnUiThread(new Runnable() { 
> +                public void run() {
> +                    if (isSelectedTab(tab)) {
> +                        notifyListeners(tab, TabEvents.SELECTED);

… furthermore, the selected tab can mutate between the isSelectedTab call and this subsequent listener notification!

@@ +228,5 @@
>      public boolean isSelectedTab(Tab tab) {
>          if (mSelectedTab == null)
>              return false;
>  
>          return tab == mSelectedTab;

mSelectedTab can race between 229 and 232.

You should fetch once, not use two member accesses.

@@ +420,5 @@
>  
>      public interface OnTabsChangedListener {
>          public void onTabChanged(Tab tab, TabEvents msg, Object data);
>      }
>      

Nit: whitespace.
Attachment #717458 - Flags: review?(rnewman) → review+
Attached patch Make Tabs threadsafe (obsolete) — Splinter Review
(In reply to Richard Newman [:rnewman] from comment #2)
> 
> @@ +101,5 @@
> >          }
> >      }
> >  
> > +    public int getDisplayCount() {
> > +        boolean getPrivate = mSelectedTab != null && mSelectedTab.isPrivate();
> 
> Marking mSelectedTab as volatile is necessary but not sufficient to make
> this work! You have two accesses to the field, and you are only saved by the
> fact that mSelectedTab cannot transition from non-null back to null.

Yeah, I noticed that when making this patch. This seems like a fair assumption: once a tab exists and is selected, there must always be a selected tab for the remainder of the session. I've added a comment to make this assumption explicit. Also, as you know, the fact that this method and selectTab() are synchronized is a second safety net that prevents this line from breaking - even if that assumption is wrong.

> @@ +170,5 @@
> > +            mSelectedTab = tab;
> > +            mActivity.runOnUiThread(new Runnable() { 
> > +                public void run() {
> > +                    if (isSelectedTab(tab)) {
> > +                        notifyListeners(tab, TabEvents.SELECTED);
> 
> … furthermore, the selected tab can mutate between the isSelectedTab call
> and this subsequent listener notification!

Actually, I don't think we need this Runnable at all. notifyListeners() always goes through the UI thread, so doing that again here seems redundant.

> @@ +228,5 @@
> >      public boolean isSelectedTab(Tab tab) {
> >          if (mSelectedTab == null)
> >              return false;
> >  
> >          return tab == mSelectedTab;
> 
> mSelectedTab can race between 229 and 232.
> 
> You should fetch once, not use two member accesses.

Oops, I think I misread this as "tab == null" instead of "mSelectedTab == null". I changed this to use the former, which should be equivalent (and safe).
Attachment #717458 - Attachment is obsolete: true
Attachment #717572 - Flags: review?(rnewman)
Comment on attachment 717572 [details] [diff] [review]
Make Tabs threadsafe

Review of attachment 717572 [details] [diff] [review]:
-----------------------------------------------------------------

f+, because I made some suggestions that you should probably test first :D

::: mobile/android/base/Tabs.java
@@ +35,1 @@
>      private final CopyOnWriteArrayList<Tab> mOrder = new CopyOnWriteArrayList<Tab>();

// mOrder and mTabs are always of the same cardinality, and contain the same values.

@@ +110,5 @@
> +        // Once mSelectedTab is non-null, it cannot be null for the remainder
> +        // of Fennec's lifetime
> +        boolean getPrivate = mSelectedTab != null && mSelectedTab.isPrivate();
> +        int count = 0;
> +        for (Tab tab : mTabs.values()) {

You could walk mOrder here; it'll probably be faster.

@@ +122,5 @@
>      private void lazyRegisterBookmarkObserver() {
>          if (mContentObserver == null) {
>              mContentObserver = new ContentObserver(null) {
>                  public void onChange(boolean selfChange) {
> +                    synchronized (Tabs.this) {

Way to run with the obscure Java knowledge! Extra points! :D

@@ +123,5 @@
>          if (mContentObserver == null) {
>              mContentObserver = new ContentObserver(null) {
>                  public void onChange(boolean selfChange) {
> +                    synchronized (Tabs.this) {
> +                        for (Tab tab : mTabs.values()) {

Same point about walking mOrder.

@@ +325,5 @@
> +                            tab = mTabs.get(id);
> +                            tab.updateURL(url);
> +                        } else {
> +                            // Tab was already closed; abort
> +                            return;

Nit: while you're here, invert the conditional and early-exit.

  if (!mTabs.containsKey(id)) {
    // Tab must have already been closed.
    return;
  }
  tab = mTabs.get(id);
  tab.updateURL(url);

@@ +401,2 @@
>          final ThumbnailHelper helper = ThumbnailHelper.getInstance();
> +        for (final Tab tab : mTabs.values()) {

Let's try to keep our locks as fine-grained as possible. There's no need to synchronize the ThumbnailHelper access.

Also, for future work: if we were using concurrent data structures here (e.g., ConcurrentHashMap) we wouldn't need to lock during iteration.

In fact, why not iterate over mOrder (which is COW and thus safe to walk) rather than locking?

@@ +403,3 @@
>              GeckoAppShell.getHandler().post(new Runnable() {
>                  public void run() {
>                      helper.getAndProcessThumbnailFor(tab);

And finally: why do we make N runnables on the handler thread, rather than one?

So:

  final ThumbnailHelper helper = …;
  final ArrayList<Tab> tabs;
  synchronized (this) {
    tabs = mOrder;
  }
  GeckoAppShell.getHandler().post(new Runnable() {
    @Override
    public void run() {
      for (final Tab tab : tabs) {
        helper.getAndProcessThumbnailFor(tab);
      }
    }
  });
Attachment #717572 - Flags: review?(rnewman) → feedback+
Attached patch Make Tabs threadsafe, v3 (obsolete) — Splinter Review
I've been backlogged because of the work week last week, but I finally got around to making the proposed changes. I did some minor testing, and this patch works fine.
Attachment #717572 - Attachment is obsolete: true
Attachment #723718 - Flags: review?(rnewman)
Comment on attachment 723718 [details] [diff] [review]
Make Tabs threadsafe, v3

Review of attachment 723718 [details] [diff] [review]:
-----------------------------------------------------------------

This is good, but I spotted a couple of additions. New attachment coming shortly...

::: mobile/android/base/Tabs.java
@@ +134,5 @@
>      }
>  
>      private void lazyRegisterBookmarkObserver() {
>          if (mContentObserver == null) {
>              mContentObserver = new ContentObserver(null) {

Just noticed that this can race -- call loadTab from two threads.

Solved by moving the call to lRBO inside the synchronized block in addTab.

@@ +412,1 @@
>      private static ArrayList<OnTabsChangedListener> mTabsChangedListeners;

This is vulnerable, and complicates code elsewhere. Let's replace with a static initialization.
Attachment #723718 - Flags: review?(rnewman) → feedback+
Attached patch Proposed patch. v4 (obsolete) — Splinter Review
Please take a look at the interdiff, let me know what you think.
Attachment #723829 - Flags: review?(bnicholson)
Attached patch Proposed patch. v5 (obsolete) — Splinter Review
… and now with qref to add missing imports. *sigh*
Attachment #723829 - Attachment is obsolete: true
Attachment #723829 - Flags: review?(bnicholson)
Attachment #723835 - Flags: review?(bnicholson)
Comment on attachment 723835 [details] [diff] [review]
Proposed patch. v5

Review of attachment 723835 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/base/Tabs.java
@@ +306,5 @@
> +     *         we aren't correctly initialized.
> +     */
> +    private synchronized GeckoApp getActivity() {
> +        if (mActivity == null) {
> +            throw new IllegalStateException("Cannot fetch ContentResolver: activity is null.");

Why ContentResolver? getActivity() doesn't seem to be related to ContentResolvers.

@@ +310,5 @@
> +            throw new IllegalStateException("Cannot fetch ContentResolver: activity is null.");
> +        }
> +        return mActivity;
> +    }
> +        

Nit: trailing whitespace

@@ +317,3 @@
>      }
>  
> +    // Making Tabs a singleton class.

Nit: s/Making/Make/
Attachment #723835 - Flags: review?(bnicholson) → review+
(In reply to Brian Nicholson (:bnicholson) from comment #9)

> Why ContentResolver? getActivity() doesn't seem to be related to
> ContentResolvers.

Forgot to correct the exception string when I lifted this code into its own method.


> > +    // Making Tabs a singleton class.
> 
> Nit: s/Making/Make/

Hey, I just fixed the formatting! ;)
https://hg.mozilla.org/integration/mozilla-inbound/rev/ab05fe92799d
Status: NEW → ASSIGNED
Target Milestone: --- → Firefox 22
I backed this out because apparently rc1/rc2 don't follow the rules.

https://hg.mozilla.org/integration/mozilla-inbound/rev/2e57968e04d3
Or we don't. Let's take a look tomorrow.

I/Robocop ( 3240): 11 INFO TEST-PASS | testNewTab | Number of tabs increased - 3 should equal 3
I/Robocop ( 3240): 12 INFO TEST-END | testNewTab | finished in 15570ms
I/Robocop ( 3240): 13 INFO TEST-START | Shutdown
I/Robocop ( 3240): 14 INFO Passed: 10
I/Robocop ( 3240): 15 INFO Failed: 0
I/Robocop ( 3240): 16 INFO Todo: 0
I/Robocop ( 3240): 17 INFO SimpleTest FINISHED
D/GeckoFavicons( 3240): Requesting cancelation of favicon load (6)
D/GeckoFavicons( 3240): Cancelling favicon load (6)
I/GeckoToolbar( 3240): zerdatime 1000095 - Throbber stop
D/GeckoFavicons( 3240): The provided favicon URL is not valid
D/OpenGLRenderer( 3240): Flushing caches (mode 0)
I/dalvikvm-heap( 3240): Grow heap (frag case) to 9.707MB for 749584-byte allocation
W/ActivityManager( 1402): Duplicate finish request for ActivityRecord{413ad4f0 org.mozilla.fennec/org.mozilla.gecko.AwesomeBar}
W/NetworkManagementSocketTagger( 1402): setKernelCountSet(10044, 1) failed with errno -2
V/TabletStatusBar( 1480): setLightsOn(true)
D/OpenGLRenderer( 3240): Flushing caches (mode 0)
W/InputManagerService( 1402): Starting input on non-focused client com.android.internal.view.IInputMethodClient$Stub$Proxy@4141dda0 (uid=10046 pid=3240)
E/GeckoAppShell( 3240): >>> REPORTING UNCAUGHT EXCEPTION FROM THREAD 139 ("GeckoBackgroundThread")
E/GeckoAppShell( 3240): java.lang.IllegalStateException: Tabs not initialized with a GeckoApp instance.
E/GeckoAppShell( 3240): 	at org.mozilla.gecko.Tabs.getActivity(Tabs.java:310)
E/GeckoAppShell( 3240): 	at org.mozilla.gecko.Tabs.notifyListeners(Tabs.java:481)
E/GeckoAppShell( 3240): 	at org.mozilla.gecko.Tabs.notifyListeners(Tabs.java:476)
E/GeckoAppShell( 3240): 	at org.mozilla.gecko.Tab$1.run(Tab.java:182)
E/GeckoAppShell( 3240): 	at android.os.Handler.handleCallback(Handler.java:605)
E/GeckoAppShell( 3240): 	at android.os.Handler.dispatchMessage(Handler.java:92)
E/GeckoAppShell( 3240): 	at android.os.Looper.loop(Looper.java:137)
E/GeckoAppShell( 3240): 	at org.mozilla.gecko.util.GeckoBackgroundThread.run(GeckoBackgroundThread.java:32)
In that last patch I made detachFromActivity 'complete', and made attach/detach sane. Previously, detachFromActivity *did not remove the reference to the activity*.

Now we do, and it's revealing that there are lifecycle issues with GeckoApp and Tabs.

What seems to be happening here: the Tab thumbnail runnable is running after GeckoApp has been destroyed. It grabs the static Tabs instance, which doesn't have a current GeckoApp (because its GeckoApp already told it to detach), and rightly throws.

This is an old bug that simply wasn't tickled because detachFromActivity didn't discard the reference.

Directly, we have this lifecycle problem because some Tabs stuff is being called from the background thread, but we're detaching directly in GeckoApp.onDestroy.

The trivial -- but wrong -- fix is to not discard the GeckoApp reference in detachFromActivity. That takes us back to where we were.

Another simple fix is to queue a runnable in onDestroy to do Tabs cleanup, which will impose an order of execution on the two conflicting operations.

Another is to directly use GeckoApp.mAppContext.runOnUiThread (that is, eliminate the GeckoApp dependency in Tabs). That starts to air a lot of the singleton dirty laundry here.

Another is to cancel background runnables in onDestroy, assuming we can audit them to make sure they don't need to run even after GeckoApp dies. (This is unlikely to be safe, because Gecko has one background thread, rather than one per GeckoApp.)

The correct course of action is to completely eliminate the singletons in Fennec: GeckoApp *instances* have background and UI threads, have references to a single Tabs instance, etc. This is probably necessary in the medium term, because the current approach is very brittle. However, it will introduce a lot of cross-references between instances.

Note that those cross-references *exist now* -- we just cheat and use global variables.

bnicholson, if you get to this before I do, feel free to pick any one of these solutions, and I'll re-review.

If I get to it first, I will probably do open-heart surgery on Fennec and attempt to rip out all singletons, which will take a long time!
Attached patch Make Tabs threadsafe, v6 (obsolete) — Splinter Review
Removes changes to attachToActivity/detachFromActivity and adds a comment describing the current state and broken lifecycle.
Attachment #723718 - Attachment is obsolete: true
Attachment #723835 - Attachment is obsolete: true
Attachment #724526 - Flags: review?(rnewman)
bnicholson: can we just remove the

  mActivity = null;

line from v5? I think the other changes to attach and detach are worth keeping.
Attached patch Make Tabs thread-safe, v7 (obsolete) — Splinter Review
(In reply to Richard Newman [:rnewman] from comment #17)
> bnicholson: can we just remove the
> 
>   mActivity = null;
> 
> line from v5? I think the other changes to attach and detach are worth
> keeping.

I was concerned that adding calls to detachFromActivity could have side effects similar to setting mActivity to null, but looking more closely, I don't think it should be a problem.
Attachment #724526 - Attachment is obsolete: true
Attachment #724526 - Flags: review?(rnewman)
Attachment #724661 - Flags: review?(rnewman)
And my last patches weren't based on your inbound push, so let's try again...hopefully this is the last iteration!
Attachment #724661 - Attachment is obsolete: true
Attachment #724661 - Flags: review?(rnewman)
Attachment #724665 - Flags: review?(rnewman)
Comment on attachment 724665 [details] [diff] [review]
Make Tabs thread-safe, v8

Review of attachment 724665 [details] [diff] [review]:
-----------------------------------------------------------------

If you can, please make sure that Robocop passes before you push!
Attachment #724665 - Flags: review?(rnewman) → review+
https://hg.mozilla.org/mozilla-central/rev/e38d0a3fe054
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Can we get an uplift nomination with risk assessment here so that we can evaluate this for our next beta?  If low risk enough we might want to take this so we can get the fix in bug 828349 too.
Comment on attachment 724665 [details] [diff] [review]
Make Tabs thread-safe, v8

[Approval Request Comment]
Bug caused by (feature/regressing bug #): none
User impact if declined: possible hard-to-debug crashes and/or unstable behavior
Testing completed (on m-c, etc.): m-c
Risk to taking this patch (and alternatives if risky): low risk
String or UUID changes made by this patch: none

Note that bug 828349 exposes these issues more, so we shouldn't land bug 828349 without this one.
Attachment #724665 - Flags: approval-mozilla-beta?
Attachment #724665 - Flags: approval-mozilla-aurora?
Comment on attachment 724665 [details] [diff] [review]
Make Tabs thread-safe, v8

low risk enough at this point to uplift on aurora as this helps bug 844497
Attachment #724665 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to bhavana bajaj [:bajaj] from comment #26)
> Comment on attachment 724665 [details] [diff] [review]
> Make Tabs thread-safe, v8
> 
> low risk enough at this point to uplift on aurora as this helps bug 844497

err, I meant 828349
Blocks: 851373
Attachment #724665 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: