LCOV - code coverage report
Current view: top level - src/backend/nodes - tidbitmap.c (source / functions) Coverage Total Hit
Test: Code coverage Lines: 87.6 % 654 573
Test Date: 2026-01-26 10:56:24 Functions: 93.9 % 33 31
Legend: Lines:     hit not hit
Branches: + taken - not taken # not executed
Branches: 64.8 % 372 241

             Branch data     Line data    Source code
       1                 :             : /*-------------------------------------------------------------------------
       2                 :             :  *
       3                 :             :  * tidbitmap.c
       4                 :             :  *        PostgreSQL tuple-id (TID) bitmap package
       5                 :             :  *
       6                 :             :  * This module provides bitmap data structures that are spiritually
       7                 :             :  * similar to Bitmapsets, but are specially adapted to store sets of
       8                 :             :  * tuple identifiers (TIDs), or ItemPointers.  In particular, the division
       9                 :             :  * of an ItemPointer into BlockNumber and OffsetNumber is catered for.
      10                 :             :  * Also, since we wish to be able to store very large tuple sets in
      11                 :             :  * memory with this data structure, we support "lossy" storage, in which
      12                 :             :  * we no longer remember individual tuple offsets on a page but only the
      13                 :             :  * fact that a particular page needs to be visited.
      14                 :             :  *
      15                 :             :  * The "lossy" storage uses one bit per disk page, so at the standard 8K
      16                 :             :  * BLCKSZ, we can represent all pages in 64Gb of disk space in about 1Mb
      17                 :             :  * of memory.  People pushing around tables of that size should have a
      18                 :             :  * couple of Mb to spare, so we don't worry about providing a second level
      19                 :             :  * of lossiness.  In theory we could fall back to page ranges at some
      20                 :             :  * point, but for now that seems useless complexity.
      21                 :             :  *
      22                 :             :  * We also support the notion of candidate matches, or rechecking.  This
      23                 :             :  * means we know that a search need visit only some tuples on a page,
      24                 :             :  * but we are not certain that all of those tuples are real matches.
      25                 :             :  * So the eventual heap scan must recheck the quals for these tuples only,
      26                 :             :  * rather than rechecking the quals for all tuples on the page as in the
      27                 :             :  * lossy-bitmap case.  Rechecking can be specified when TIDs are inserted
      28                 :             :  * into a bitmap, and it can also happen internally when we AND a lossy
      29                 :             :  * and a non-lossy page.
      30                 :             :  *
      31                 :             :  *
      32                 :             :  * Copyright (c) 2003-2026, PostgreSQL Global Development Group
      33                 :             :  *
      34                 :             :  * IDENTIFICATION
      35                 :             :  *        src/backend/nodes/tidbitmap.c
      36                 :             :  *
      37                 :             :  *-------------------------------------------------------------------------
      38                 :             :  */
      39                 :             : #include "postgres.h"
      40                 :             : 
      41                 :             : #include <limits.h>
      42                 :             : 
      43                 :             : #include "access/htup_details.h"
      44                 :             : #include "common/hashfn.h"
      45                 :             : #include "common/int.h"
      46                 :             : #include "nodes/bitmapset.h"
      47                 :             : #include "nodes/tidbitmap.h"
      48                 :             : #include "storage/lwlock.h"
      49                 :             : #include "utils/dsa.h"
      50                 :             : 
      51                 :             : /*
      52                 :             :  * When we have to switch over to lossy storage, we use a data structure
      53                 :             :  * with one bit per page, where all pages having the same number DIV
      54                 :             :  * PAGES_PER_CHUNK are aggregated into one chunk.  When a chunk is present
      55                 :             :  * and has the bit set for a given page, there must not be a per-page entry
      56                 :             :  * for that page in the page table.
      57                 :             :  *
      58                 :             :  * We actually store both exact pages and lossy chunks in the same hash
      59                 :             :  * table, using identical data structures.  (This is because the memory
      60                 :             :  * management for hashtables doesn't easily/efficiently allow space to be
      61                 :             :  * transferred easily from one hashtable to another.)  Therefore it's best
      62                 :             :  * if PAGES_PER_CHUNK is the same as TBM_MAX_TUPLES_PER_PAGE, or at least not
      63                 :             :  * too different.  But we also want PAGES_PER_CHUNK to be a power of 2 to
      64                 :             :  * avoid expensive integer remainder operations.  So, define it like this:
      65                 :             :  */
      66                 :             : #define PAGES_PER_CHUNK  (BLCKSZ / 32)
      67                 :             : 
      68                 :             : /* We use BITS_PER_BITMAPWORD and typedef bitmapword from nodes/bitmapset.h */
      69                 :             : 
      70                 :             : #define WORDNUM(x)      ((x) / BITS_PER_BITMAPWORD)
      71                 :             : #define BITNUM(x)       ((x) % BITS_PER_BITMAPWORD)
      72                 :             : 
      73                 :             : /* number of active words for an exact page: */
      74                 :             : #define WORDS_PER_PAGE  ((TBM_MAX_TUPLES_PER_PAGE - 1) / BITS_PER_BITMAPWORD + 1)
      75                 :             : /* number of active words for a lossy chunk: */
      76                 :             : #define WORDS_PER_CHUNK  ((PAGES_PER_CHUNK - 1) / BITS_PER_BITMAPWORD + 1)
      77                 :             : 
      78                 :             : /*
      79                 :             :  * The hashtable entries are represented by this data structure.  For
      80                 :             :  * an exact page, blockno is the page number and bit k of the bitmap
      81                 :             :  * represents tuple offset k+1.  For a lossy chunk, blockno is the first
      82                 :             :  * page in the chunk (this must be a multiple of PAGES_PER_CHUNK) and
      83                 :             :  * bit k represents page blockno+k.  Note that it is not possible to
      84                 :             :  * have exact storage for the first page of a chunk if we are using
      85                 :             :  * lossy storage for any page in the chunk's range, since the same
      86                 :             :  * hashtable entry has to serve both purposes.
      87                 :             :  *
      88                 :             :  * recheck is used only on exact pages --- it indicates that although
      89                 :             :  * only the stated tuples need be checked, the full index qual condition
      90                 :             :  * must be checked for each (ie, these are candidate matches).
      91                 :             :  */
      92                 :             : typedef struct PagetableEntry
      93                 :             : {
      94                 :             :         BlockNumber blockno;            /* page number (hashtable key) */
      95                 :             :         char            status;                 /* hash entry status */
      96                 :             :         bool            ischunk;                /* T = lossy storage, F = exact */
      97                 :             :         bool            recheck;                /* should the tuples be rechecked? */
      98                 :             :         bitmapword      words[Max(WORDS_PER_PAGE, WORDS_PER_CHUNK)];
      99                 :             : } PagetableEntry;
     100                 :             : 
     101                 :             : /*
     102                 :             :  * Holds array of pagetable entries.
     103                 :             :  */
     104                 :             : typedef struct PTEntryArray
     105                 :             : {
     106                 :             :         pg_atomic_uint32 refcount;      /* no. of iterator attached */
     107                 :             :         PagetableEntry ptentry[FLEXIBLE_ARRAY_MEMBER];
     108                 :             : } PTEntryArray;
     109                 :             : 
     110                 :             : /*
     111                 :             :  * We want to avoid the overhead of creating the hashtable, which is
     112                 :             :  * comparatively large, when not necessary. Particularly when we are using a
     113                 :             :  * bitmap scan on the inside of a nestloop join: a bitmap may well live only
     114                 :             :  * long enough to accumulate one entry in such cases.  We therefore avoid
     115                 :             :  * creating an actual hashtable until we need two pagetable entries.  When
     116                 :             :  * just one pagetable entry is needed, we store it in a fixed field of
     117                 :             :  * TIDBitMap.  (NOTE: we don't get rid of the hashtable if the bitmap later
     118                 :             :  * shrinks down to zero or one page again.  So, status can be TBM_HASH even
     119                 :             :  * when nentries is zero or one.)
     120                 :             :  */
     121                 :             : typedef enum
     122                 :             : {
     123                 :             :         TBM_EMPTY,                                      /* no hashtable, nentries == 0 */
     124                 :             :         TBM_ONE_PAGE,                           /* entry1 contains the single entry */
     125                 :             :         TBM_HASH,                                       /* pagetable is valid, entry1 is not */
     126                 :             : } TBMStatus;
     127                 :             : 
     128                 :             : /*
     129                 :             :  * Current iterating state of the TBM.
     130                 :             :  */
     131                 :             : typedef enum
     132                 :             : {
     133                 :             :         TBM_NOT_ITERATING,                      /* not yet converted to page and chunk array */
     134                 :             :         TBM_ITERATING_PRIVATE,          /* converted to local page and chunk array */
     135                 :             :         TBM_ITERATING_SHARED,           /* converted to shared page and chunk array */
     136                 :             : } TBMIteratingState;
     137                 :             : 
     138                 :             : /*
     139                 :             :  * Here is the representation for a whole TIDBitMap:
     140                 :             :  */
     141                 :             : struct TIDBitmap
     142                 :             : {
     143                 :             :         NodeTag         type;                   /* to make it a valid Node */
     144                 :             :         MemoryContext mcxt;                     /* memory context containing me */
     145                 :             :         TBMStatus       status;                 /* see codes above */
     146                 :             :         struct pagetable_hash *pagetable;       /* hash table of PagetableEntry's */
     147                 :             :         int                     nentries;               /* number of entries in pagetable */
     148                 :             :         int                     maxentries;             /* limit on same to meet maxbytes */
     149                 :             :         int                     npages;                 /* number of exact entries in pagetable */
     150                 :             :         int                     nchunks;                /* number of lossy entries in pagetable */
     151                 :             :         TBMIteratingState iterating;    /* tbm_begin_iterate called? */
     152                 :             :         uint32          lossify_start;  /* offset to start lossifying hashtable at */
     153                 :             :         PagetableEntry entry1;          /* used when status == TBM_ONE_PAGE */
     154                 :             :         /* these are valid when iterating is true: */
     155                 :             :         PagetableEntry **spages;        /* sorted exact-page list, or NULL */
     156                 :             :         PagetableEntry **schunks;       /* sorted lossy-chunk list, or NULL */
     157                 :             :         dsa_pointer dsapagetable;       /* dsa_pointer to the element array */
     158                 :             :         dsa_pointer dsapagetableold;    /* dsa_pointer to the old element array */
     159                 :             :         dsa_pointer ptpages;            /* dsa_pointer to the page array */
     160                 :             :         dsa_pointer ptchunks;           /* dsa_pointer to the chunk array */
     161                 :             :         dsa_area   *dsa;                        /* reference to per-query dsa area */
     162                 :             : };
     163                 :             : 
     164                 :             : /*
     165                 :             :  * When iterating over a backend-local bitmap in sorted order, a
     166                 :             :  * TBMPrivateIterator is used to track our progress.  There can be several
     167                 :             :  * iterators scanning the same bitmap concurrently.  Note that the bitmap
     168                 :             :  * becomes read-only as soon as any iterator is created.
     169                 :             :  */
     170                 :             : struct TBMPrivateIterator
     171                 :             : {
     172                 :             :         TIDBitmap  *tbm;                        /* TIDBitmap we're iterating over */
     173                 :             :         int                     spageptr;               /* next spages index */
     174                 :             :         int                     schunkptr;              /* next schunks index */
     175                 :             :         int                     schunkbit;              /* next bit to check in current schunk */
     176                 :             : };
     177                 :             : 
     178                 :             : /*
     179                 :             :  * Holds the shared members of the iterator so that multiple processes
     180                 :             :  * can jointly iterate.
     181                 :             :  */
     182                 :             : typedef struct TBMSharedIteratorState
     183                 :             : {
     184                 :             :         int                     nentries;               /* number of entries in pagetable */
     185                 :             :         int                     maxentries;             /* limit on same to meet maxbytes */
     186                 :             :         int                     npages;                 /* number of exact entries in pagetable */
     187                 :             :         int                     nchunks;                /* number of lossy entries in pagetable */
     188                 :             :         dsa_pointer pagetable;          /* dsa pointers to head of pagetable data */
     189                 :             :         dsa_pointer spages;                     /* dsa pointer to page array */
     190                 :             :         dsa_pointer schunks;            /* dsa pointer to chunk array */
     191                 :             :         LWLock          lock;                   /* lock to protect below members */
     192                 :             :         int                     spageptr;               /* next spages index */
     193                 :             :         int                     schunkptr;              /* next schunks index */
     194                 :             :         int                     schunkbit;              /* next bit to check in current schunk */
     195                 :             : } TBMSharedIteratorState;
     196                 :             : 
     197                 :             : /*
     198                 :             :  * pagetable iteration array.
     199                 :             :  */
     200                 :             : typedef struct PTIterationArray
     201                 :             : {
     202                 :             :         pg_atomic_uint32 refcount;      /* no. of iterator attached */
     203                 :             :         int                     index[FLEXIBLE_ARRAY_MEMBER];   /* index array */
     204                 :             : } PTIterationArray;
     205                 :             : 
     206                 :             : /*
     207                 :             :  * same as TBMPrivateIterator, but it is used for joint iteration, therefore
     208                 :             :  * this also holds a reference to the shared state.
     209                 :             :  */
     210                 :             : struct TBMSharedIterator
     211                 :             : {
     212                 :             :         TBMSharedIteratorState *state;  /* shared state */
     213                 :             :         PTEntryArray *ptbase;           /* pagetable element array */
     214                 :             :         PTIterationArray *ptpages;      /* sorted exact page index list */
     215                 :             :         PTIterationArray *ptchunks; /* sorted lossy page index list */
     216                 :             : };
     217                 :             : 
     218                 :             : /* Local function prototypes */
     219                 :             : static void tbm_union_page(TIDBitmap *a, const PagetableEntry *bpage);
     220                 :             : static bool tbm_intersect_page(TIDBitmap *a, PagetableEntry *apage,
     221                 :             :                                                            const TIDBitmap *b);
     222                 :             : static const PagetableEntry *tbm_find_pageentry(const TIDBitmap *tbm,
     223                 :             :                                                                                                 BlockNumber pageno);
     224                 :             : static PagetableEntry *tbm_get_pageentry(TIDBitmap *tbm, BlockNumber pageno);
     225                 :             : static bool tbm_page_is_lossy(const TIDBitmap *tbm, BlockNumber pageno);
     226                 :             : static void tbm_mark_page_lossy(TIDBitmap *tbm, BlockNumber pageno);
     227                 :             : static void tbm_lossify(TIDBitmap *tbm);
     228                 :             : static int      tbm_comparator(const void *left, const void *right);
     229                 :             : static int      tbm_shared_comparator(const void *left, const void *right,
     230                 :             :                                                                   void *arg);
     231                 :             : 
     232                 :             : /* define hashtable mapping block numbers to PagetableEntry's */
     233                 :             : #define SH_USE_NONDEFAULT_ALLOCATOR
     234                 :             : #define SH_PREFIX pagetable
     235                 :             : #define SH_ELEMENT_TYPE PagetableEntry
     236                 :             : #define SH_KEY_TYPE BlockNumber
     237                 :             : #define SH_KEY blockno
     238                 :             : #define SH_HASH_KEY(tb, key) murmurhash32(key)
     239                 :             : #define SH_EQUAL(tb, a, b) a == b
     240                 :             : #define SH_SCOPE static inline
     241                 :             : #define SH_DEFINE
     242                 :             : #define SH_DECLARE
     243                 :             : #include "lib/simplehash.h"
     244                 :             : 
     245                 :             : 
     246                 :             : /*
     247                 :             :  * tbm_create - create an initially-empty bitmap
     248                 :             :  *
     249                 :             :  * The bitmap will live in the memory context that is CurrentMemoryContext
     250                 :             :  * at the time of this call.  It will be limited to (approximately) maxbytes
     251                 :             :  * total memory consumption.  If the DSA passed to this function is not NULL
     252                 :             :  * then the memory for storing elements of the underlying page table will
     253                 :             :  * be allocated from the DSA.
     254                 :             :  */
     255                 :             : TIDBitmap *
     256                 :        2021 : tbm_create(Size maxbytes, dsa_area *dsa)
     257                 :             : {
     258                 :        2021 :         TIDBitmap  *tbm;
     259                 :             : 
     260                 :             :         /* Create the TIDBitmap struct and zero all its fields */
     261                 :        2021 :         tbm = makeNode(TIDBitmap);
     262                 :             : 
     263                 :        2021 :         tbm->mcxt = CurrentMemoryContext;
     264                 :        2021 :         tbm->status = TBM_EMPTY;
     265                 :             : 
     266                 :        2021 :         tbm->maxentries = tbm_calculate_entries(maxbytes);
     267                 :        2021 :         tbm->lossify_start = 0;
     268                 :        2021 :         tbm->dsa = dsa;
     269                 :        2021 :         tbm->dsapagetable = InvalidDsaPointer;
     270                 :        2021 :         tbm->dsapagetableold = InvalidDsaPointer;
     271                 :        2021 :         tbm->ptpages = InvalidDsaPointer;
     272                 :        2021 :         tbm->ptchunks = InvalidDsaPointer;
     273                 :             : 
     274                 :        4042 :         return tbm;
     275                 :        2021 : }
     276                 :             : 
     277                 :             : /*
     278                 :             :  * Actually create the hashtable.  Since this is a moderately expensive
     279                 :             :  * proposition, we don't do it until we have to.
     280                 :             :  */
     281                 :             : static void
     282                 :         914 : tbm_create_pagetable(TIDBitmap *tbm)
     283                 :             : {
     284         [ +  - ]:         914 :         Assert(tbm->status != TBM_HASH);
     285         [ +  - ]:         914 :         Assert(tbm->pagetable == NULL);
     286                 :             : 
     287                 :         914 :         tbm->pagetable = pagetable_create(tbm->mcxt, 128, tbm);
     288                 :             : 
     289                 :             :         /* If entry1 is valid, push it into the hashtable */
     290         [ +  + ]:         914 :         if (tbm->status == TBM_ONE_PAGE)
     291                 :             :         {
     292                 :         436 :                 PagetableEntry *page;
     293                 :         436 :                 bool            found;
     294                 :         436 :                 char            oldstatus;
     295                 :             : 
     296                 :         872 :                 page = pagetable_insert(tbm->pagetable,
     297                 :         436 :                                                                 tbm->entry1.blockno,
     298                 :             :                                                                 &found);
     299         [ +  - ]:         436 :                 Assert(!found);
     300                 :         436 :                 oldstatus = page->status;
     301                 :         436 :                 memcpy(page, &tbm->entry1, sizeof(PagetableEntry));
     302                 :         436 :                 page->status = oldstatus;
     303                 :         436 :         }
     304                 :             : 
     305                 :         914 :         tbm->status = TBM_HASH;
     306                 :         914 : }
     307                 :             : 
     308                 :             : /*
     309                 :             :  * tbm_free - free a TIDBitmap
     310                 :             :  */
     311                 :             : void
     312                 :        2005 : tbm_free(TIDBitmap *tbm)
     313                 :             : {
     314         [ +  + ]:        2005 :         if (tbm->pagetable)
     315                 :         913 :                 pagetable_destroy(tbm->pagetable);
     316         [ +  + ]:        2005 :         if (tbm->spages)
     317                 :         413 :                 pfree(tbm->spages);
     318         [ +  + ]:        2005 :         if (tbm->schunks)
     319                 :         481 :                 pfree(tbm->schunks);
     320                 :        2005 :         pfree(tbm);
     321                 :        2005 : }
     322                 :             : 
     323                 :             : /*
     324                 :             :  * tbm_free_shared_area - free shared state
     325                 :             :  *
     326                 :             :  * Free shared iterator state, Also free shared pagetable and iterator arrays
     327                 :             :  * memory if they are not referred by any of the shared iterator i.e recount
     328                 :             :  * is becomes 0.
     329                 :             :  */
     330                 :             : void
     331                 :           9 : tbm_free_shared_area(dsa_area *dsa, dsa_pointer dp)
     332                 :             : {
     333                 :           9 :         TBMSharedIteratorState *istate = dsa_get_address(dsa, dp);
     334                 :           9 :         PTEntryArray *ptbase;
     335                 :           9 :         PTIterationArray *ptpages;
     336                 :           9 :         PTIterationArray *ptchunks;
     337                 :             : 
     338         [ -  + ]:           9 :         if (DsaPointerIsValid(istate->pagetable))
     339                 :             :         {
     340                 :           9 :                 ptbase = dsa_get_address(dsa, istate->pagetable);
     341         [ -  + ]:           9 :                 if (pg_atomic_sub_fetch_u32(&ptbase->refcount, 1) == 0)
     342                 :           9 :                         dsa_free(dsa, istate->pagetable);
     343                 :           9 :         }
     344         [ -  + ]:           9 :         if (DsaPointerIsValid(istate->spages))
     345                 :             :         {
     346                 :           9 :                 ptpages = dsa_get_address(dsa, istate->spages);
     347         [ -  + ]:           9 :                 if (pg_atomic_sub_fetch_u32(&ptpages->refcount, 1) == 0)
     348                 :           9 :                         dsa_free(dsa, istate->spages);
     349                 :           9 :         }
     350         [ +  - ]:           9 :         if (DsaPointerIsValid(istate->schunks))
     351                 :             :         {
     352                 :           0 :                 ptchunks = dsa_get_address(dsa, istate->schunks);
     353         [ #  # ]:           0 :                 if (pg_atomic_sub_fetch_u32(&ptchunks->refcount, 1) == 0)
     354                 :           0 :                         dsa_free(dsa, istate->schunks);
     355                 :           0 :         }
     356                 :             : 
     357                 :           9 :         dsa_free(dsa, dp);
     358                 :           9 : }
     359                 :             : 
     360                 :             : /*
     361                 :             :  * tbm_add_tuples - add some tuple IDs to a TIDBitmap
     362                 :             :  *
     363                 :             :  * If recheck is true, then the recheck flag will be set in the
     364                 :             :  * TBMIterateResult when any of these tuples are reported out.
     365                 :             :  */
     366                 :             : void
     367                 :      612667 : tbm_add_tuples(TIDBitmap *tbm, const ItemPointerData *tids, int ntids,
     368                 :             :                            bool recheck)
     369                 :             : {
     370                 :      612667 :         BlockNumber currblk = InvalidBlockNumber;
     371                 :      612667 :         PagetableEntry *page = NULL;    /* only valid when currblk is valid */
     372                 :             : 
     373         [ +  - ]:      612667 :         Assert(tbm->iterating == TBM_NOT_ITERATING);
     374         [ +  + ]:     1564649 :         for (int i = 0; i < ntids; i++)
     375                 :             :         {
     376                 :      951982 :                 BlockNumber blk = ItemPointerGetBlockNumber(tids + i);
     377                 :      951982 :                 OffsetNumber off = ItemPointerGetOffsetNumber(tids + i);
     378                 :      951982 :                 int                     wordnum,
     379                 :             :                                         bitnum;
     380                 :             : 
     381                 :             :                 /* safety check to ensure we don't overrun bit array bounds */
     382         [ +  - ]:      951982 :                 if (off < 1 || off > TBM_MAX_TUPLES_PER_PAGE)
     383   [ #  #  #  # ]:           0 :                         elog(ERROR, "tuple offset out of range: %u", off);
     384                 :             : 
     385                 :             :                 /*
     386                 :             :                  * Look up target page unless we already did.  This saves cycles when
     387                 :             :                  * the input includes consecutive tuples on the same page, which is
     388                 :             :                  * common enough to justify an extra test here.
     389                 :             :                  */
     390         [ +  + ]:      951982 :                 if (blk != currblk)
     391                 :             :                 {
     392         [ +  + ]:      736837 :                         if (tbm_page_is_lossy(tbm, blk))
     393                 :         476 :                                 page = NULL;    /* remember page is lossy */
     394                 :             :                         else
     395                 :      736361 :                                 page = tbm_get_pageentry(tbm, blk);
     396                 :      736837 :                         currblk = blk;
     397                 :      736837 :                 }
     398                 :             : 
     399         [ +  + ]:      951982 :                 if (page == NULL)
     400                 :         476 :                         continue;                       /* whole page is already marked */
     401                 :             : 
     402         [ -  + ]:      951506 :                 if (page->ischunk)
     403                 :             :                 {
     404                 :             :                         /* The page is a lossy chunk header, set bit for itself */
     405                 :           0 :                         wordnum = bitnum = 0;
     406                 :           0 :                 }
     407                 :             :                 else
     408                 :             :                 {
     409                 :             :                         /* Page is exact, so set bit for individual tuple */
     410                 :      951506 :                         wordnum = WORDNUM(off - 1);
     411                 :      951506 :                         bitnum = BITNUM(off - 1);
     412                 :             :                 }
     413                 :      951506 :                 page->words[wordnum] |= ((bitmapword) 1 << bitnum);
     414                 :      951506 :                 page->recheck |= recheck;
     415                 :             : 
     416         [ +  + ]:      951506 :                 if (tbm->nentries > tbm->maxentries)
     417                 :             :                 {
     418                 :           6 :                         tbm_lossify(tbm);
     419                 :             :                         /* Page could have been converted to lossy, so force new lookup */
     420                 :           6 :                         currblk = InvalidBlockNumber;
     421                 :           6 :                 }
     422      [ -  +  + ]:      951982 :         }
     423                 :      612667 : }
     424                 :             : 
     425                 :             : /*
     426                 :             :  * tbm_add_page - add a whole page to a TIDBitmap
     427                 :             :  *
     428                 :             :  * This causes the whole page to be reported (with the recheck flag)
     429                 :             :  * when the TIDBitmap is scanned.
     430                 :             :  */
     431                 :             : void
     432                 :       24660 : tbm_add_page(TIDBitmap *tbm, BlockNumber pageno)
     433                 :             : {
     434                 :             :         /* Enter the page in the bitmap, or mark it lossy if already present */
     435                 :       24660 :         tbm_mark_page_lossy(tbm, pageno);
     436                 :             :         /* If we went over the memory limit, lossify some more pages */
     437         [ +  - ]:       24660 :         if (tbm->nentries > tbm->maxentries)
     438                 :           0 :                 tbm_lossify(tbm);
     439                 :       24660 : }
     440                 :             : 
     441                 :             : /*
     442                 :             :  * tbm_union - set union
     443                 :             :  *
     444                 :             :  * a is modified in-place, b is not changed
     445                 :             :  */
     446                 :             : void
     447                 :           0 : tbm_union(TIDBitmap *a, const TIDBitmap *b)
     448                 :             : {
     449         [ #  # ]:           0 :         Assert(!a->iterating);
     450                 :             :         /* Nothing to do if b is empty */
     451         [ #  # ]:           0 :         if (b->nentries == 0)
     452                 :           0 :                 return;
     453                 :             :         /* Scan through chunks and pages in b, merge into a */
     454         [ #  # ]:           0 :         if (b->status == TBM_ONE_PAGE)
     455                 :           0 :                 tbm_union_page(a, &b->entry1);
     456                 :             :         else
     457                 :             :         {
     458                 :           0 :                 pagetable_iterator i;
     459                 :           0 :                 PagetableEntry *bpage;
     460                 :             : 
     461         [ #  # ]:           0 :                 Assert(b->status == TBM_HASH);
     462                 :           0 :                 pagetable_start_iterate(b->pagetable, &i);
     463         [ #  # ]:           0 :                 while ((bpage = pagetable_iterate(b->pagetable, &i)) != NULL)
     464                 :           0 :                         tbm_union_page(a, bpage);
     465                 :           0 :         }
     466                 :           0 : }
     467                 :             : 
     468                 :             : /* Process one page of b during a union op */
     469                 :             : static void
     470                 :           0 : tbm_union_page(TIDBitmap *a, const PagetableEntry *bpage)
     471                 :             : {
     472                 :           0 :         PagetableEntry *apage;
     473                 :             : 
     474         [ #  # ]:           0 :         if (bpage->ischunk)
     475                 :             :         {
     476                 :             :                 /* Scan b's chunk, mark each indicated page lossy in a */
     477         [ #  # ]:           0 :                 for (int wordnum = 0; wordnum < WORDS_PER_CHUNK; wordnum++)
     478                 :             :                 {
     479                 :           0 :                         bitmapword      w = bpage->words[wordnum];
     480                 :             : 
     481         [ #  # ]:           0 :                         if (w != 0)
     482                 :             :                         {
     483                 :           0 :                                 BlockNumber pg;
     484                 :             : 
     485                 :           0 :                                 pg = bpage->blockno + (wordnum * BITS_PER_BITMAPWORD);
     486         [ #  # ]:           0 :                                 while (w != 0)
     487                 :             :                                 {
     488         [ #  # ]:           0 :                                         if (w & 1)
     489                 :           0 :                                                 tbm_mark_page_lossy(a, pg);
     490                 :           0 :                                         pg++;
     491                 :           0 :                                         w >>= 1;
     492                 :             :                                 }
     493                 :           0 :                         }
     494                 :           0 :                 }
     495                 :           0 :         }
     496         [ #  # ]:           0 :         else if (tbm_page_is_lossy(a, bpage->blockno))
     497                 :             :         {
     498                 :             :                 /* page is already lossy in a, nothing to do */
     499                 :           0 :                 return;
     500                 :             :         }
     501                 :             :         else
     502                 :             :         {
     503                 :           0 :                 apage = tbm_get_pageentry(a, bpage->blockno);
     504         [ #  # ]:           0 :                 if (apage->ischunk)
     505                 :             :                 {
     506                 :             :                         /* The page is a lossy chunk header, set bit for itself */
     507                 :           0 :                         apage->words[0] |= ((bitmapword) 1 << 0);
     508                 :           0 :                 }
     509                 :             :                 else
     510                 :             :                 {
     511                 :             :                         /* Both pages are exact, merge at the bit level */
     512         [ #  # ]:           0 :                         for (int wordnum = 0; wordnum < WORDS_PER_PAGE; wordnum++)
     513                 :           0 :                                 apage->words[wordnum] |= bpage->words[wordnum];
     514                 :           0 :                         apage->recheck |= bpage->recheck;
     515                 :             :                 }
     516                 :             :         }
     517                 :             : 
     518         [ #  # ]:           0 :         if (a->nentries > a->maxentries)
     519                 :           0 :                 tbm_lossify(a);
     520         [ #  # ]:           0 : }
     521                 :             : 
     522                 :             : /*
     523                 :             :  * tbm_intersect - set intersection
     524                 :             :  *
     525                 :             :  * a is modified in-place, b is not changed
     526                 :             :  */
     527                 :             : void
     528                 :           9 : tbm_intersect(TIDBitmap *a, const TIDBitmap *b)
     529                 :             : {
     530         [ +  - ]:           9 :         Assert(!a->iterating);
     531                 :             :         /* Nothing to do if a is empty */
     532         [ +  - ]:           9 :         if (a->nentries == 0)
     533                 :           0 :                 return;
     534                 :             :         /* Scan through chunks and pages in a, try to match to b */
     535         [ -  + ]:           9 :         if (a->status == TBM_ONE_PAGE)
     536                 :             :         {
     537         [ #  # ]:           0 :                 if (tbm_intersect_page(a, &a->entry1, b))
     538                 :             :                 {
     539                 :             :                         /* Page is now empty, remove it from a */
     540         [ #  # ]:           0 :                         Assert(!a->entry1.ischunk);
     541                 :           0 :                         a->npages--;
     542                 :           0 :                         a->nentries--;
     543         [ #  # ]:           0 :                         Assert(a->nentries == 0);
     544                 :           0 :                         a->status = TBM_EMPTY;
     545                 :           0 :                 }
     546                 :           0 :         }
     547                 :             :         else
     548                 :             :         {
     549                 :           9 :                 pagetable_iterator i;
     550                 :           9 :                 PagetableEntry *apage;
     551                 :             : 
     552         [ +  - ]:           9 :                 Assert(a->status == TBM_HASH);
     553                 :           9 :                 pagetable_start_iterate(a->pagetable, &i);
     554         [ +  + ]:        1851 :                 while ((apage = pagetable_iterate(a->pagetable, &i)) != NULL)
     555                 :             :                 {
     556         [ +  + ]:        1842 :                         if (tbm_intersect_page(a, apage, b))
     557                 :             :                         {
     558                 :             :                                 /* Page or chunk is now empty, remove it from a */
     559         [ -  + ]:        1754 :                                 if (apage->ischunk)
     560                 :           0 :                                         a->nchunks--;
     561                 :             :                                 else
     562                 :        1754 :                                         a->npages--;
     563                 :        1754 :                                 a->nentries--;
     564         [ +  - ]:        1754 :                                 if (!pagetable_delete(a->pagetable, apage->blockno))
     565   [ #  #  #  # ]:           0 :                                         elog(ERROR, "hash table corrupted");
     566                 :        1754 :                         }
     567                 :             :                 }
     568                 :           9 :         }
     569                 :           9 : }
     570                 :             : 
     571                 :             : /*
     572                 :             :  * Process one page of a during an intersection op
     573                 :             :  *
     574                 :             :  * Returns true if apage is now empty and should be deleted from a
     575                 :             :  */
     576                 :             : static bool
     577                 :        1842 : tbm_intersect_page(TIDBitmap *a, PagetableEntry *apage, const TIDBitmap *b)
     578                 :             : {
     579                 :        1842 :         const PagetableEntry *bpage;
     580                 :             : 
     581         [ +  + ]:        1842 :         if (apage->ischunk)
     582                 :             :         {
     583                 :             :                 /* Scan each bit in chunk, try to clear */
     584                 :          10 :                 bool            candelete = true;
     585                 :             : 
     586         [ +  + ]:          50 :                 for (int wordnum = 0; wordnum < WORDS_PER_CHUNK; wordnum++)
     587                 :             :                 {
     588                 :          40 :                         bitmapword      w = apage->words[wordnum];
     589                 :             : 
     590         [ +  + ]:          40 :                         if (w != 0)
     591                 :             :                         {
     592                 :          36 :                                 bitmapword      neww = w;
     593                 :          36 :                                 BlockNumber pg;
     594                 :          36 :                                 int                     bitnum;
     595                 :             : 
     596                 :          36 :                                 pg = apage->blockno + (wordnum * BITS_PER_BITMAPWORD);
     597                 :          36 :                                 bitnum = 0;
     598         [ +  + ]:        2202 :                                 while (w != 0)
     599                 :             :                                 {
     600         [ +  + ]:        2166 :                                         if (w & 1)
     601                 :             :                                         {
     602   [ +  +  +  - ]:        1036 :                                                 if (!tbm_page_is_lossy(b, pg) &&
     603                 :          84 :                                                         tbm_find_pageentry(b, pg) == NULL)
     604                 :             :                                                 {
     605                 :             :                                                         /* Page is not in b at all, lose lossy bit */
     606                 :           0 :                                                         neww &= ~((bitmapword) 1 << bitnum);
     607                 :           0 :                                                 }
     608                 :        1036 :                                         }
     609                 :        2166 :                                         pg++;
     610                 :        2166 :                                         bitnum++;
     611                 :        2166 :                                         w >>= 1;
     612                 :             :                                 }
     613                 :          36 :                                 apage->words[wordnum] = neww;
     614         [ +  - ]:          36 :                                 if (neww != 0)
     615                 :          36 :                                         candelete = false;
     616                 :          36 :                         }
     617                 :          40 :                 }
     618                 :          10 :                 return candelete;
     619                 :          10 :         }
     620         [ -  + ]:        1832 :         else if (tbm_page_is_lossy(b, apage->blockno))
     621                 :             :         {
     622                 :             :                 /*
     623                 :             :                  * Some of the tuples in 'a' might not satisfy the quals for 'b', but
     624                 :             :                  * because the page 'b' is lossy, we don't know which ones. Therefore
     625                 :             :                  * we mark 'a' as requiring rechecks, to indicate that at most those
     626                 :             :                  * tuples set in 'a' are matches.
     627                 :             :                  */
     628                 :           0 :                 apage->recheck = true;
     629                 :           0 :                 return false;
     630                 :             :         }
     631                 :             :         else
     632                 :             :         {
     633                 :        1832 :                 bool            candelete = true;
     634                 :             : 
     635                 :        1832 :                 bpage = tbm_find_pageentry(b, apage->blockno);
     636         [ +  + ]:        1832 :                 if (bpage != NULL)
     637                 :             :                 {
     638                 :             :                         /* Both pages are exact, merge at the bit level */
     639         [ +  - ]:        1504 :                         Assert(!bpage->ischunk);
     640         [ +  + ]:        9024 :                         for (int wordnum = 0; wordnum < WORDS_PER_PAGE; wordnum++)
     641                 :             :                         {
     642                 :        7520 :                                 apage->words[wordnum] &= bpage->words[wordnum];
     643         [ +  + ]:        7520 :                                 if (apage->words[wordnum] != 0)
     644                 :          78 :                                         candelete = false;
     645                 :        7520 :                         }
     646                 :        1504 :                         apage->recheck |= bpage->recheck;
     647                 :        1504 :                 }
     648                 :             :                 /* If there is no matching b page, we can just delete the a page */
     649                 :        1832 :                 return candelete;
     650                 :        1832 :         }
     651                 :        1842 : }
     652                 :             : 
     653                 :             : /*
     654                 :             :  * tbm_is_empty - is a TIDBitmap completely empty?
     655                 :             :  */
     656                 :             : bool
     657                 :          64 : tbm_is_empty(const TIDBitmap *tbm)
     658                 :             : {
     659                 :          64 :         return (tbm->nentries == 0);
     660                 :             : }
     661                 :             : 
     662                 :             : /*
     663                 :             :  * tbm_begin_private_iterate - prepare to iterate through a TIDBitmap
     664                 :             :  *
     665                 :             :  * The TBMPrivateIterator struct is created in the caller's memory context.
     666                 :             :  * For a clean shutdown of the iteration, call tbm_end_private_iterate; but
     667                 :             :  * it's okay to just allow the memory context to be released, too.  It is
     668                 :             :  * caller's responsibility not to touch the TBMPrivateIterator anymore once
     669                 :             :  * the TIDBitmap is freed.
     670                 :             :  *
     671                 :             :  * NB: after this is called, it is no longer allowed to modify the contents
     672                 :             :  * of the bitmap.  However, you can call this multiple times to scan the
     673                 :             :  * contents repeatedly, including parallel scans.
     674                 :             :  */
     675                 :             : TBMPrivateIterator *
     676                 :        1982 : tbm_begin_private_iterate(TIDBitmap *tbm)
     677                 :             : {
     678                 :        1982 :         TBMPrivateIterator *iterator;
     679                 :             : 
     680         [ +  - ]:        1982 :         Assert(tbm->iterating != TBM_ITERATING_SHARED);
     681                 :             : 
     682                 :             :         /*
     683                 :             :          * Create the TBMPrivateIterator struct, with enough trailing space to
     684                 :             :          * serve the needs of the TBMIterateResult sub-struct.
     685                 :             :          */
     686                 :        1982 :         iterator = palloc_object(TBMPrivateIterator);
     687                 :        1982 :         iterator->tbm = tbm;
     688                 :             : 
     689                 :             :         /*
     690                 :             :          * Initialize iteration pointers.
     691                 :             :          */
     692                 :        1982 :         iterator->spageptr = 0;
     693                 :        1982 :         iterator->schunkptr = 0;
     694                 :        1982 :         iterator->schunkbit = 0;
     695                 :             : 
     696                 :             :         /*
     697                 :             :          * If we have a hashtable, create and fill the sorted page lists, unless
     698                 :             :          * we already did that for a previous iterator.  Note that the lists are
     699                 :             :          * attached to the bitmap not the iterator, so they can be used by more
     700                 :             :          * than one iterator.
     701                 :             :          */
     702   [ +  +  -  + ]:        1982 :         if (tbm->status == TBM_HASH && tbm->iterating == TBM_NOT_ITERATING)
     703                 :             :         {
     704                 :         893 :                 pagetable_iterator i;
     705                 :         893 :                 PagetableEntry *page;
     706                 :         893 :                 int                     npages;
     707                 :         893 :                 int                     nchunks;
     708                 :             : 
     709   [ +  -  +  + ]:         893 :                 if (!tbm->spages && tbm->npages > 0)
     710                 :         414 :                         tbm->spages = (PagetableEntry **)
     711                 :         828 :                                 MemoryContextAlloc(tbm->mcxt,
     712                 :         414 :                                                                    tbm->npages * sizeof(PagetableEntry *));
     713   [ +  -  +  + ]:         893 :                 if (!tbm->schunks && tbm->nchunks > 0)
     714                 :         481 :                         tbm->schunks = (PagetableEntry **)
     715                 :         962 :                                 MemoryContextAlloc(tbm->mcxt,
     716                 :         481 :                                                                    tbm->nchunks * sizeof(PagetableEntry *));
     717                 :             : 
     718                 :         893 :                 npages = nchunks = 0;
     719                 :         893 :                 pagetable_start_iterate(tbm->pagetable, &i);
     720         [ +  + ]:       12800 :                 while ((page = pagetable_iterate(tbm->pagetable, &i)) != NULL)
     721                 :             :                 {
     722         [ +  + ]:       11907 :                         if (page->ischunk)
     723                 :         492 :                                 tbm->schunks[nchunks++] = page;
     724                 :             :                         else
     725                 :       11415 :                                 tbm->spages[npages++] = page;
     726                 :             :                 }
     727         [ +  - ]:         893 :                 Assert(npages == tbm->npages);
     728         [ +  - ]:         893 :                 Assert(nchunks == tbm->nchunks);
     729         [ +  + ]:         893 :                 if (npages > 1)
     730                 :         414 :                         qsort(tbm->spages, npages, sizeof(PagetableEntry *),
     731                 :             :                                   tbm_comparator);
     732         [ +  + ]:         893 :                 if (nchunks > 1)
     733                 :           3 :                         qsort(tbm->schunks, nchunks, sizeof(PagetableEntry *),
     734                 :             :                                   tbm_comparator);
     735                 :         893 :         }
     736                 :             : 
     737                 :        1982 :         tbm->iterating = TBM_ITERATING_PRIVATE;
     738                 :             : 
     739                 :        3964 :         return iterator;
     740                 :        1982 : }
     741                 :             : 
     742                 :             : /*
     743                 :             :  * tbm_prepare_shared_iterate - prepare shared iteration state for a TIDBitmap.
     744                 :             :  *
     745                 :             :  * The necessary shared state will be allocated from the DSA passed to
     746                 :             :  * tbm_create, so that multiple processes can attach to it and iterate jointly.
     747                 :             :  *
     748                 :             :  * This will convert the pagetable hash into page and chunk array of the index
     749                 :             :  * into pagetable array.
     750                 :             :  */
     751                 :             : dsa_pointer
     752                 :          12 : tbm_prepare_shared_iterate(TIDBitmap *tbm)
     753                 :             : {
     754                 :          12 :         dsa_pointer dp;
     755                 :          12 :         TBMSharedIteratorState *istate;
     756                 :          12 :         PTEntryArray *ptbase = NULL;
     757                 :          12 :         PTIterationArray *ptpages = NULL;
     758                 :          12 :         PTIterationArray *ptchunks = NULL;
     759                 :             : 
     760         [ +  - ]:          12 :         Assert(tbm->dsa != NULL);
     761         [ +  - ]:          12 :         Assert(tbm->iterating != TBM_ITERATING_PRIVATE);
     762                 :             : 
     763                 :             :         /*
     764                 :             :          * Allocate TBMSharedIteratorState from DSA to hold the shared members and
     765                 :             :          * lock, this will also be used by multiple worker for shared iterate.
     766                 :             :          */
     767                 :          12 :         dp = dsa_allocate0(tbm->dsa, sizeof(TBMSharedIteratorState));
     768                 :          12 :         istate = dsa_get_address(tbm->dsa, dp);
     769                 :             : 
     770                 :             :         /*
     771                 :             :          * If we're not already iterating, create and fill the sorted page lists.
     772                 :             :          * (If we are, the sorted page lists are already stored in the TIDBitmap,
     773                 :             :          * and we can just reuse them.)
     774                 :             :          */
     775         [ -  + ]:          12 :         if (tbm->iterating == TBM_NOT_ITERATING)
     776                 :             :         {
     777                 :          12 :                 pagetable_iterator i;
     778                 :          12 :                 PagetableEntry *page;
     779                 :          12 :                 int                     idx;
     780                 :          12 :                 int                     npages;
     781                 :          12 :                 int                     nchunks;
     782                 :             : 
     783                 :             :                 /*
     784                 :             :                  * Allocate the page and chunk array memory from the DSA to share
     785                 :             :                  * across multiple processes.
     786                 :             :                  */
     787         [ -  + ]:          12 :                 if (tbm->npages)
     788                 :             :                 {
     789                 :          12 :                         tbm->ptpages = dsa_allocate(tbm->dsa, sizeof(PTIterationArray) +
     790                 :             :                                                                                 tbm->npages * sizeof(int));
     791                 :          12 :                         ptpages = dsa_get_address(tbm->dsa, tbm->ptpages);
     792                 :          12 :                         pg_atomic_init_u32(&ptpages->refcount, 0);
     793                 :          12 :                 }
     794         [ +  + ]:          12 :                 if (tbm->nchunks)
     795                 :             :                 {
     796                 :           1 :                         tbm->ptchunks = dsa_allocate(tbm->dsa, sizeof(PTIterationArray) +
     797                 :             :                                                                                  tbm->nchunks * sizeof(int));
     798                 :           1 :                         ptchunks = dsa_get_address(tbm->dsa, tbm->ptchunks);
     799                 :           1 :                         pg_atomic_init_u32(&ptchunks->refcount, 0);
     800                 :           1 :                 }
     801                 :             : 
     802                 :             :                 /*
     803                 :             :                  * If TBM status is TBM_HASH then iterate over the pagetable and
     804                 :             :                  * convert it to page and chunk arrays.  But if it's in the
     805                 :             :                  * TBM_ONE_PAGE mode then directly allocate the space for one entry
     806                 :             :                  * from the DSA.
     807                 :             :                  */
     808                 :          12 :                 npages = nchunks = 0;
     809         [ -  + ]:          12 :                 if (tbm->status == TBM_HASH)
     810                 :             :                 {
     811                 :          12 :                         ptbase = dsa_get_address(tbm->dsa, tbm->dsapagetable);
     812                 :             : 
     813                 :          12 :                         pagetable_start_iterate(tbm->pagetable, &i);
     814         [ +  + ]:        4517 :                         while ((page = pagetable_iterate(tbm->pagetable, &i)) != NULL)
     815                 :             :                         {
     816                 :        4505 :                                 idx = page - ptbase->ptentry;
     817         [ +  + ]:        4505 :                                 if (page->ischunk)
     818                 :           4 :                                         ptchunks->index[nchunks++] = idx;
     819                 :             :                                 else
     820                 :        4501 :                                         ptpages->index[npages++] = idx;
     821                 :             :                         }
     822                 :             : 
     823         [ +  - ]:          12 :                         Assert(npages == tbm->npages);
     824         [ +  - ]:          12 :                         Assert(nchunks == tbm->nchunks);
     825                 :          12 :                 }
     826         [ #  # ]:           0 :                 else if (tbm->status == TBM_ONE_PAGE)
     827                 :             :                 {
     828                 :             :                         /*
     829                 :             :                          * In one page mode allocate the space for one pagetable entry,
     830                 :             :                          * initialize it, and directly store its index (i.e. 0) in the
     831                 :             :                          * page array.
     832                 :             :                          */
     833                 :           0 :                         tbm->dsapagetable = dsa_allocate(tbm->dsa, sizeof(PTEntryArray) +
     834                 :             :                                                                                          sizeof(PagetableEntry));
     835                 :           0 :                         ptbase = dsa_get_address(tbm->dsa, tbm->dsapagetable);
     836                 :           0 :                         memcpy(ptbase->ptentry, &tbm->entry1, sizeof(PagetableEntry));
     837                 :           0 :                         ptpages->index[0] = 0;
     838                 :           0 :                 }
     839                 :             : 
     840         [ -  + ]:          12 :                 if (ptbase != NULL)
     841                 :          12 :                         pg_atomic_init_u32(&ptbase->refcount, 0);
     842         [ -  + ]:          12 :                 if (npages > 1)
     843                 :          24 :                         qsort_arg(ptpages->index, npages, sizeof(int),
     844                 :          12 :                                           tbm_shared_comparator, ptbase->ptentry);
     845         [ +  + ]:          12 :                 if (nchunks > 1)
     846                 :           2 :                         qsort_arg(ptchunks->index, nchunks, sizeof(int),
     847                 :           1 :                                           tbm_shared_comparator, ptbase->ptentry);
     848                 :          12 :         }
     849                 :             : 
     850                 :             :         /*
     851                 :             :          * Store the TBM members in the shared state so that we can share them
     852                 :             :          * across multiple processes.
     853                 :             :          */
     854                 :          12 :         istate->nentries = tbm->nentries;
     855                 :          12 :         istate->maxentries = tbm->maxentries;
     856                 :          12 :         istate->npages = tbm->npages;
     857                 :          12 :         istate->nchunks = tbm->nchunks;
     858                 :          12 :         istate->pagetable = tbm->dsapagetable;
     859                 :          12 :         istate->spages = tbm->ptpages;
     860                 :          12 :         istate->schunks = tbm->ptchunks;
     861                 :             : 
     862                 :          12 :         ptbase = dsa_get_address(tbm->dsa, tbm->dsapagetable);
     863                 :          12 :         ptpages = dsa_get_address(tbm->dsa, tbm->ptpages);
     864                 :          12 :         ptchunks = dsa_get_address(tbm->dsa, tbm->ptchunks);
     865                 :             : 
     866                 :             :         /*
     867                 :             :          * For every shared iterator referring to pagetable and iterator array,
     868                 :             :          * increase the refcount by 1 so that while freeing the shared iterator we
     869                 :             :          * don't free pagetable and iterator array until its refcount becomes 0.
     870                 :             :          */
     871         [ -  + ]:          12 :         if (ptbase != NULL)
     872                 :          12 :                 pg_atomic_add_fetch_u32(&ptbase->refcount, 1);
     873         [ -  + ]:          12 :         if (ptpages != NULL)
     874                 :          12 :                 pg_atomic_add_fetch_u32(&ptpages->refcount, 1);
     875         [ +  + ]:          12 :         if (ptchunks != NULL)
     876                 :           1 :                 pg_atomic_add_fetch_u32(&ptchunks->refcount, 1);
     877                 :             : 
     878                 :             :         /* Initialize the iterator lock */
     879                 :          12 :         LWLockInitialize(&istate->lock, LWTRANCHE_SHARED_TIDBITMAP);
     880                 :             : 
     881                 :             :         /* Initialize the shared iterator state */
     882                 :          12 :         istate->schunkbit = 0;
     883                 :          12 :         istate->schunkptr = 0;
     884                 :          12 :         istate->spageptr = 0;
     885                 :             : 
     886                 :          12 :         tbm->iterating = TBM_ITERATING_SHARED;
     887                 :             : 
     888                 :          24 :         return dp;
     889                 :          12 : }
     890                 :             : 
     891                 :             : /*
     892                 :             :  * tbm_extract_page_tuple - extract the tuple offsets from a page
     893                 :             :  *
     894                 :             :  * Returns the number of offsets it filled in if <= max_offsets. Otherwise,
     895                 :             :  * fills in as many offsets as fit and returns the total number of offsets in
     896                 :             :  * the page.
     897                 :             :  */
     898                 :             : int
     899                 :       16467 : tbm_extract_page_tuple(TBMIterateResult *iteritem,
     900                 :             :                                            OffsetNumber *offsets,
     901                 :             :                                            uint32 max_offsets)
     902                 :             : {
     903                 :       16467 :         PagetableEntry *page = iteritem->internal_page;
     904                 :       16467 :         int                     ntuples = 0;
     905                 :             : 
     906         [ +  + ]:       98802 :         for (int wordnum = 0; wordnum < WORDS_PER_PAGE; wordnum++)
     907                 :             :         {
     908                 :       82335 :                 bitmapword      w = page->words[wordnum];
     909                 :             : 
     910         [ +  + ]:       82335 :                 if (w != 0)
     911                 :             :                 {
     912                 :       23075 :                         int                     off = wordnum * BITS_PER_BITMAPWORD + 1;
     913                 :             : 
     914         [ +  + ]:      826195 :                         while (w != 0)
     915                 :             :                         {
     916         [ +  + ]:      803120 :                                 if (w & 1)
     917                 :             :                                 {
     918         [ -  + ]:      535357 :                                         if (ntuples < max_offsets)
     919                 :      535357 :                                                 offsets[ntuples] = (OffsetNumber) off;
     920                 :      535357 :                                         ntuples++;
     921                 :      535357 :                                 }
     922                 :      803120 :                                 off++;
     923                 :      803120 :                                 w >>= 1;
     924                 :             :                         }
     925                 :       23075 :                 }
     926                 :       82335 :         }
     927                 :             : 
     928                 :       32934 :         return ntuples;
     929                 :       16467 : }
     930                 :             : 
     931                 :             : /*
     932                 :             :  *      tbm_advance_schunkbit - Advance the schunkbit
     933                 :             :  */
     934                 :             : static inline void
     935                 :       28252 : tbm_advance_schunkbit(PagetableEntry *chunk, int *schunkbitp)
     936                 :             : {
     937                 :       28252 :         int                     schunkbit = *schunkbitp;
     938                 :             : 
     939         [ +  + ]:      128498 :         while (schunkbit < PAGES_PER_CHUNK)
     940                 :             :         {
     941                 :      128002 :                 int                     wordnum = WORDNUM(schunkbit);
     942                 :      128002 :                 int                     bitnum = BITNUM(schunkbit);
     943                 :             : 
     944         [ +  + ]:      128002 :                 if ((chunk->words[wordnum] & ((bitmapword) 1 << bitnum)) != 0)
     945                 :       27756 :                         break;
     946                 :      100246 :                 schunkbit++;
     947      [ -  +  + ]:      128002 :         }
     948                 :             : 
     949                 :       28252 :         *schunkbitp = schunkbit;
     950                 :       28252 : }
     951                 :             : 
     952                 :             : /*
     953                 :             :  * tbm_private_iterate - scan through next page of a TIDBitmap
     954                 :             :  *
     955                 :             :  * Caller must pass in a TBMIterateResult to be filled.
     956                 :             :  *
     957                 :             :  * Pages are guaranteed to be delivered in numerical order.
     958                 :             :  *
     959                 :             :  * Returns false when there are no more pages to scan and true otherwise. When
     960                 :             :  * there are no more pages to scan, tbmres->blockno is set to
     961                 :             :  * InvalidBlockNumber.
     962                 :             :  *
     963                 :             :  * If lossy is true, then the bitmap is "lossy" and failed to remember
     964                 :             :  * the exact tuples to look at on this page --- the caller must examine all
     965                 :             :  * tuples on the page and check if they meet the intended condition. If lossy
     966                 :             :  * is false, the caller must later extract the tuple offsets from the page
     967                 :             :  * pointed to by internal_page with tbm_extract_page_tuple.
     968                 :             :  *
     969                 :             :  * If tbmres->recheck is true, only the indicated tuples need be examined, but
     970                 :             :  * the condition must be rechecked anyway.  (For ease of testing, recheck is
     971                 :             :  * always set true when lossy is true.)
     972                 :             :  */
     973                 :             : bool
     974                 :       40161 : tbm_private_iterate(TBMPrivateIterator *iterator, TBMIterateResult *tbmres)
     975                 :             : {
     976                 :       40161 :         TIDBitmap  *tbm = iterator->tbm;
     977                 :             : 
     978         [ +  - ]:       40161 :         Assert(tbm->iterating == TBM_ITERATING_PRIVATE);
     979                 :             : 
     980                 :             :         /*
     981                 :             :          * If lossy chunk pages remain, make sure we've advanced schunkptr/
     982                 :             :          * schunkbit to the next set bit.
     983                 :             :          */
     984         [ +  + ]:       40653 :         while (iterator->schunkptr < tbm->nchunks)
     985                 :             :         {
     986                 :       27227 :                 PagetableEntry *chunk = tbm->schunks[iterator->schunkptr];
     987                 :       27227 :                 int                     schunkbit = iterator->schunkbit;
     988                 :             : 
     989                 :       27227 :                 tbm_advance_schunkbit(chunk, &schunkbit);
     990         [ +  + ]:       27227 :                 if (schunkbit < PAGES_PER_CHUNK)
     991                 :             :                 {
     992                 :       26735 :                         iterator->schunkbit = schunkbit;
     993                 :       26735 :                         break;
     994                 :             :                 }
     995                 :             :                 /* advance to next chunk */
     996                 :         492 :                 iterator->schunkptr++;
     997                 :         492 :                 iterator->schunkbit = 0;
     998      [ -  +  + ]:       27227 :         }
     999                 :             : 
    1000                 :             :         /*
    1001                 :             :          * If both chunk and per-page data remain, must output the numerically
    1002                 :             :          * earlier page.
    1003                 :             :          */
    1004         [ +  + ]:       40161 :         if (iterator->schunkptr < tbm->nchunks)
    1005                 :             :         {
    1006                 :       26735 :                 PagetableEntry *chunk = tbm->schunks[iterator->schunkptr];
    1007                 :       26735 :                 BlockNumber chunk_blockno;
    1008                 :             : 
    1009                 :       26735 :                 chunk_blockno = chunk->blockno + iterator->schunkbit;
    1010   [ +  +  +  + ]:       26735 :                 if (iterator->spageptr >= tbm->npages ||
    1011                 :        2075 :                         chunk_blockno < tbm->spages[iterator->spageptr]->blockno)
    1012                 :             :                 {
    1013                 :             :                         /* Return a lossy page indicator from the chunk */
    1014                 :       26213 :                         tbmres->blockno = chunk_blockno;
    1015                 :       26213 :                         tbmres->lossy = true;
    1016                 :       26213 :                         tbmres->recheck = true;
    1017                 :       26213 :                         tbmres->internal_page = NULL;
    1018                 :       26213 :                         iterator->schunkbit++;
    1019                 :       26213 :                         return true;
    1020                 :             :                 }
    1021         [ +  + ]:       26735 :         }
    1022                 :             : 
    1023         [ +  + ]:       13948 :         if (iterator->spageptr < tbm->npages)
    1024                 :             :         {
    1025                 :       11968 :                 PagetableEntry *page;
    1026                 :             : 
    1027                 :             :                 /* In TBM_ONE_PAGE state, we don't allocate an spages[] array */
    1028         [ +  + ]:       11968 :                 if (tbm->status == TBM_ONE_PAGE)
    1029                 :         553 :                         page = &tbm->entry1;
    1030                 :             :                 else
    1031                 :       11415 :                         page = tbm->spages[iterator->spageptr];
    1032                 :             : 
    1033                 :       11968 :                 tbmres->internal_page = page;
    1034                 :       11968 :                 tbmres->blockno = page->blockno;
    1035                 :       11968 :                 tbmres->lossy = false;
    1036                 :       11968 :                 tbmres->recheck = page->recheck;
    1037                 :       11968 :                 iterator->spageptr++;
    1038                 :       11968 :                 return true;
    1039                 :       11968 :         }
    1040                 :             : 
    1041                 :             :         /* Nothing more in the bitmap */
    1042                 :        1980 :         tbmres->blockno = InvalidBlockNumber;
    1043                 :        1980 :         return false;
    1044                 :       40161 : }
    1045                 :             : 
    1046                 :             : /*
    1047                 :             :  *      tbm_shared_iterate - scan through next page of a TIDBitmap
    1048                 :             :  *
    1049                 :             :  *      As above, but this will iterate using an iterator which is shared
    1050                 :             :  *      across multiple processes.  We need to acquire the iterator LWLock,
    1051                 :             :  *      before accessing the shared members.
    1052                 :             :  */
    1053                 :             : bool
    1054                 :        5075 : tbm_shared_iterate(TBMSharedIterator *iterator, TBMIterateResult *tbmres)
    1055                 :             : {
    1056                 :        5075 :         TBMSharedIteratorState *istate = iterator->state;
    1057                 :        5075 :         PagetableEntry *ptbase = NULL;
    1058                 :        5075 :         int                *idxpages = NULL;
    1059                 :        5075 :         int                *idxchunks = NULL;
    1060                 :             : 
    1061         [ -  + ]:        5075 :         if (iterator->ptbase != NULL)
    1062                 :        5075 :                 ptbase = iterator->ptbase->ptentry;
    1063         [ -  + ]:        5075 :         if (iterator->ptpages != NULL)
    1064                 :        5075 :                 idxpages = iterator->ptpages->index;
    1065         [ +  + ]:        5075 :         if (iterator->ptchunks != NULL)
    1066                 :        1240 :                 idxchunks = iterator->ptchunks->index;
    1067                 :             : 
    1068                 :             :         /* Acquire the LWLock before accessing the shared members */
    1069                 :        5075 :         LWLockAcquire(&istate->lock, LW_EXCLUSIVE);
    1070                 :             : 
    1071                 :             :         /*
    1072                 :             :          * If lossy chunk pages remain, make sure we've advanced schunkptr/
    1073                 :             :          * schunkbit to the next set bit.
    1074                 :             :          */
    1075         [ +  + ]:        5079 :         while (istate->schunkptr < istate->nchunks)
    1076                 :             :         {
    1077                 :        1025 :                 PagetableEntry *chunk = &ptbase[idxchunks[istate->schunkptr]];
    1078                 :        1025 :                 int                     schunkbit = istate->schunkbit;
    1079                 :             : 
    1080                 :        1025 :                 tbm_advance_schunkbit(chunk, &schunkbit);
    1081         [ +  + ]:        1025 :                 if (schunkbit < PAGES_PER_CHUNK)
    1082                 :             :                 {
    1083                 :        1021 :                         istate->schunkbit = schunkbit;
    1084                 :        1021 :                         break;
    1085                 :             :                 }
    1086                 :             :                 /* advance to next chunk */
    1087                 :           4 :                 istate->schunkptr++;
    1088                 :           4 :                 istate->schunkbit = 0;
    1089      [ -  +  + ]:        1025 :         }
    1090                 :             : 
    1091                 :             :         /*
    1092                 :             :          * If both chunk and per-page data remain, must output the numerically
    1093                 :             :          * earlier page.
    1094                 :             :          */
    1095         [ +  + ]:        5075 :         if (istate->schunkptr < istate->nchunks)
    1096                 :             :         {
    1097                 :        1021 :                 PagetableEntry *chunk = &ptbase[idxchunks[istate->schunkptr]];
    1098                 :        1021 :                 BlockNumber chunk_blockno;
    1099                 :             : 
    1100                 :        1021 :                 chunk_blockno = chunk->blockno + istate->schunkbit;
    1101                 :             : 
    1102   [ +  -  +  + ]:        1021 :                 if (istate->spageptr >= istate->npages ||
    1103                 :        1021 :                         chunk_blockno < ptbase[idxpages[istate->spageptr]].blockno)
    1104                 :             :                 {
    1105                 :             :                         /* Return a lossy page indicator from the chunk */
    1106                 :         517 :                         tbmres->blockno = chunk_blockno;
    1107                 :         517 :                         tbmres->lossy = true;
    1108                 :         517 :                         tbmres->recheck = true;
    1109                 :         517 :                         tbmres->internal_page = NULL;
    1110                 :         517 :                         istate->schunkbit++;
    1111                 :             : 
    1112                 :         517 :                         LWLockRelease(&istate->lock);
    1113                 :         517 :                         return true;
    1114                 :             :                 }
    1115         [ +  + ]:        1021 :         }
    1116                 :             : 
    1117         [ +  + ]:        4558 :         if (istate->spageptr < istate->npages)
    1118                 :             :         {
    1119                 :        4501 :                 PagetableEntry *page = &ptbase[idxpages[istate->spageptr]];
    1120                 :             : 
    1121                 :        4501 :                 tbmres->internal_page = page;
    1122                 :        4501 :                 tbmres->blockno = page->blockno;
    1123                 :        4501 :                 tbmres->lossy = false;
    1124                 :        4501 :                 tbmres->recheck = page->recheck;
    1125                 :        4501 :                 istate->spageptr++;
    1126                 :             : 
    1127                 :        4501 :                 LWLockRelease(&istate->lock);
    1128                 :             : 
    1129                 :        4501 :                 return true;
    1130                 :        4501 :         }
    1131                 :             : 
    1132                 :          57 :         LWLockRelease(&istate->lock);
    1133                 :             : 
    1134                 :             :         /* Nothing more in the bitmap */
    1135                 :          57 :         tbmres->blockno = InvalidBlockNumber;
    1136                 :          57 :         return false;
    1137                 :        5075 : }
    1138                 :             : 
    1139                 :             : /*
    1140                 :             :  * tbm_end_private_iterate - finish an iteration over a TIDBitmap
    1141                 :             :  *
    1142                 :             :  * Currently this is just a pfree, but it might do more someday.  (For
    1143                 :             :  * instance, it could be useful to count open iterators and allow the
    1144                 :             :  * bitmap to return to read/write status when there are no more iterators.)
    1145                 :             :  */
    1146                 :             : void
    1147                 :        1966 : tbm_end_private_iterate(TBMPrivateIterator *iterator)
    1148                 :             : {
    1149                 :        1966 :         pfree(iterator);
    1150                 :        1966 : }
    1151                 :             : 
    1152                 :             : /*
    1153                 :             :  * tbm_end_shared_iterate - finish a shared iteration over a TIDBitmap
    1154                 :             :  *
    1155                 :             :  * This doesn't free any of the shared state associated with the iterator,
    1156                 :             :  * just our backend-private state.
    1157                 :             :  */
    1158                 :             : void
    1159                 :          57 : tbm_end_shared_iterate(TBMSharedIterator *iterator)
    1160                 :             : {
    1161                 :          57 :         pfree(iterator);
    1162                 :          57 : }
    1163                 :             : 
    1164                 :             : /*
    1165                 :             :  * tbm_find_pageentry - find a PagetableEntry for the pageno
    1166                 :             :  *
    1167                 :             :  * Returns NULL if there is no non-lossy entry for the pageno.
    1168                 :             :  */
    1169                 :             : static const PagetableEntry *
    1170                 :        1916 : tbm_find_pageentry(const TIDBitmap *tbm, BlockNumber pageno)
    1171                 :             : {
    1172                 :        1916 :         const PagetableEntry *page;
    1173                 :             : 
    1174         [ +  - ]:        1916 :         if (tbm->nentries == 0)              /* in case pagetable doesn't exist */
    1175                 :           0 :                 return NULL;
    1176                 :             : 
    1177         [ -  + ]:        1916 :         if (tbm->status == TBM_ONE_PAGE)
    1178                 :             :         {
    1179                 :           0 :                 page = &tbm->entry1;
    1180         [ #  # ]:           0 :                 if (page->blockno != pageno)
    1181                 :           0 :                         return NULL;
    1182         [ #  # ]:           0 :                 Assert(!page->ischunk);
    1183                 :           0 :                 return page;
    1184                 :             :         }
    1185                 :             : 
    1186                 :        1916 :         page = pagetable_lookup(tbm->pagetable, pageno);
    1187         [ +  + ]:        1916 :         if (page == NULL)
    1188                 :         328 :                 return NULL;
    1189         [ -  + ]:        1588 :         if (page->ischunk)
    1190                 :           0 :                 return NULL;                    /* don't want a lossy chunk header */
    1191                 :        1588 :         return page;
    1192                 :        1916 : }
    1193                 :             : 
    1194                 :             : /*
    1195                 :             :  * tbm_get_pageentry - find or create a PagetableEntry for the pageno
    1196                 :             :  *
    1197                 :             :  * If new, the entry is marked as an exact (non-chunk) entry.
    1198                 :             :  *
    1199                 :             :  * This may cause the table to exceed the desired memory size.  It is
    1200                 :             :  * up to the caller to call tbm_lossify() at the next safe point if so.
    1201                 :             :  */
    1202                 :             : static PagetableEntry *
    1203                 :      736361 : tbm_get_pageentry(TIDBitmap *tbm, BlockNumber pageno)
    1204                 :             : {
    1205                 :      736361 :         PagetableEntry *page;
    1206                 :      736361 :         bool            found;
    1207                 :             : 
    1208         [ +  + ]:      736361 :         if (tbm->status == TBM_EMPTY)
    1209                 :             :         {
    1210                 :             :                 /* Use the fixed slot */
    1211                 :         989 :                 page = &tbm->entry1;
    1212                 :         989 :                 found = false;
    1213                 :         989 :                 tbm->status = TBM_ONE_PAGE;
    1214                 :         989 :         }
    1215                 :             :         else
    1216                 :             :         {
    1217         [ +  + ]:      735372 :                 if (tbm->status == TBM_ONE_PAGE)
    1218                 :             :                 {
    1219                 :        5373 :                         page = &tbm->entry1;
    1220         [ +  + ]:        5373 :                         if (page->blockno == pageno)
    1221                 :        4937 :                                 return page;
    1222                 :             :                         /* Time to switch from one page to a hashtable */
    1223                 :         436 :                         tbm_create_pagetable(tbm);
    1224                 :         436 :                 }
    1225                 :             : 
    1226                 :             :                 /* Look up or create an entry */
    1227                 :      730435 :                 page = pagetable_insert(tbm->pagetable, pageno, &found);
    1228                 :             :         }
    1229                 :             : 
    1230                 :             :         /* Initialize it if not present before */
    1231         [ +  + ]:      731424 :         if (!found)
    1232                 :             :         {
    1233                 :       23280 :                 char            oldstatus = page->status;
    1234                 :             : 
    1235   [ +  -  +  -  :      162960 :                 MemSet(page, 0, sizeof(PagetableEntry));
          +  -  -  +  +  
                      + ]
    1236                 :       23280 :                 page->status = oldstatus;
    1237                 :       23280 :                 page->blockno = pageno;
    1238                 :             :                 /* must count it too */
    1239                 :       23280 :                 tbm->nentries++;
    1240                 :       23280 :                 tbm->npages++;
    1241                 :       23280 :         }
    1242                 :             : 
    1243                 :      731424 :         return page;
    1244                 :      736361 : }
    1245                 :             : 
    1246                 :             : /*
    1247                 :             :  * tbm_page_is_lossy - is the page marked as lossily stored?
    1248                 :             :  */
    1249                 :             : static bool
    1250                 :      739705 : tbm_page_is_lossy(const TIDBitmap *tbm, BlockNumber pageno)
    1251                 :             : {
    1252                 :      739705 :         PagetableEntry *page;
    1253                 :      739705 :         BlockNumber chunk_pageno;
    1254                 :      739705 :         int                     bitno;
    1255                 :             : 
    1256                 :             :         /* we can skip the lookup if there are no lossy chunks */
    1257         [ +  + ]:      739705 :         if (tbm->nchunks == 0)
    1258                 :      717994 :                 return false;
    1259         [ +  - ]:       21711 :         Assert(tbm->status == TBM_HASH);
    1260                 :             : 
    1261                 :       21711 :         bitno = pageno % PAGES_PER_CHUNK;
    1262                 :       21711 :         chunk_pageno = pageno - bitno;
    1263                 :             : 
    1264                 :       21711 :         page = pagetable_lookup(tbm->pagetable, chunk_pageno);
    1265                 :             : 
    1266   [ +  -  +  + ]:       21711 :         if (page != NULL && page->ischunk)
    1267                 :             :         {
    1268                 :        3189 :                 int                     wordnum = WORDNUM(bitno);
    1269                 :        3189 :                 int                     bitnum = BITNUM(bitno);
    1270                 :             : 
    1271         [ +  + ]:        3189 :                 if ((page->words[wordnum] & ((bitmapword) 1 << bitnum)) != 0)
    1272                 :        1428 :                         return true;
    1273         [ +  + ]:        3189 :         }
    1274                 :       20283 :         return false;
    1275                 :      739705 : }
    1276                 :             : 
    1277                 :             : /*
    1278                 :             :  * tbm_mark_page_lossy - mark the page number as lossily stored
    1279                 :             :  *
    1280                 :             :  * This may cause the table to exceed the desired memory size.  It is
    1281                 :             :  * up to the caller to call tbm_lossify() at the next safe point if so.
    1282                 :             :  */
    1283                 :             : static void
    1284                 :       27738 : tbm_mark_page_lossy(TIDBitmap *tbm, BlockNumber pageno)
    1285                 :             : {
    1286                 :       27738 :         PagetableEntry *page;
    1287                 :       27738 :         bool            found;
    1288                 :       27738 :         BlockNumber chunk_pageno;
    1289                 :       27738 :         int                     bitno;
    1290                 :       27738 :         int                     wordnum;
    1291                 :       27738 :         int                     bitnum;
    1292                 :             : 
    1293                 :             :         /* We force the bitmap into hashtable mode whenever it's lossy */
    1294         [ +  + ]:       27738 :         if (tbm->status != TBM_HASH)
    1295                 :         478 :                 tbm_create_pagetable(tbm);
    1296                 :             : 
    1297                 :       27738 :         bitno = pageno % PAGES_PER_CHUNK;
    1298                 :       27738 :         chunk_pageno = pageno - bitno;
    1299                 :             : 
    1300                 :             :         /*
    1301                 :             :          * Remove any extant non-lossy entry for the page.  If the page is its own
    1302                 :             :          * chunk header, however, we skip this and handle the case below.
    1303                 :             :          */
    1304         [ +  + ]:       27738 :         if (bitno != 0)
    1305                 :             :         {
    1306         [ +  + ]:       27350 :                 if (pagetable_delete(tbm->pagetable, pageno))
    1307                 :             :                 {
    1308                 :             :                         /* It was present, so adjust counts */
    1309                 :        3078 :                         tbm->nentries--;
    1310                 :        3078 :                         tbm->npages--;               /* assume it must have been non-lossy */
    1311                 :        3078 :                 }
    1312                 :       27350 :         }
    1313                 :             : 
    1314                 :             :         /* Look up or create entry for chunk-header page */
    1315                 :       27738 :         page = pagetable_insert(tbm->pagetable, chunk_pageno, &found);
    1316                 :             : 
    1317                 :             :         /* Initialize it if not present before */
    1318         [ +  + ]:       27738 :         if (!found)
    1319                 :             :         {
    1320                 :         478 :                 char            oldstatus = page->status;
    1321                 :             : 
    1322   [ +  -  +  -  :        3346 :                 MemSet(page, 0, sizeof(PagetableEntry));
          +  -  -  +  +  
                      + ]
    1323                 :         478 :                 page->status = oldstatus;
    1324                 :         478 :                 page->blockno = chunk_pageno;
    1325                 :         478 :                 page->ischunk = true;
    1326                 :             :                 /* must count it too */
    1327                 :         478 :                 tbm->nentries++;
    1328                 :         478 :                 tbm->nchunks++;
    1329                 :         478 :         }
    1330         [ +  + ]:       27260 :         else if (!page->ischunk)
    1331                 :             :         {
    1332                 :          26 :                 char            oldstatus = page->status;
    1333                 :             : 
    1334                 :             :                 /* chunk header page was formerly non-lossy, make it lossy */
    1335   [ +  -  +  -  :         182 :                 MemSet(page, 0, sizeof(PagetableEntry));
          +  -  -  +  +  
                      + ]
    1336                 :          26 :                 page->status = oldstatus;
    1337                 :          26 :                 page->blockno = chunk_pageno;
    1338                 :          26 :                 page->ischunk = true;
    1339                 :             :                 /* we assume it had some tuple bit(s) set, so mark it lossy */
    1340                 :          26 :                 page->words[0] = ((bitmapword) 1 << 0);
    1341                 :             :                 /* adjust counts */
    1342                 :          26 :                 tbm->nchunks++;
    1343                 :          26 :                 tbm->npages--;
    1344                 :          26 :         }
    1345                 :             : 
    1346                 :             :         /* Now set the original target page's bit */
    1347                 :       27738 :         wordnum = WORDNUM(bitno);
    1348                 :       27738 :         bitnum = BITNUM(bitno);
    1349                 :       27738 :         page->words[wordnum] |= ((bitmapword) 1 << bitnum);
    1350                 :       27738 : }
    1351                 :             : 
    1352                 :             : /*
    1353                 :             :  * tbm_lossify - lose some information to get back under the memory limit
    1354                 :             :  */
    1355                 :             : static void
    1356                 :           6 : tbm_lossify(TIDBitmap *tbm)
    1357                 :             : {
    1358                 :           6 :         pagetable_iterator i;
    1359                 :           6 :         PagetableEntry *page;
    1360                 :             : 
    1361                 :             :         /*
    1362                 :             :          * XXX Really stupid implementation: this just lossifies pages in
    1363                 :             :          * essentially random order.  We should be paying some attention to the
    1364                 :             :          * number of bits set in each page, instead.
    1365                 :             :          *
    1366                 :             :          * Since we are called as soon as nentries exceeds maxentries, we should
    1367                 :             :          * push nentries down to significantly less than maxentries, or else we'll
    1368                 :             :          * just end up doing this again very soon.  We shoot for maxentries/2.
    1369                 :             :          */
    1370         [ +  - ]:           6 :         Assert(tbm->iterating == TBM_NOT_ITERATING);
    1371         [ +  - ]:           6 :         Assert(tbm->status == TBM_HASH);
    1372                 :             : 
    1373                 :           6 :         pagetable_start_iterate_at(tbm->pagetable, &i, tbm->lossify_start);
    1374         [ -  + ]:        3082 :         while ((page = pagetable_iterate(tbm->pagetable, &i)) != NULL)
    1375                 :             :         {
    1376         [ -  + ]:        3082 :                 if (page->ischunk)
    1377                 :           0 :                         continue;                       /* already a chunk header */
    1378                 :             : 
    1379                 :             :                 /*
    1380                 :             :                  * If the page would become a chunk header, we won't save anything by
    1381                 :             :                  * converting it to lossy, so skip it.
    1382                 :             :                  */
    1383         [ +  + ]:        3082 :                 if ((page->blockno % PAGES_PER_CHUNK) == 0)
    1384                 :           4 :                         continue;
    1385                 :             : 
    1386                 :             :                 /* This does the dirty work ... */
    1387                 :        3078 :                 tbm_mark_page_lossy(tbm, page->blockno);
    1388                 :             : 
    1389         [ +  + ]:        3078 :                 if (tbm->nentries <= tbm->maxentries / 2)
    1390                 :             :                 {
    1391                 :             :                         /*
    1392                 :             :                          * We have made enough room. Remember where to start lossifying
    1393                 :             :                          * next round, so we evenly iterate over the hashtable.
    1394                 :             :                          */
    1395                 :           6 :                         tbm->lossify_start = i.cur;
    1396                 :           6 :                         break;
    1397                 :             :                 }
    1398                 :             : 
    1399                 :             :                 /*
    1400                 :             :                  * Note: tbm_mark_page_lossy may have inserted a lossy chunk into the
    1401                 :             :                  * hashtable and may have deleted the non-lossy chunk.  We can
    1402                 :             :                  * continue the same hash table scan, since failure to visit one
    1403                 :             :                  * element or visiting the newly inserted element, isn't fatal.
    1404                 :             :                  */
    1405                 :             :         }
    1406                 :             : 
    1407                 :             :         /*
    1408                 :             :          * With a big bitmap and small work_mem, it's possible that we cannot get
    1409                 :             :          * under maxentries.  Again, if that happens, we'd end up uselessly
    1410                 :             :          * calling tbm_lossify over and over.  To prevent this from becoming a
    1411                 :             :          * performance sink, force maxentries up to at least double the current
    1412                 :             :          * number of entries.  (In essence, we're admitting inability to fit
    1413                 :             :          * within work_mem when we do this.)  Note that this test will not fire if
    1414                 :             :          * we broke out of the loop early; and if we didn't, the current number of
    1415                 :             :          * entries is simply not reducible any further.
    1416                 :             :          */
    1417         [ +  - ]:           6 :         if (tbm->nentries > tbm->maxentries / 2)
    1418         [ #  # ]:           0 :                 tbm->maxentries = Min(tbm->nentries, (INT_MAX - 1) / 2) * 2;
    1419                 :           6 : }
    1420                 :             : 
    1421                 :             : /*
    1422                 :             :  * qsort comparator to handle PagetableEntry pointers.
    1423                 :             :  */
    1424                 :             : static int
    1425                 :       66032 : tbm_comparator(const void *left, const void *right)
    1426                 :             : {
    1427                 :       66032 :         BlockNumber l = (*((PagetableEntry *const *) left))->blockno;
    1428                 :       66032 :         BlockNumber r = (*((PagetableEntry *const *) right))->blockno;
    1429                 :             : 
    1430                 :      132064 :         return pg_cmp_u32(l, r);
    1431                 :       66032 : }
    1432                 :             : 
    1433                 :             : /*
    1434                 :             :  * As above, but this will get index into PagetableEntry array.  Therefore,
    1435                 :             :  * it needs to get actual PagetableEntry using the index before comparing the
    1436                 :             :  * blockno.
    1437                 :             :  */
    1438                 :             : static int
    1439                 :       39715 : tbm_shared_comparator(const void *left, const void *right, void *arg)
    1440                 :             : {
    1441                 :       39715 :         PagetableEntry *base = (PagetableEntry *) arg;
    1442                 :       39715 :         PagetableEntry *lpage = &base[*(int *) left];
    1443                 :       39715 :         PagetableEntry *rpage = &base[*(int *) right];
    1444                 :             : 
    1445         [ +  + ]:       39715 :         if (lpage->blockno < rpage->blockno)
    1446                 :       18161 :                 return -1;
    1447         [ +  - ]:       21554 :         else if (lpage->blockno > rpage->blockno)
    1448                 :       21554 :                 return 1;
    1449                 :           0 :         return 0;
    1450                 :       39715 : }
    1451                 :             : 
    1452                 :             : /*
    1453                 :             :  *      tbm_attach_shared_iterate
    1454                 :             :  *
    1455                 :             :  *      Allocate a backend-private iterator and attach the shared iterator state
    1456                 :             :  *      to it so that multiple processed can iterate jointly.
    1457                 :             :  *
    1458                 :             :  *      We also converts the DSA pointers to local pointers and store them into
    1459                 :             :  *      our private iterator.
    1460                 :             :  */
    1461                 :             : TBMSharedIterator *
    1462                 :          57 : tbm_attach_shared_iterate(dsa_area *dsa, dsa_pointer dp)
    1463                 :             : {
    1464                 :          57 :         TBMSharedIterator *iterator;
    1465                 :          57 :         TBMSharedIteratorState *istate;
    1466                 :             : 
    1467                 :             :         /*
    1468                 :             :          * Create the TBMSharedIterator struct, with enough trailing space to
    1469                 :             :          * serve the needs of the TBMIterateResult sub-struct.
    1470                 :             :          */
    1471                 :          57 :         iterator = palloc0_object(TBMSharedIterator);
    1472                 :             : 
    1473                 :          57 :         istate = (TBMSharedIteratorState *) dsa_get_address(dsa, dp);
    1474                 :             : 
    1475                 :          57 :         iterator->state = istate;
    1476                 :             : 
    1477                 :          57 :         iterator->ptbase = dsa_get_address(dsa, istate->pagetable);
    1478                 :             : 
    1479         [ -  + ]:          57 :         if (istate->npages)
    1480                 :          57 :                 iterator->ptpages = dsa_get_address(dsa, istate->spages);
    1481         [ +  + ]:          57 :         if (istate->nchunks)
    1482                 :           5 :                 iterator->ptchunks = dsa_get_address(dsa, istate->schunks);
    1483                 :             : 
    1484                 :         114 :         return iterator;
    1485                 :          57 : }
    1486                 :             : 
    1487                 :             : /*
    1488                 :             :  * pagetable_allocate
    1489                 :             :  *
    1490                 :             :  * Callback function for allocating the memory for hashtable elements.
    1491                 :             :  * Allocate memory for hashtable elements, using DSA if available.
    1492                 :             :  */
    1493                 :             : static inline void *
    1494                 :         952 : pagetable_allocate(pagetable_hash *pagetable, Size size)
    1495                 :             : {
    1496                 :         952 :         TIDBitmap  *tbm = (TIDBitmap *) pagetable->private_data;
    1497                 :         952 :         PTEntryArray *ptbase;
    1498                 :             : 
    1499         [ +  + ]:         952 :         if (tbm->dsa == NULL)
    1500                 :         926 :                 return MemoryContextAllocExtended(pagetable->ctx, size,
    1501                 :             :                                                                                   MCXT_ALLOC_HUGE | MCXT_ALLOC_ZERO);
    1502                 :             : 
    1503                 :             :         /*
    1504                 :             :          * Save the dsapagetable reference in dsapagetableold before allocating
    1505                 :             :          * new memory so that pagetable_free can free the old entry.
    1506                 :             :          */
    1507                 :          26 :         tbm->dsapagetableold = tbm->dsapagetable;
    1508                 :          52 :         tbm->dsapagetable = dsa_allocate_extended(tbm->dsa,
    1509                 :          26 :                                                                                           sizeof(PTEntryArray) + size,
    1510                 :             :                                                                                           DSA_ALLOC_HUGE | DSA_ALLOC_ZERO);
    1511                 :          26 :         ptbase = dsa_get_address(tbm->dsa, tbm->dsapagetable);
    1512                 :             : 
    1513                 :          26 :         return ptbase->ptentry;
    1514                 :         952 : }
    1515                 :             : 
    1516                 :             : /*
    1517                 :             :  * pagetable_free
    1518                 :             :  *
    1519                 :             :  * Callback function for freeing hash table elements.
    1520                 :             :  */
    1521                 :             : static inline void
    1522                 :         951 : pagetable_free(pagetable_hash *pagetable, void *pointer)
    1523                 :             : {
    1524                 :         951 :         TIDBitmap  *tbm = (TIDBitmap *) pagetable->private_data;
    1525                 :             : 
    1526                 :             :         /* pfree the input pointer if DSA is not available */
    1527         [ +  + ]:         951 :         if (tbm->dsa == NULL)
    1528                 :         925 :                 pfree(pointer);
    1529         [ +  + ]:          26 :         else if (DsaPointerIsValid(tbm->dsapagetableold))
    1530                 :             :         {
    1531                 :          14 :                 dsa_free(tbm->dsa, tbm->dsapagetableold);
    1532                 :          14 :                 tbm->dsapagetableold = InvalidDsaPointer;
    1533                 :          14 :         }
    1534                 :         951 : }
    1535                 :             : 
    1536                 :             : /*
    1537                 :             :  * tbm_calculate_entries
    1538                 :             :  *
    1539                 :             :  * Estimate number of hashtable entries we can have within maxbytes.
    1540                 :             :  */
    1541                 :             : int
    1542                 :       65490 : tbm_calculate_entries(Size maxbytes)
    1543                 :             : {
    1544                 :       65490 :         Size            nbuckets;
    1545                 :             : 
    1546                 :             :         /*
    1547                 :             :          * Estimate number of hashtable entries we can have within maxbytes. This
    1548                 :             :          * estimates the hash cost as sizeof(PagetableEntry), which is good enough
    1549                 :             :          * for our purpose.  Also count an extra Pointer per entry for the arrays
    1550                 :             :          * created during iteration readout.
    1551                 :             :          */
    1552                 :       65490 :         nbuckets = maxbytes /
    1553                 :             :                 (sizeof(PagetableEntry) + sizeof(Pointer) + sizeof(Pointer));
    1554         [ +  - ]:       65490 :         nbuckets = Min(nbuckets, INT_MAX - 1);  /* safety limit */
    1555         [ +  - ]:       65490 :         nbuckets = Max(nbuckets, 16);   /* sanity limit */
    1556                 :             : 
    1557                 :      130980 :         return (int) nbuckets;
    1558                 :       65490 : }
    1559                 :             : 
    1560                 :             : /*
    1561                 :             :  * Create a shared or private bitmap iterator and start iteration.
    1562                 :             :  *
    1563                 :             :  * `tbm` is only used to create the private iterator and dsa and dsp are only
    1564                 :             :  * used to create the shared iterator.
    1565                 :             :  *
    1566                 :             :  * Before invoking tbm_begin_iterate() to create a shared iterator, one
    1567                 :             :  * process must already have invoked tbm_prepare_shared_iterate() to create
    1568                 :             :  * and set up the TBMSharedIteratorState.
    1569                 :             :  */
    1570                 :             : TBMIterator
    1571                 :        2011 : tbm_begin_iterate(TIDBitmap *tbm, dsa_area *dsa, dsa_pointer dsp)
    1572                 :             : {
    1573                 :        2011 :         TBMIterator iterator = {0};
    1574                 :             : 
    1575                 :             :         /* Allocate a private iterator and attach the shared state to it */
    1576         [ +  + ]:        2011 :         if (DsaPointerIsValid(dsp))
    1577                 :             :         {
    1578                 :          57 :                 iterator.shared = true;
    1579                 :          57 :                 iterator.i.shared_iterator = tbm_attach_shared_iterate(dsa, dsp);
    1580                 :          57 :         }
    1581                 :             :         else
    1582                 :             :         {
    1583                 :        1954 :                 iterator.shared = false;
    1584                 :        1954 :                 iterator.i.private_iterator = tbm_begin_private_iterate(tbm);
    1585                 :             :         }
    1586                 :             : 
    1587                 :        2011 :         return iterator;
    1588                 :             : }
    1589                 :             : 
    1590                 :             : /*
    1591                 :             :  * Clean up shared or private bitmap iterator.
    1592                 :             :  */
    1593                 :             : void
    1594                 :        1995 : tbm_end_iterate(TBMIterator *iterator)
    1595                 :             : {
    1596         [ +  - ]:        1995 :         Assert(iterator && !tbm_exhausted(iterator));
    1597                 :             : 
    1598         [ +  + ]:        1995 :         if (iterator->shared)
    1599                 :          57 :                 tbm_end_shared_iterate(iterator->i.shared_iterator);
    1600                 :             :         else
    1601                 :        1938 :                 tbm_end_private_iterate(iterator->i.private_iterator);
    1602                 :             : 
    1603                 :        3990 :         *iterator = (TBMIterator)
    1604                 :        1995 :         {
    1605                 :             :                 0
    1606                 :             :         };
    1607                 :        1995 : }
    1608                 :             : 
    1609                 :             : /*
    1610                 :             :  * Populate the next TBMIterateResult using the shared or private bitmap
    1611                 :             :  * iterator. Returns false when there is nothing more to scan.
    1612                 :             :  */
    1613                 :             : bool
    1614                 :       44484 : tbm_iterate(TBMIterator *iterator, TBMIterateResult *tbmres)
    1615                 :             : {
    1616         [ +  - ]:       44484 :         Assert(iterator);
    1617         [ +  - ]:       44484 :         Assert(tbmres);
    1618                 :             : 
    1619         [ +  + ]:       44484 :         if (iterator->shared)
    1620                 :        5075 :                 return tbm_shared_iterate(iterator->i.shared_iterator, tbmres);
    1621                 :             :         else
    1622                 :       39409 :                 return tbm_private_iterate(iterator->i.private_iterator, tbmres);
    1623                 :       44484 : }
        

Generated by: LCOV version 2.3.2-1