LCOV - code coverage report
Current view: top level - src/backend/storage/file - fd.c (source / functions)
Test: Code coverage
Test Date: 2026-01-26 10:56:24
Legend: Lines: hit, not hit
        Branches: + taken, - not taken, # not executed

             Coverage    Total    Hit
Lines:         66.4 %     1231    817
Functions:     86.7 %       98     85
Branches:      28.6 %     1047    299

             Branch data     Line data    Source code
       1                 :             : /*-------------------------------------------------------------------------
       2                 :             :  *
       3                 :             :  * fd.c
       4                 :             :  *        Virtual file descriptor code.
       5                 :             :  *
       6                 :             :  * Portions Copyright (c) 1996-2026, PostgreSQL Global Development Group
       7                 :             :  * Portions Copyright (c) 1994, Regents of the University of California
       8                 :             :  *
       9                 :             :  * IDENTIFICATION
      10                 :             :  *        src/backend/storage/file/fd.c
      11                 :             :  *
      12                 :             :  * NOTES:
      13                 :             :  *
      14                 :             :  * This code manages a cache of 'virtual' file descriptors (VFDs).
      15                 :             :  * The server opens many file descriptors for a variety of reasons,
      16                 :             :  * including base tables, scratch files (e.g., sort and hash spool
      17                 :             :  * files), and random calls to C library routines like system(3); it
      18                 :             :  * is quite easy to exceed system limits on the number of open files a
      19                 :             :  * single process can have.  (This is around 1024 on many modern
      20                 :             :  * operating systems, but may be lower on others.)
      21                 :             :  *
      22                 :             :  * VFDs are managed as an LRU pool, with actual OS file descriptors
      23                 :             :  * being opened and closed as needed.  Obviously, if a file is
      24                 :             :  * opened using these interfaces, all subsequent operations must also
      25                 :             :  * be through these interfaces (the File type is not a real file
      26                 :             :  * descriptor).
      27                 :             :  *
      28                 :             :  * For this scheme to work, most (if not all) routines throughout the
      29                 :             :  * server should use these interfaces instead of calling the C library
      30                 :             :  * routines (e.g., open(2) and fopen(3)) themselves.  Otherwise, we
      31                 :             :  * may find ourselves short of real file descriptors anyway.
      32                 :             :  *
      33                 :             :  * INTERFACE ROUTINES
      34                 :             :  *
      35                 :             :  * PathNameOpenFile and OpenTemporaryFile are used to open virtual files.
      36                 :             :  * A File opened with OpenTemporaryFile is automatically deleted when the
      37                 :             :  * File is closed, either explicitly or implicitly at end of transaction or
      38                 :             :  * process exit. PathNameOpenFile is intended for files that are held open
      39                 :             :  * for a long time, like relation files. It is the caller's responsibility
      40                 :             :  * to close them; there is no automatic mechanism in fd.c for that.
      41                 :             :  *
      42                 :             :  * PathName(Create|Open|Delete)Temporary(File|Dir) are used to manage
      43                 :             :  * temporary files that have names so that they can be shared between
      44                 :             :  * backends.  Such files are automatically closed and count against the
      45                 :             :  * temporary file limit of the backend that creates them, but unlike anonymous
      46                 :             :  * files they are not automatically deleted.  See sharedfileset.c for a shared
      47                 :             :  * ownership mechanism that provides automatic cleanup for shared files when
      48                 :             :  * the last of a group of backends detaches.
      49                 :             :  *
      50                 :             :  * AllocateFile, AllocateDir, OpenPipeStream and OpenTransientFile are
      51                 :             :  * wrappers around fopen(3), opendir(3), popen(3) and open(2), respectively.
      52                 :             :  * They behave like the corresponding native functions, except that the handle
      53                 :             :  * is registered with the current subtransaction, and will be automatically
      54                 :             :  * closed at abort. These are intended mainly for short operations like
      55                 :             :  * reading a configuration file; there is a limit on the number of files that
      56                 :             :  * can be opened using these functions at any one time.
      57                 :             :  *
      58                 :             :  * Finally, BasicOpenFile is just a thin wrapper around open() that can
      59                 :             :  * release file descriptors in use by the virtual file descriptors if
      60                 :             :  * necessary. There is no automatic cleanup of file descriptors returned by
      61                 :             :  * BasicOpenFile; it is solely the caller's responsibility to close the file
      62                 :             :  * descriptor by calling close(2).
      63                 :             :  *
      64                 :             :  * If a non-virtual file descriptor needs to be held open for any length of
      65                 :             :  * time, report it to fd.c by calling AcquireExternalFD or ReserveExternalFD
      66                 :             :  * (and eventually ReleaseExternalFD), so that we can take it into account
      67                 :             :  * while deciding how many VFDs can be open.  This applies to FDs obtained
      68                 :             :  * with BasicOpenFile as well as those obtained without use of any fd.c API.
      69                 :             :  *
      70                 :             :  *-------------------------------------------------------------------------
      71                 :             :  */
      72                 :             : 
      73                 :             : #include "postgres.h"
      74                 :             : 
      75                 :             : #include <dirent.h>
      76                 :             : #include <sys/file.h>
      77                 :             : #include <sys/param.h>
      78                 :             : #include <sys/resource.h>         /* for getrlimit */
      79                 :             : #include <sys/stat.h>
      80                 :             : #include <sys/types.h>
      81                 :             : #ifndef WIN32
      82                 :             : #include <sys/mman.h>
      83                 :             : #endif
      84                 :             : #include <limits.h>
      85                 :             : #include <unistd.h>
      86                 :             : #include <fcntl.h>
      87                 :             : 
      88                 :             : #include "access/xact.h"
      89                 :             : #include "access/xlog.h"
      90                 :             : #include "catalog/pg_tablespace.h"
      91                 :             : #include "common/file_perm.h"
      92                 :             : #include "common/file_utils.h"
      93                 :             : #include "common/pg_prng.h"
      94                 :             : #include "miscadmin.h"
      95                 :             : #include "pgstat.h"
      96                 :             : #include "postmaster/startup.h"
      97                 :             : #include "storage/aio.h"
      98                 :             : #include "storage/fd.h"
      99                 :             : #include "storage/ipc.h"
     100                 :             : #include "utils/guc.h"
     101                 :             : #include "utils/guc_hooks.h"
     102                 :             : #include "utils/resowner.h"
     103                 :             : #include "utils/varlena.h"
     104                 :             : 
     105                 :             : /* Define PG_FLUSH_DATA_WORKS if we have an implementation for pg_flush_data */
     106                 :             : #if defined(HAVE_SYNC_FILE_RANGE)
     107                 :             : #define PG_FLUSH_DATA_WORKS 1
     108                 :             : #elif !defined(WIN32) && defined(MS_ASYNC)
     109                 :             : #define PG_FLUSH_DATA_WORKS 1
     110                 :             : #elif defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
     111                 :             : #define PG_FLUSH_DATA_WORKS 1
     112                 :             : #endif
     113                 :             : 
     114                 :             : /*
     115                 :             :  * We must leave some file descriptors free for system(), the dynamic loader,
     116                 :             :  * and other code that tries to open files without consulting fd.c.  This
     117                 :             :  * is the number left free.  (While we try fairly hard to prevent EMFILE
     118                 :             :  * errors, there's never any guarantee that we won't get ENFILE due to
     119                 :             :  * other processes chewing up FDs.  So it's a bad idea to try to open files
     120                 :             :  * without consulting fd.c.  Nonetheless we cannot control all code.)
     121                 :             :  *
     122                 :             :  * Because this is just a fixed setting, we are effectively assuming that
     123                 :             :  * no such code will leave FDs open over the long term; otherwise the slop
     124                 :             :  * is likely to be insufficient.  Note in particular that we expect that
     125                 :             :  * loading a shared library does not result in any permanent increase in
     126                 :             :  * the number of open files.  (This appears to be true on most if not
     127                 :             :  * all platforms as of Feb 2004.)
     128                 :             :  */
     129                 :             : #define NUM_RESERVED_FDS                10
     130                 :             : 
     131                 :             : /*
     132                 :             :  * If we have fewer than this many usable FDs after allowing for the reserved
     133                 :             :  * ones, choke.  (This value is chosen to work with "ulimit -n 64", but not
     134                 :             :  * much less than that.  Note that this value ensures numExternalFDs can be
     135                 :             :  * at least 16; as of this writing, the contrib/postgres_fdw regression tests
     136                 :             :  * will not pass unless that can grow to at least 14.)
     137                 :             :  */
     138                 :             : #define FD_MINFREE                              48
     139                 :             : 
     140                 :             : /*
     141                 :             :  * A number of platforms allow individual processes to open many more files
     142                 :             :  * than they can really support when *many* processes do the same thing.
     143                 :             :  * This GUC parameter lets the DBA limit max_safe_fds to something less than
     144                 :             :  * what the postmaster's initial probe suggests will work.
     145                 :             :  */
     146                 :             : int                     max_files_per_process = 1000;
     147                 :             : 
     148                 :             : /*
     149                 :             :  * Maximum number of file descriptors to open for operations that fd.c knows
     150                 :             :  * about (VFDs, AllocateFile etc, or "external" FDs).  This is initialized
     151                 :             :  * to a conservative value, and remains that way indefinitely in bootstrap or
     152                 :             :  * standalone-backend cases.  In normal postmaster operation, the postmaster
     153                 :             :  * calls set_max_safe_fds() late in initialization to update the value, and
     154                 :             :  * that value is then inherited by forked subprocesses.
     155                 :             :  *
     156                 :             :  * Note: the value of max_files_per_process is taken into account while
     157                 :             :  * setting this variable, and so need not be tested separately.
     158                 :             :  */
     159                 :             : int                     max_safe_fds = FD_MINFREE;      /* default if not changed */
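
The comments above describe how NUM_RESERVED_FDS, FD_MINFREE and max_files_per_process constrain max_safe_fds. The following is a minimal sketch of that arithmetic only. The helper name compute_max_safe_fds and its -1 failure convention are hypothetical; the real set_max_safe_fds() is not shown in this excerpt and may take further factors into account (for example, descriptors that are already open).

```c
#include <assert.h>

#define NUM_RESERVED_FDS 10   /* slop left for code that bypasses fd.c */
#define FD_MINFREE       48   /* minimum usable FDs after the reservation */

/*
 * Hypothetical helper (not part of fd.c): derive a max_safe_fds value from
 * a probed count of usable descriptors and the max_files_per_process cap.
 * Returns -1 when fewer than FD_MINFREE descriptors would remain.
 */
static int
compute_max_safe_fds(int usable_fds, int max_files_per_process)
{
    int limit;

    assert(usable_fds >= 0 && max_files_per_process >= 0);

    /* The DBA-set cap can only lower the probed value, never raise it. */
    limit = (usable_fds < max_files_per_process) ? usable_fds
                                                 : max_files_per_process;

    /* Set aside the descriptors reserved for non-fd.c code. */
    limit -= NUM_RESERVED_FDS;

    return (limit < FD_MINFREE) ? -1 : limit;
}
```

Under these assumptions, a probe that reports 64 usable descriptors leaves 54 after the reservation, which clears FD_MINFREE; a probe that reports 50 does not.
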
     160                 :             : 
     161                 :             : /* Whether it is safe to continue running after fsync() fails. */
     162                 :             : bool            data_sync_retry = false;
     163                 :             : 
     164                 :             : /* How SyncDataDirectory() should do its job. */
     165                 :             : int                     recovery_init_sync_method = DATA_DIR_SYNC_METHOD_FSYNC;
     166                 :             : 
     167                 :             : /* Which kinds of files should be opened with PG_O_DIRECT. */
     168                 :             : int                     io_direct_flags;
     169                 :             : 
     170                 :             : /* Debugging.... */
     171                 :             : 
     172                 :             : #ifdef FDDEBUG
     173                 :             : #define DO_DB(A) \
     174                 :             :         do { \
     175                 :             :                 int                     _do_db_save_errno = errno; \
     176                 :             :                 A; \
     177                 :             :                 errno = _do_db_save_errno; \
     178                 :             :         } while (0)
     179                 :             : #else
     180                 :             : #define DO_DB(A) \
     181                 :             :         ((void) 0)
     182                 :             : #endif
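
The FDDEBUG form of DO_DB saves and restores errno around the debug action, so that tracing cannot change the error code a caller later inspects. A small sketch of that effect follows; noisy_debug_action and errno_after_debug are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Same shape as DO_DB in the FDDEBUG case: run A, then restore errno. */
#define DO_DB(A) \
    do { \
        int _do_db_save_errno = errno; \
        A; \
        errno = _do_db_save_errno; \
    } while (0)

/* Hypothetical debug action that clobbers errno as a side effect. */
static void
noisy_debug_action(void)
{
    fprintf(stderr, "debug trace\n");
    errno = EINVAL;
}

/* Returns the errno value the caller sees after the debug hook has run. */
static int
errno_after_debug(int initial)
{
    errno = initial;
    DO_DB(noisy_debug_action());
    return errno;
}
```

Without the save and restore, errno_after_debug would return EINVAL regardless of its argument.
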
     183                 :             : 
     184                 :             : #define VFD_CLOSED (-1)
     185                 :             : 
     186                 :             : #define FileIsValid(file) \
     187                 :             :         ((file) > 0 && (file) < (int) SizeVfdCache && VfdCache[file].fileName != NULL)
     188                 :             : 
     189                 :             : #define FileIsNotOpen(file) (VfdCache[file].fd == VFD_CLOSED)
     190                 :             : 
     191                 :             : /* these are the assigned bits in fdstate below: */
     192                 :             : #define FD_DELETE_AT_CLOSE      (1 << 0)  /* T = delete when closed */
     193                 :             : #define FD_CLOSE_AT_EOXACT      (1 << 1)  /* T = close at eoXact */
     194                 :             : #define FD_TEMP_FILE_LIMIT      (1 << 2)  /* T = respect temp_file_limit */
     195                 :             : 
     196                 :             : typedef struct vfd
     197                 :             : {
     198                 :             :         int                     fd;                             /* current FD, or VFD_CLOSED if none */
     199                 :             :         unsigned short fdstate;         /* bitflags for VFD's state */
     200                 :             :         ResourceOwner resowner;         /* owner, for automatic cleanup */
     201                 :             :         File            nextFree;               /* link to next free VFD, if in freelist */
     202                 :             :         File            lruMoreRecently;        /* doubly linked recency-of-use list */
     203                 :             :         File            lruLessRecently;
     204                 :             :         pgoff_t         fileSize;               /* current size of file (0 if not temporary) */
     205                 :             :         char       *fileName;           /* name of file, or NULL for unused VFD */
     206                 :             :         /* NB: fileName is malloc'd, and must be free'd when closing the VFD */
     207                 :             :         int                     fileFlags;              /* open(2) flags for (re)opening the file */
     208                 :             :         mode_t          fileMode;               /* mode to pass to open(2) */
     209                 :             : } Vfd;
     210                 :             : 
     211                 :             : /*
     212                 :             :  * Virtual File Descriptor array pointer and size.  This grows as
     213                 :             :  * needed.  'File' values are indexes into this array.
     214                 :             :  * Note that VfdCache[0] is not a usable VFD, just a list header.
     215                 :             :  */
     216                 :             : static Vfd *VfdCache;
     217                 :             : static Size SizeVfdCache = 0;
     218                 :             : 
     219                 :             : /*
     220                 :             :  * Number of file descriptors known to be in use by VFD entries.
     221                 :             :  */
     222                 :             : static int      nfile = 0;
     223                 :             : 
     224                 :             : /*
     225                 :             :  * Flag to tell whether it's worth scanning VfdCache looking for temp files
     226                 :             :  * to close
     227                 :             :  */
     228                 :             : static bool have_xact_temporary_files = false;
     229                 :             : 
     230                 :             : /*
     231                 :             :  * Tracks the total size of all temporary files.  Note: when temp_file_limit
     232                 :             :  * is being enforced, this cannot overflow since the limit cannot be more
     233                 :             :  * than INT_MAX kilobytes.  When not enforcing, it could theoretically
     234                 :             :  * overflow, but we don't care.
     235                 :             :  */
     236                 :             : static uint64 temporary_files_size = 0;
     237                 :             : 
     238                 :             : /* Temporary file access initialized and not yet shut down? */
     239                 :             : #ifdef USE_ASSERT_CHECKING
     240                 :             : static bool temporary_files_allowed = false;
     241                 :             : #endif
     242                 :             : 
     243                 :             : /*
     244                 :             :  * List of OS handles opened with AllocateFile, OpenPipeStream,
     245                 :             :  * AllocateDir and OpenTransientFile.
     246                 :             :  */
     247                 :             : typedef enum
     248                 :             : {
     249                 :             :         AllocateDescFile,
     250                 :             :         AllocateDescPipe,
     251                 :             :         AllocateDescDir,
     252                 :             :         AllocateDescRawFD,
     253                 :             : } AllocateDescKind;
     254                 :             : 
     255                 :             : typedef struct
     256                 :             : {
     257                 :             :         AllocateDescKind kind;
     258                 :             :         SubTransactionId create_subid;
     259                 :             :         union
     260                 :             :         {
     261                 :             :                 FILE       *file;
     262                 :             :                 DIR                *dir;
     263                 :             :                 int                     fd;
     264                 :             :         }                       desc;
     265                 :             : } AllocateDesc;
     266                 :             : 
     267                 :             : static int      numAllocatedDescs = 0;
     268                 :             : static int      maxAllocatedDescs = 0;
     269                 :             : static AllocateDesc *allocatedDescs = NULL;
     270                 :             : 
     271                 :             : /*
     272                 :             :  * Number of open "external" FDs reported to Reserve/ReleaseExternalFD.
     273                 :             :  */
     274                 :             : static int      numExternalFDs = 0;
     275                 :             : 
     276                 :             : /*
     277                 :             :  * Number of temporary files opened during the current session;
     278                 :             :  * this is used in generation of tempfile names.
     279                 :             :  */
     280                 :             : static long tempFileCounter = 0;
     281                 :             : 
     282                 :             : /*
     283                 :             :  * Array of OIDs of temp tablespaces.  (Some entries may be InvalidOid,
     284                 :             :  * indicating that the current database's default tablespace should be used.)
     285                 :             :  * When numTempTableSpaces is -1, this has not been set in the current
     286                 :             :  * transaction.
     287                 :             :  */
     288                 :             : static Oid *tempTableSpaces = NULL;
     289                 :             : static int      numTempTableSpaces = -1;
     290                 :             : static int      nextTempTableSpace = 0;
     291                 :             : 
     292                 :             : 
     293                 :             : /*--------------------
     294                 :             :  *
     295                 :             :  * Private Routines
     296                 :             :  *
     297                 :             :  * Delete          - delete a file from the Lru ring
     298                 :             :  * LruDelete       - remove a file from the Lru ring and close its FD
     299                 :             :  * Insert          - put a file at the front of the Lru ring
     300                 :             :  * LruInsert       - put a file at the front of the Lru ring and open it
     301                 :             :  * ReleaseLruFile  - Release an fd by closing the last entry in the Lru ring
     302                 :             :  * ReleaseLruFiles - Release fd(s) until we're under the max_safe_fds limit
     303                 :             :  * AllocateVfd     - grab a free (or new) file record (from VfdCache)
     304                 :             :  * FreeVfd         - free a file record
     305                 :             :  *
     306                 :             :  * The Least Recently Used ring is a doubly linked list that begins and
     307                 :             :  * ends on element zero.  Element zero is special -- it doesn't represent
     308                 :             :  * a file and its "fd" field always == VFD_CLOSED.  Element zero is just an
     309                 :             :  * anchor that shows us the beginning/end of the ring.
     310                 :             :  * Only VFD elements that are currently really open (have an FD assigned) are
     311                 :             :  * in the Lru ring.  Elements that are "virtually" open can be recognized
     312                 :             :  * by having a non-null fileName field.
     313                 :             :  *
     314                 :             :  * example:
     315                 :             :  *
     316                 :             :  *     /--less----\                /---------\
     317                 :             :  *     v           \              v           \
     318                 :             :  *   #0 --more---> LeastRecentlyUsed --more-\ \
     319                 :             :  *    ^\                                    | |
     320                 :             :  *     \\less--> MostRecentlyUsedFile   <---/ |
     321                 :             :  *      \more---/                    \--less--/
     322                 :             :  *
     323                 :             :  *--------------------
     324                 :             :  */
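
The ring described above, anchored at element zero, can be sketched in isolation as follows. This is an illustration under stated assumptions, not the fd.c routines themselves: RingEntry, ring_insert, ring_delete and ring_least_recent are hypothetical names, and only the two link fields of the Vfd struct are modelled.

```c
#include <assert.h>

#define RING_SIZE 8

/* Hypothetical stand-in for the Vfd LRU links; index 0 is the anchor. */
typedef struct
{
    int lruMoreRecently;
    int lruLessRecently;
} RingEntry;

/* Zero-initialized, so element 0 starts out linked to itself (empty ring). */
static RingEntry ring[RING_SIZE];

/* Unlink an entry from the ring by joining its two neighbours. */
static void
ring_delete(int file)
{
    RingEntry *e = &ring[file];

    assert(file != 0);
    ring[e->lruLessRecently].lruMoreRecently = e->lruMoreRecently;
    ring[e->lruMoreRecently].lruLessRecently = e->lruLessRecently;
}

/* Link an entry in as the most recently used one, next to the anchor. */
static void
ring_insert(int file)
{
    RingEntry *e = &ring[file];

    assert(file != 0);
    e->lruMoreRecently = 0;
    e->lruLessRecently = ring[0].lruLessRecently;
    ring[0].lruLessRecently = file;
    ring[e->lruLessRecently].lruMoreRecently = file;
}

/* The least recently used entry, or 0 if the ring is empty. */
static int
ring_least_recent(void)
{
    return ring[0].lruMoreRecently;
}
```

Touching an entry amounts to ring_delete followed by ring_insert, which moves it to the most-recently-used end; the entry returned by ring_least_recent is the natural candidate to close when a real descriptor is needed.
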
     325                 :             : static void Delete(File file);
     326                 :             : static void LruDelete(File file);
     327                 :             : static void Insert(File file);
     328                 :             : static int      LruInsert(File file);
     329                 :             : static bool ReleaseLruFile(void);
     330                 :             : static void ReleaseLruFiles(void);
     331                 :             : static File AllocateVfd(void);
     332                 :             : static void FreeVfd(File file);
     333                 :             : 
     334                 :             : static int      FileAccess(File file);
     335                 :             : static File OpenTemporaryFileInTablespace(Oid tblspcOid, bool rejectError);
     336                 :             : static bool reserveAllocatedDesc(void);
     337                 :             : static int      FreeDesc(AllocateDesc *desc);
     338                 :             : 
     339                 :             : static void BeforeShmemExit_Files(int code, Datum arg);
     340                 :             : static void CleanupTempFiles(bool isCommit, bool isProcExit);
     341                 :             : static void RemovePgTempRelationFiles(const char *tsdirname);
     342                 :             : static void RemovePgTempRelationFilesInDbspace(const char *dbspacedirname);
     343                 :             : 
     344                 :             : static void walkdir(const char *path,
     345                 :             :                                         void (*action) (const char *fname, bool isdir, int elevel),
     346                 :             :                                         bool process_symlinks,
     347                 :             :                                         int elevel);
     348                 :             : #ifdef PG_FLUSH_DATA_WORKS
     349                 :             : static void pre_sync_fname(const char *fname, bool isdir, int elevel);
     350                 :             : #endif
     351                 :             : static void datadir_fsync_fname(const char *fname, bool isdir, int elevel);
     352                 :             : static void unlink_if_exists_fname(const char *fname, bool isdir, int elevel);
     353                 :             : 
     354                 :             : static int      fsync_parent_path(const char *fname, int elevel);
     355                 :             : 
     356                 :             : 
     357                 :             : /* ResourceOwner callbacks to hold virtual file descriptors */
     358                 :             : static void ResOwnerReleaseFile(Datum res);
     359                 :             : static char *ResOwnerPrintFile(Datum res);
     360                 :             : 
     361                 :             : static const ResourceOwnerDesc file_resowner_desc =
     362                 :             : {
     363                 :             :         .name = "File",
     364                 :             :         .release_phase = RESOURCE_RELEASE_AFTER_LOCKS,
     365                 :             :         .release_priority = RELEASE_PRIO_FILES,
     366                 :             :         .ReleaseResource = ResOwnerReleaseFile,
     367                 :             :         .DebugPrint = ResOwnerPrintFile
     368                 :             : };
     369                 :             : 
     370                 :             : /* Convenience wrappers over ResourceOwnerRemember/Forget */
     371                 :             : static inline void
     372                 :        1366 : ResourceOwnerRememberFile(ResourceOwner owner, File file)
     373                 :             : {
     374                 :        1366 :         ResourceOwnerRemember(owner, Int32GetDatum(file), &file_resowner_desc);
     375                 :        1366 : }
     376                 :             : static inline void
     377                 :        1366 : ResourceOwnerForgetFile(ResourceOwner owner, File file)
     378                 :             : {
     379                 :        1366 :         ResourceOwnerForget(owner, Int32GetDatum(file), &file_resowner_desc);
     380                 :        1366 : }
     381                 :             : 
     382                 :             : /*
     383                 :             :  * pg_fsync --- do fsync with or without writethrough
     384                 :             :  */
     385                 :             : int
     386                 :         216 : pg_fsync(int fd)
     387                 :             : {
     388                 :             : #if !defined(WIN32) && defined(USE_ASSERT_CHECKING)
     389                 :         216 :         struct stat st;
     390                 :             : 
     391                 :             :         /*
     392                 :             :          * Some operating system implementations of fsync() have requirements
     393                 :             :          * about the file access modes that were used when their file descriptor
     394                 :             :          * argument was opened, and these requirements differ depending on whether
     395                 :             :          * the file descriptor is for a directory.
     396                 :             :          *
     397                 :             :          * For any file descriptor that may eventually be handed to fsync(), we
     398                 :             :          * should have opened it with access modes that are compatible with
     399                 :             :          * fsync() on all supported systems, otherwise the code may not be
     400                 :             :          * portable, even if it runs ok on the current system.
     401                 :             :          *
     402                 :             :          * We assert here that a descriptor for a file was opened with write
     403                 :             :          * permissions (i.e., not O_RDONLY) and for a directory without write
     404                 :             :          * permissions (O_RDONLY).  Notice that the assertion check is made even
     405                 :             :          * if fsync() is disabled.
     406                 :             :          *
     407                 :             :          * If fstat() fails, ignore it and let the follow-up fsync() complain.
     408                 :             :          */
     409         [ -  + ]:         216 :         if (fstat(fd, &st) == 0)
     410                 :             :         {
     411                 :         216 :                 int                     desc_flags = fcntl(fd, F_GETFL);
     412                 :             : 
     413                 :         216 :                 desc_flags &= O_ACCMODE;
     414                 :             : 
     415         [ +  + ]:         216 :                 if (S_ISDIR(st.st_mode))
     416         [ +  - ]:          73 :                         Assert(desc_flags == O_RDONLY);
     417                 :             :                 else
     418         [ +  - ]:         143 :                         Assert(desc_flags != O_RDONLY);
     419                 :         216 :         }
     420                 :         216 :         errno = 0;
     421                 :             : #endif
     422                 :             : 
     423                 :             :         /* #if is to skip the wal_sync_method test if there's no need for it */
     424                 :             : #if defined(HAVE_FSYNC_WRITETHROUGH)
     425         [ -  + ]:         216 :         if (wal_sync_method == WAL_SYNC_METHOD_FSYNC_WRITETHROUGH)
     426                 :           0 :                 return pg_fsync_writethrough(fd);
     427                 :             :         else
     428                 :             : #endif
     429                 :         216 :                 return pg_fsync_no_writethrough(fd);
     430                 :         216 : }
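
The access-mode rule asserted in pg_fsync() above can be tried outside the server. The following is a minimal sketch, assuming a POSIX system; `sketch_fd_mode_ok_for_fsync` is a hypothetical helper, not part of fd.c, and it returns a boolean where the original uses Assert().

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not in fd.c): report whether fd was opened with an
 * access mode that the assertion in pg_fsync() would accept.
 */
static bool
sketch_fd_mode_ok_for_fsync(int fd)
{
	struct stat st;
	int			desc_flags;

	/* As in pg_fsync(): if fstat() fails, leave the complaint to fsync(). */
	if (fstat(fd, &st) != 0)
		return true;

	desc_flags = fcntl(fd, F_GETFL);
	if (desc_flags == -1)
		return true;			/* extra guard, an assumption of this sketch */

	/* Keep only the access-mode bits (O_RDONLY, O_WRONLY, O_RDWR). */
	desc_flags &= O_ACCMODE;

	if (S_ISDIR(st.st_mode))
		return desc_flags == O_RDONLY;	/* directories: read-only */

	return desc_flags != O_RDONLY;		/* regular files: writable */
}
```

A directory opened with O_RDONLY passes this check, while a non-directory opened with O_RDONLY does not.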
     431                 :             : 
     432                 :             : 
     433                 :             : /*
     434                 :             :  * pg_fsync_no_writethrough --- same as fsync except does nothing if
     435                 :             :  *      enableFsync is off
     436                 :             :  */
     437                 :             : int
     438                 :         216 : pg_fsync_no_writethrough(int fd)
     439                 :             : {
     440                 :         216 :         int                     rc;
     441                 :             : 
     442         [ -  + ]:         216 :         if (!enableFsync)
     443                 :         216 :                 return 0;
     444                 :             : 
     445                 :             : retry:
     446                 :           0 :         rc = fsync(fd);
     447                 :             : 
     448   [ #  #  #  # ]:           0 :         if (rc == -1 && errno == EINTR)
     449                 :           0 :                 goto retry;
     450                 :             : 
     451                 :           0 :         return rc;
     452                 :         216 : }
     453                 :             : 
     454                 :             : /*
     455                 :             :  * pg_fsync_writethrough
     456                 :             :  */
     457                 :             : int
     458                 :           0 : pg_fsync_writethrough(int fd)
     459                 :             : {
     460         [ #  # ]:           0 :         if (enableFsync)
     461                 :             :         {
     462                 :             : #if defined(F_FULLFSYNC)
     463                 :           0 :                 return (fcntl(fd, F_FULLFSYNC, 0) == -1) ? -1 : 0;
     464                 :             : #else
     465                 :             :                 errno = ENOSYS;
     466                 :             :                 return -1;
     467                 :             : #endif
     468                 :             :         }
     469                 :             :         else
     470                 :           0 :                 return 0;
     471                 :           0 : }
     472                 :             : 
     473                 :             : /*
     474                 :             :  * pg_fdatasync --- same as fdatasync except does nothing if enableFsync is off
     475                 :             :  */
     476                 :             : int
     477                 :           0 : pg_fdatasync(int fd)
     478                 :             : {
     479                 :           0 :         int                     rc;
     480                 :             : 
     481         [ #  # ]:           0 :         if (!enableFsync)
     482                 :           0 :                 return 0;
     483                 :             : 
     484                 :             : retry:
     485                 :           0 :         rc = fdatasync(fd);
     486                 :             : 
     487   [ #  #  #  # ]:           0 :         if (rc == -1 && errno == EINTR)
     488                 :           0 :                 goto retry;
     489                 :             : 
     490                 :           0 :         return rc;
     491                 :           0 : }
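
pg_fsync_no_writethrough() and pg_fdatasync() share the same retry-on-EINTR shape. The sketch below isolates that pattern so it can be exercised without real I/O. The `sync_fn` typedef, `sketch_sync_retrying`, and the fake sync function are all hypothetical names introduced for illustration, and a do/while loop stands in for the goto used in fd.c.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical function-pointer type standing in for fsync()/fdatasync(). */
typedef int (*sync_fn) (int fd);

/* Call fn(fd) again for as long as it fails with EINTR. */
static int
sketch_sync_retrying(sync_fn fn, int fd)
{
	int			rc;

	do
	{
		rc = fn(fd);
	} while (rc == -1 && errno == EINTR);

	return rc;
}

/* Test double: pretend to be interrupted twice, then succeed. */
static int	sketch_fake_calls = 0;

static int
sketch_fake_sync(int fd)
{
	(void) fd;
	if (sketch_fake_calls++ < 2)
	{
		errno = EINTR;
		return -1;
	}
	return 0;
}
```

With the fake above, the wrapper returns 0 after three calls. Any error other than EINTR is returned to the caller on the first attempt.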
     492                 :             : 
     493                 :             : /*
     494                 :             :  * pg_file_exists -- check that a file exists.
     495                 :             :  *
     496                 :             :  * This requires an absolute path to the file.  Returns true if the file
     497                 :             :  * exists and is not a directory, false otherwise.
     498                 :             :  */
     499                 :             : bool
     500                 :         813 : pg_file_exists(const char *name)
     501                 :             : {
     502                 :         813 :         struct stat st;
     503                 :             : 
     504         [ +  - ]:         813 :         Assert(name != NULL);
     505                 :             : 
     506         [ +  + ]:         813 :         if (stat(name, &st) == 0)
     507                 :         415 :                 return !S_ISDIR(st.st_mode);
     508   [ -  +  #  #  :         398 :         else if (!(errno == ENOENT || errno == ENOTDIR || errno == EACCES))
                   #  # ]
     509   [ #  #  #  # ]:           0 :                 ereport(ERROR,
     510                 :             :                                 (errcode_for_file_access(),
     511                 :             :                                  errmsg("could not access file \"%s\": %m", name)));
     512                 :             : 
     513                 :         398 :         return false;
     514                 :         813 : }
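
The decision table in pg_file_exists() can be sketched without the server's error reporting. In this illustration, `sketch_file_exists` is a hypothetical function that returns -1 where the original raises an ERROR through ereport(). The three-way return value is an assumption of the sketch, not the fd.c interface.

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>

/*
 * Hypothetical stand-in for pg_file_exists() without ereport():
 *    1  path exists and is not a directory
 *    0  path is a directory, or stat() failed with ENOENT, ENOTDIR or EACCES
 *   -1  stat() failed for any other reason (fd.c reports an ERROR here)
 */
static int
sketch_file_exists(const char *name)
{
	struct stat st;

	if (stat(name, &st) == 0)
		return S_ISDIR(st.st_mode) ? 0 : 1;

	if (errno == ENOENT || errno == ENOTDIR || errno == EACCES)
		return 0;

	return -1;
}
```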
     515                 :             : 
     516                 :             : /*
     517                 :             :  * pg_flush_data --- advise OS that the described dirty data should be flushed
     518                 :             :  *
     519                 :             :  * offset of 0 with nbytes 0 means that the entire file should be flushed
     520                 :             :  */
     521                 :             : void
     522                 :         992 : pg_flush_data(int fd, pgoff_t offset, pgoff_t nbytes)
     523                 :             : {
     524                 :             :         /*
     525                 :             :          * Right now file flushing is primarily used to reduce the impact of
     526                 :             :          * later fsync()/fdatasync() calls. Thus don't trigger flushes if
     527                 :             :          * fsyncs are disabled - that's a decision we might want to make
     528                 :             :          * configurable at some point.
     529                 :             :          */
     530         [ -  + ]:         992 :         if (!enableFsync)
     531                 :         992 :                 return;
     532                 :             : 
     533                 :             :         /*
     534                 :             :          * We compile all alternatives that are supported on the current platform,
     535                 :             :          * to find portability problems more easily.
     536                 :             :          */
     537                 :             : #if defined(HAVE_SYNC_FILE_RANGE)
     538                 :             :         {
     539                 :             :                 int                     rc;
     540                 :             :                 static bool not_implemented_by_kernel = false;
     541                 :             : 
     542                 :             :                 if (not_implemented_by_kernel)
     543                 :             :                         return;
     544                 :             : 
     545                 :             : retry:
     546                 :             : 
     547                 :             :                 /*
     548                 :             :                  * sync_file_range(SYNC_FILE_RANGE_WRITE), currently Linux-specific,
     549                 :             :                  * tells the OS that writeback for the specified blocks should be
     550                 :             :                  * started, but that we don't want to wait for completion.  Note that
     551                 :             :                  * this call might block if too much dirty data exists in the range.
     552                 :             :                  * This is the preferable method on OSs supporting it, as it works
     553                 :             :                  * reliably when available (in contrast to msync()) and doesn't flush out
     554                 :             :                  * clean data (like FADV_DONTNEED).
     555                 :             :                  */
     556                 :             :                 rc = sync_file_range(fd, offset, nbytes,
     557                 :             :                                                          SYNC_FILE_RANGE_WRITE);
     558                 :             :                 if (rc != 0)
     559                 :             :                 {
     560                 :             :                         int                     elevel;
     561                 :             : 
     562                 :             :                         if (errno == EINTR)
     563                 :             :                                 goto retry;
     564                 :             : 
     565                 :             :                         /*
     566                 :             :                          * For systems that don't have an implementation of
     567                 :             :                          * sync_file_range() such as Windows WSL, generate only one
     568                 :             :                          * warning and then suppress all further attempts by this process.
     569                 :             :                          */
     570                 :             :                         if (errno == ENOSYS)
     571                 :             :                         {
     572                 :             :                                 elevel = WARNING;
     573                 :             :                                 not_implemented_by_kernel = true;
     574                 :             :                         }
     575                 :             :                         else
     576                 :             :                                 elevel = data_sync_elevel(WARNING);
     577                 :             : 
     578                 :             :                         ereport(elevel,
     579                 :             :                                         (errcode_for_file_access(),
     580                 :             :                                          errmsg("could not flush dirty data: %m")));
     581                 :             :                 }
     582                 :             : 
     583                 :             :                 return;
     584                 :             :         }
     585                 :             : #endif
     586                 :             : #if !defined(WIN32) && defined(MS_ASYNC)
     587                 :             :         {
     588                 :           0 :                 void       *p;
     589                 :             :                 static int      pagesize = 0;
     590                 :             : 
     591                 :             :                 /*
     592                 :             :                  * On several OSs msync(MS_ASYNC) on a mmap'ed file triggers
     593                 :             :                  * writeback. On Linux it only does so if MS_SYNC is specified, but
     594                 :             :                  * then it does the writeback synchronously. Luckily all common Linux
     595                 :             :                  * systems have sync_file_range().  This is preferable over
     596                 :             :                  * FADV_DONTNEED because it doesn't flush out clean data.
     597                 :             :                  *
     598                 :             :                  * We map the file (mmap()), tell the kernel to sync back the contents
     599                 :             :                  * (msync()), and then remove the mapping again (munmap()).
     600                 :             :                  */
     601                 :             : 
     602                 :             :                 /* mmap() needs actual length if we want to map whole file */
     603   [ #  #  #  # ]:           0 :                 if (offset == 0 && nbytes == 0)
     604                 :             :                 {
     605                 :           0 :                         nbytes = lseek(fd, 0, SEEK_END);
     606         [ #  # ]:           0 :                         if (nbytes < 0)
     607                 :             :                         {
     608   [ #  #  #  # ]:           0 :                                 ereport(WARNING,
     609                 :             :                                                 (errcode_for_file_access(),
     610                 :             :                                                  errmsg("could not determine dirty data size: %m")));
     611                 :           0 :                                 return;
     612                 :             :                         }
     613                 :           0 :                 }
     614                 :             : 
     615                 :             :                 /*
     616                 :             :                  * Some platforms reject partial-page mmap() attempts.  To deal with
     617                 :             :                  * that, just truncate the request to a page boundary.  If any extra
     618                 :             :                  * bytes don't get flushed, well, it's only a hint anyway.
     619                 :             :                  */
     620                 :             : 
     621                 :             :                 /* fetch pagesize only once */
     622         [ #  # ]:           0 :                 if (pagesize == 0)
     623                 :           0 :                         pagesize = sysconf(_SC_PAGESIZE);
     624                 :             : 
     625                 :             :                 /* align length to pagesize, dropping any fractional page */
     626         [ #  # ]:           0 :                 if (pagesize > 0)
     627                 :           0 :                         nbytes = (nbytes / pagesize) * pagesize;
     628                 :             : 
     629                 :             :                 /* fractional-page request is a no-op */
     630         [ #  # ]:           0 :                 if (nbytes <= 0)
     631                 :           0 :                         return;
     632                 :             : 
     633                 :             :                 /*
     634                 :             :                  * mmap could well fail, particularly on 32-bit platforms where there
     635                 :             :                  * may simply not be enough address space.  If so, silently fall
     636                 :             :                  * through to the next implementation.
     637                 :             :                  */
     638         [ #  # ]:           0 :                 if (nbytes <= (pgoff_t) SSIZE_MAX)
     639                 :           0 :                         p = mmap(NULL, nbytes, PROT_READ, MAP_SHARED, fd, offset);
     640                 :             :                 else
     641                 :           0 :                         p = MAP_FAILED;
     642                 :             : 
     643         [ #  # ]:           0 :                 if (p != MAP_FAILED)
     644                 :             :                 {
     645                 :           0 :                         int                     rc;
     646                 :             : 
     647                 :           0 :                         rc = msync(p, (size_t) nbytes, MS_ASYNC);
     648         [ #  # ]:           0 :                         if (rc != 0)
     649                 :             :                         {
     650   [ #  #  #  #  :           0 :                                 ereport(data_sync_elevel(WARNING),
                   #  # ]
     651                 :             :                                                 (errcode_for_file_access(),
     652                 :             :                                                  errmsg("could not flush dirty data: %m")));
     653                 :             :                                 /* NB: need to fall through to munmap()! */
     654                 :           0 :                         }
     655                 :             : 
     656                 :           0 :                         rc = munmap(p, (size_t) nbytes);
     657         [ #  # ]:           0 :                         if (rc != 0)
     658                 :             :                         {
     659                 :             :                                 /* FATAL error because mapping would remain */
     660   [ #  #  #  # ]:           0 :                                 ereport(FATAL,
     661                 :             :                                                 (errcode_for_file_access(),
     662                 :             :                                                  errmsg("could not munmap() while flushing data: %m")));
     663                 :           0 :                         }
     664                 :             : 
     665                 :             :                         return;
     666                 :           0 :                 }
     667         [ #  # ]:           0 :         }
     668                 :             : #endif
     669                 :             : #if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
     670                 :             :         {
     671                 :             :                 int                     rc;
     672                 :             : 
     673                 :             :                 /*
     674                 :             :                  * Signal the kernel that the passed-in range should not be cached
     675                 :             :                  * anymore. This has the desired side effect of writing out dirty
     676                 :             :                  * data, and the undesired side effect of likely discarding useful
     677                 :             :                  * clean cached blocks.  For the latter reason this is the least
     678                 :             :                  * preferable method.
     679                 :             :                  */
     680                 :             : 
     681                 :             :                 rc = posix_fadvise(fd, offset, nbytes, POSIX_FADV_DONTNEED);
     682                 :             : 
     683                 :             :                 if (rc != 0)
     684                 :             :                 {
     685                 :             :                         /* don't error out, this is just a performance optimization */
     686                 :             :                         ereport(WARNING,
     687                 :             :                                         (errcode_for_file_access(),
     688                 :             :                                          errmsg("could not flush dirty data: %m")));
     689                 :             :                 }
     690                 :             : 
     691                 :             :                 return;
     692                 :             :         }
     693                 :             : #endif
     694                 :         992 : }
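
The msync() branch of pg_flush_data() truncates the request to a whole number of pages before calling mmap(). That arithmetic is shown in isolation below. `sketch_truncate_to_page` is a hypothetical helper, and `long long` stands in for pgoff_t as an assumption of this sketch.

```c
#include <assert.h>

/*
 * Hypothetical helper: drop any fractional trailing page from nbytes.
 * A non-positive pagesize leaves nbytes untouched, matching the
 * "if (pagesize > 0)" guard in pg_flush_data().
 */
static long long
sketch_truncate_to_page(long long nbytes, long pagesize)
{
	if (pagesize > 0)
		nbytes = (nbytes / pagesize) * pagesize;

	return nbytes;
}
```

With a 4096-byte page, a 10000-byte request becomes 8192 bytes, and a request smaller than one page becomes 0, which pg_flush_data() treats as a no-op. In real use the page size would come from sysconf(_SC_PAGESIZE), as in the function above.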
     695                 :             : 
     696                 :             : /*
     697                 :             :  * Truncate an open file to a given length.
     698                 :             :  */
     699                 :             : static int
     700                 :         102 : pg_ftruncate(int fd, pgoff_t length)
     701                 :             : {
     702                 :         102 :         int                     ret;
     703                 :             : 
     704                 :             : retry:
     705                 :         102 :         ret = ftruncate(fd, length);
     706                 :             : 
     707   [ -  +  #  # ]:         102 :         if (ret == -1 && errno == EINTR)
     708                 :           0 :                 goto retry;
     709                 :             : 
     710                 :         204 :         return ret;
     711                 :         102 : }
     712                 :             : 
     713                 :             : /*
     714                 :             :  * Truncate a file to a given length by name.
     715                 :             :  */
     716                 :             : int
     717                 :       39870 : pg_truncate(const char *path, pgoff_t length)
     718                 :             : {
     719                 :       39870 :         int                     ret;
     720                 :             : #ifdef WIN32
     721                 :             :         int                     save_errno;
     722                 :             :         int                     fd;
     723                 :             : 
     724                 :             :         fd = OpenTransientFile(path, O_RDWR | PG_BINARY);
     725                 :             :         if (fd >= 0)
     726                 :             :         {
     727                 :             :                 ret = pg_ftruncate(fd, length);
     728                 :             :                 save_errno = errno;
     729                 :             :                 CloseTransientFile(fd);
     730                 :             :                 errno = save_errno;
     731                 :             :         }
     732                 :             :         else
     733                 :             :                 ret = -1;
     734                 :             : #else
     735                 :             : 
     736                 :             : retry:
     737                 :       39870 :         ret = truncate(path, length);
     738                 :             : 
     739   [ +  +  +  - ]:       39870 :         if (ret == -1 && errno == EINTR)
     740                 :           0 :                 goto retry;
     741                 :             : #endif
     742                 :             : 
     743                 :       79740 :         return ret;
     744                 :       39870 : }
     745                 :             : 
     746                 :             : /*
     747                 :             :  * fsync_fname -- fsync a file or directory, handling errors properly
     748                 :             :  *
     749                 :             :  * Try to fsync a file or directory. When doing the latter, ignore errors that
     750                 :             :  * indicate the OS just doesn't allow/require fsyncing directories.
     751                 :             :  */
     752                 :             : void
     753                 :          39 : fsync_fname(const char *fname, bool isdir)
     754                 :             : {
     755                 :          39 :         fsync_fname_ext(fname, isdir, false, data_sync_elevel(ERROR));
     756                 :          39 : }
     757                 :             : 
     758                 :             : /*
     759                 :             :  * durable_rename -- rename(2) wrapper, issuing fsyncs required for durability
     760                 :             :  *
     761                 :             :  * This routine ensures that, after returning, the effect of renaming the
     762                 :             :  * file persists in case of a crash. A crash while this routine is running will
     763                 :             :  * leave you with either the pre-existing or the moved file in place of the
     764                 :             :  * new file; no mixed state or truncated files are possible.
     765                 :             :  *
     766                 :             :  * It does so by using fsync on the old filename and the possibly existing
     767                 :             :  * target filename before the rename, and the target file and directory after.
     768                 :             :  *
     769                 :             :  * Note that rename() cannot be used across arbitrary directories, as they
     770                 :             :  * might not be on the same filesystem. Therefore this routine does not
     771                 :             :  * support renaming across directories.
     772                 :             :  *
     773                 :             :  * Log errors with the caller specified severity.
     774                 :             :  *
     775                 :             :  * Returns 0 if the operation succeeded, -1 otherwise. Note that errno is not
     776                 :             :  * valid upon return.
     777                 :             :  */
     778                 :             : int
     779                 :          34 : durable_rename(const char *oldfile, const char *newfile, int elevel)
     780                 :             : {
     781                 :          34 :         int                     fd;
     782                 :             : 
     783                 :             :         /*
     784                 :             :          * First fsync the old and target path (if it exists), to ensure that they
     785                 :             :          * are properly persistent on disk. Syncing the target file is not
     786                 :             :          * strictly necessary, but it makes it easier to reason about crashes;
     787                 :             :          * because it's then guaranteed that either source or target file exists
     788                 :             :          * after a crash.
     789                 :             :          */
     790         [ -  + ]:          34 :         if (fsync_fname_ext(oldfile, false, false, elevel) != 0)
     791                 :           0 :                 return -1;
     792                 :             : 
     793                 :          34 :         fd = OpenTransientFile(newfile, PG_BINARY | O_RDWR);
     794         [ +  + ]:          34 :         if (fd < 0)
     795                 :             :         {
     796         [ -  + ]:          26 :                 if (errno != ENOENT)
     797                 :             :                 {
     798   [ #  #  #  #  :           0 :                         ereport(elevel,
          #  #  #  #  #  
                      # ]
     799                 :             :                                         (errcode_for_file_access(),
     800                 :             :                                          errmsg("could not open file \"%s\": %m", newfile)));
     801                 :           0 :                         return -1;
     802                 :             :                 }
     803                 :          26 :         }
     804                 :             :         else
     805                 :             :         {
     806         [ -  + ]:           8 :                 if (pg_fsync(fd) != 0)
     807                 :             :                 {
     808                 :           0 :                         int                     save_errno;
     809                 :             : 
     810                 :             :                         /* close file upon error, might not be in transaction context */
     811                 :           0 :                         save_errno = errno;
     812                 :           0 :                         CloseTransientFile(fd);
     813                 :           0 :                         errno = save_errno;
     814                 :             : 
     815   [ #  #  #  #  :           0 :                         ereport(elevel,
          #  #  #  #  #  
                      # ]
     816                 :             :                                         (errcode_for_file_access(),
     817                 :             :                                          errmsg("could not fsync file \"%s\": %m", newfile)));
     818                 :           0 :                         return -1;
     819                 :           0 :                 }
     820                 :             : 
     821         [ -  + ]:           8 :                 if (CloseTransientFile(fd) != 0)
     822                 :             :                 {
     823   [ #  #  #  #  :           0 :                         ereport(elevel,
          #  #  #  #  #  
                      # ]
     824                 :             :                                         (errcode_for_file_access(),
     825                 :             :                                          errmsg("could not close file \"%s\": %m", newfile)));
     826                 :           0 :                         return -1;
     827                 :             :                 }
     828                 :             :         }
     829                 :             : 
     830                 :             :         /* Time to do the real deal... */
     831         [ +  - ]:          34 :         if (rename(oldfile, newfile) < 0)
     832                 :             :         {
     833   [ #  #  #  #  :           0 :                 ereport(elevel,
          #  #  #  #  #  
                      # ]
     834                 :             :                                 (errcode_for_file_access(),
     835                 :             :                                  errmsg("could not rename file \"%s\" to \"%s\": %m",
     836                 :             :                                                 oldfile, newfile)));
     837                 :           0 :                 return -1;
     838                 :             :         }
     839                 :             : 
     840                 :             :         /*
     841                 :             :          * To guarantee renaming the file is persistent, fsync the file with its
     842                 :             :          * new name, and its containing directory.
     843                 :             :          */
     844         [ -  + ]:          34 :         if (fsync_fname_ext(newfile, false, false, elevel) != 0)
     845                 :           0 :                 return -1;
     846                 :             : 
     847         [ -  + ]:          34 :         if (fsync_parent_path(newfile, elevel) != 0)
     848                 :           0 :                 return -1;
     849                 :             : 
     850                 :          34 :         return 0;
     851                 :          34 : }
     852                 :             : 
     853                 :             : /*
     854                 :             :  * durable_unlink -- remove a file in a durable manner
     855                 :             :  *
     856                 :             :  * This routine ensures that, after returning, the effect of removing file
     857                 :             :  * persists in case of a crash. A crash while this routine is running will
     858                 :             :  * not leave the system in a mixed state.
     859                 :             :  *
     860                 :             :  * It does so by using fsync on the parent directory of the file after the
     861                 :             :  * actual removal is done.
     862                 :             :  *
     863                 :             :  * Log errors with the severity specified by the caller.
     864                 :             :  *
     865                 :             :  * Returns 0 if the operation succeeded, -1 otherwise. Note that errno is not
     866                 :             :  * valid upon return.
     867                 :             :  */
     868                 :             : int
     869                 :           0 : durable_unlink(const char *fname, int elevel)
     870                 :             : {
     871         [ #  # ]:           0 :         if (unlink(fname) < 0)
     872                 :             :         {
     873   [ #  #  #  #  :           0 :                 ereport(elevel,
          #  #  #  #  #  
                      # ]
     874                 :             :                                 (errcode_for_file_access(),
     875                 :             :                                  errmsg("could not remove file \"%s\": %m",
     876                 :             :                                                 fname)));
     877                 :           0 :                 return -1;
     878                 :             :         }
     879                 :             : 
     880                 :             :         /*
     881                 :             :          * To guarantee that the removal of the file is persistent, fsync its
     882                 :             :          * parent directory.
     883                 :             :          */
     884         [ #  # ]:           0 :         if (fsync_parent_path(fname, elevel) != 0)
     885                 :           0 :                 return -1;
     886                 :             : 
     887                 :           0 :         return 0;
     888                 :           0 : }
     889                 :             : 
     890                 :             : /*
     891                 :             :  * InitFileAccess --- initialize this module during backend startup
     892                 :             :  *
     893                 :             :  * This is called during either normal or standalone backend start.
     894                 :             :  * It is *not* called in the postmaster.
     895                 :             :  *
     896                 :             :  * Note that this does not initialize temporary file access; that is
     897                 :             :  * separately initialized via InitTemporaryFileAccess().
     898                 :             :  */
     899                 :             : void
     900                 :         806 : InitFileAccess(void)
     901                 :             : {
     902         [ +  - ]:         806 :         Assert(SizeVfdCache == 0);      /* call me only once */
     903                 :             : 
     904                 :             :         /* initialize cache header entry */
     905                 :         806 :         VfdCache = (Vfd *) malloc(sizeof(Vfd));
     906         [ +  - ]:         806 :         if (VfdCache == NULL)
     907   [ #  #  #  # ]:           0 :                 ereport(FATAL,
     908                 :             :                                 (errcode(ERRCODE_OUT_OF_MEMORY),
     909                 :             :                                  errmsg("out of memory")));
     910                 :             : 
     911   [ +  -  +  -  :        6448 :         MemSet(&(VfdCache[0]), 0, sizeof(Vfd));
          +  -  -  +  +  
                      + ]
     912                 :         806 :         VfdCache->fd = VFD_CLOSED;
     913                 :             : 
     914                 :         806 :         SizeVfdCache = 1;
     915                 :         806 : }
     916                 :             : 
     917                 :             : /*
     918                 :             :  * InitTemporaryFileAccess --- initialize temporary file access during startup
     919                 :             :  *
     920                 :             :  * This is called during either normal or standalone backend start.
     921                 :             :  * It is *not* called in the postmaster.
     922                 :             :  *
     923                 :             :  * This is separate from InitFileAccess() because temporary file cleanup can
     924                 :             :  * cause pgstat reporting. As pgstat is shut down during before_shmem_exit(),
     925                 :             :  * our reporting has to happen before that. Low level file access should be
     926                 :             :  * available for longer, hence the separate initialization / shutdown of
     927                 :             :  * temporary file handling.
     928                 :             :  */
     929                 :             : void
     930                 :         806 : InitTemporaryFileAccess(void)
     931                 :             : {
     932         [ +  - ]:         806 :         Assert(SizeVfdCache != 0);      /* InitFileAccess() needs to have run */
     933         [ +  - ]:         806 :         Assert(!temporary_files_allowed);       /* call me only once */
     934                 :             : 
     935                 :             :         /*
     936                 :             :          * Register before-shmem-exit hook to ensure temp files are dropped while
     937                 :             :          * we can still report stats.
     938                 :             :          */
     939                 :         806 :         before_shmem_exit(BeforeShmemExit_Files, 0);
     940                 :             : 
     941                 :             : #ifdef USE_ASSERT_CHECKING
     942                 :         806 :         temporary_files_allowed = true;
     943                 :             : #endif
     944                 :         806 : }
     945                 :             : 
     946                 :             : /*
     947                 :             :  * count_usable_fds --- count how many FDs the system will let us open,
     948                 :             :  *              and estimate how many are already open.
     949                 :             :  *
     950                 :             :  * We stop counting if usable_fds reaches max_to_probe.  Note: a small
     951                 :             :  * value of max_to_probe might result in an underestimate of already_open;
     952                 :             :  * we must fill in any "gaps" in the set of used FDs before the calculation
     953                 :             :  * of already_open will give the right answer.  In practice, max_to_probe
     954                 :             :  * of a couple of dozen should be enough to ensure good results.
     955                 :             :  *
     956                 :             :  * We assume stderr (FD 2) is available for dup'ing.  While the calling
     957                 :             :  * script could theoretically close that, it would be a really bad idea,
     958                 :             :  * since then one risks loss of error messages from, e.g., libc.
     959                 :             :  */
     960                 :             : static void
     961                 :           6 : count_usable_fds(int max_to_probe, int *usable_fds, int *already_open)
     962                 :             : {
     963                 :           6 :         int                *fd;
     964                 :           6 :         int                     size;
     965                 :           6 :         int                     used = 0;
     966                 :           6 :         int                     highestfd = 0;
     967                 :           6 :         int                     j;
     968                 :             : 
     969                 :             : #ifdef HAVE_GETRLIMIT
     970                 :           6 :         struct rlimit rlim;
     971                 :           6 :         int                     getrlimit_status;
     972                 :             : #endif
     973                 :             : 
     974                 :           6 :         size = 1024;
     975                 :           6 :         fd = (int *) palloc(size * sizeof(int));
     976                 :             : 
     977                 :             : #ifdef HAVE_GETRLIMIT
     978                 :           6 :         getrlimit_status = getrlimit(RLIMIT_NOFILE, &rlim);
     979         [ +  - ]:           6 :         if (getrlimit_status != 0)
     980   [ #  #  #  # ]:           0 :                 ereport(WARNING, (errmsg("getrlimit failed: %m")));
     981                 :             : #endif                                                  /* HAVE_GETRLIMIT */
     982                 :             : 
     983                 :             :         /* dup until failure or probe limit reached */
     984                 :        1516 :         for (;;)
     985                 :             :         {
     986                 :        1516 :                 int                     thisfd;
     987                 :             : 
     988                 :             : #ifdef HAVE_GETRLIMIT
     989                 :             : 
     990                 :             :                 /*
     991                 :             :                  * don't go beyond RLIMIT_NOFILE; causes irritating kernel logs on
     992                 :             :                  * some platforms
     993                 :             :                  */
     994   [ +  -  +  + ]:        1516 :                 if (getrlimit_status == 0 && highestfd >= rlim.rlim_cur - 1)
     995                 :           6 :                         break;
     996                 :             : #endif
     997                 :             : 
     998                 :        1510 :                 thisfd = dup(2);
     999         [ +  - ]:        1510 :                 if (thisfd < 0)
    1000                 :             :                 {
    1001                 :             :                         /* Expect EMFILE or ENFILE, else it's fishy */
    1002   [ #  #  #  # ]:           0 :                         if (errno != EMFILE && errno != ENFILE)
    1003   [ #  #  #  # ]:           0 :                                 elog(WARNING, "duplicating stderr file descriptor failed after %d successes: %m", used);
    1004                 :           0 :                         break;
    1005                 :             :                 }
    1006                 :             : 
    1007         [ +  - ]:        1510 :                 if (used >= size)
    1008                 :             :                 {
    1009                 :           0 :                         size *= 2;
    1010                 :           0 :                         fd = (int *) repalloc(fd, size * sizeof(int));
    1011                 :           0 :                 }
    1012                 :        1510 :                 fd[used++] = thisfd;
    1013                 :             : 
    1014         [ -  + ]:        1510 :                 if (highestfd < thisfd)
    1015                 :        1510 :                         highestfd = thisfd;
    1016                 :             : 
    1017         [ -  + ]:        1510 :                 if (used >= max_to_probe)
    1018                 :           0 :                         break;
    1019      [ -  +  + ]:        1516 :         }
    1020                 :             : 
    1021                 :             :         /* release the files we opened */
    1022         [ +  + ]:        1516 :         for (j = 0; j < used; j++)
    1023                 :        1510 :                 close(fd[j]);
    1024                 :             : 
    1025                 :           6 :         pfree(fd);
    1026                 :             : 
    1027                 :             :         /*
    1028                 :             :          * Return results.  usable_fds is just the number of successful dups. We
    1029                 :             :          * assume that the system limit is highestfd+1 (remember 0 is a legal FD
    1030                 :             :          * number) and so already_open is highestfd+1 - usable_fds.
    1031                 :             :          */
    1032                 :           6 :         *usable_fds = used;
    1033                 :           6 :         *already_open = highestfd + 1 - used;
    1034                 :           6 : }
    1035                 :             : 
    1036                 :             : /*
    1037                 :             :  * set_max_safe_fds
    1038                 :             :  *              Determine number of file descriptors that fd.c is allowed to use
    1039                 :             :  */
    1040                 :             : void
    1041                 :           6 : set_max_safe_fds(void)
    1042                 :             : {
    1043                 :           6 :         int                     usable_fds;
    1044                 :           6 :         int                     already_open;
    1045                 :             : 
    1046                 :             :         /*----------
    1047                 :             :          * We want to set max_safe_fds to
    1048                 :             :          *                      MIN(usable_fds, max_files_per_process)
    1049                 :             :          * less the slop factor for files that are opened without consulting
    1050                 :             :          * fd.c.  This ensures that we never open more additional files than
    1051                 :             :          * max_files_per_process or the experimentally-determined EMFILE limit,
    1052                 :             :          * whichever is lower.
    1053                 :             :          *----------
    1054                 :             :          */
    1055                 :           6 :         count_usable_fds(max_files_per_process,
    1056                 :             :                                          &usable_fds, &already_open);
    1057                 :             : 
    1058         [ +  - ]:           6 :         max_safe_fds = Min(usable_fds, max_files_per_process);
    1059                 :             : 
    1060                 :             :         /*
    1061                 :             :          * Take off the FDs reserved for system() etc.
    1062                 :             :          */
    1063                 :           6 :         max_safe_fds -= NUM_RESERVED_FDS;
    1064                 :             : 
    1065                 :             :         /*
    1066                 :             :          * Make sure we still have enough to get by.
    1067                 :             :          */
    1068         [ +  - ]:           6 :         if (max_safe_fds < FD_MINFREE)
    1069   [ #  #  #  # ]:           0 :                 ereport(FATAL,
    1070                 :             :                                 (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
    1071                 :             :                                  errmsg("insufficient file descriptors available to start server process"),
    1072                 :             :                                  errdetail("System allows %d, server needs at least %d, %d files are already open.",
    1073                 :             :                                                    max_safe_fds + NUM_RESERVED_FDS,
    1074                 :             :                                                    FD_MINFREE + NUM_RESERVED_FDS,
    1075                 :             :                                                    already_open)));
    1076                 :             : 
    1077   [ -  +  -  + ]:           6 :         elog(DEBUG2, "max_safe_fds = %d, usable_fds = %d, already_open = %d",
    1078                 :             :                  max_safe_fds, usable_fds, already_open);
    1079                 :           6 : }
    1080                 :             : 
    1081                 :             : /*
    1082                 :             :  * Open a file with BasicOpenFilePerm() and pass default file mode for the
    1083                 :             :  * fileMode parameter.
    1084                 :             :  */
    1085                 :             : int
    1086                 :         435 : BasicOpenFile(const char *fileName, int fileFlags)
    1087                 :             : {
    1088                 :         435 :         return BasicOpenFilePerm(fileName, fileFlags, pg_file_create_mode);
    1089                 :             : }
    1090                 :             : 
    1091                 :             : /*
    1092                 :             :  * BasicOpenFilePerm --- same as open(2) except can free other FDs if needed
    1093                 :             :  *
    1094                 :             :  * This is exported for use by places that really want a plain kernel FD,
    1095                 :             :  * but need to be proof against running out of FDs.  Once an FD has been
    1096                 :             :  * successfully returned, it is the caller's responsibility to ensure that
    1097                 :             :  * it will not be leaked on ereport()!  Most users should *not* call this
    1098                 :             :  * routine directly, but instead use the VFD abstraction level, which
    1099                 :             :  * provides protection against descriptor leaks as well as management of
    1100                 :             :  * files that need to be open for more than a short period of time.
    1101                 :             :  *
    1102                 :             :  * Ideally this should be the *only* direct call of open() in the backend.
    1103                 :             :  * In practice, the postmaster calls open() directly, and there are some
    1104                 :             :  * direct open() calls done early in backend startup.  Those are OK since
    1105                 :             :  * this module wouldn't have any open files to close at that point anyway.
    1106                 :             :  */
    1107                 :             : int
    1108                 :      107728 : BasicOpenFilePerm(const char *fileName, int fileFlags, mode_t fileMode)
    1109                 :             : {
    1110                 :      107728 :         int                     fd;
    1111                 :             : 
    1112                 :             : tryAgain:
    1113                 :             : #ifdef PG_O_DIRECT_USE_F_NOCACHE
    1114                 :      107728 :         fd = open(fileName, fileFlags & ~PG_O_DIRECT, fileMode);
    1115                 :             : #else
    1116                 :             :         fd = open(fileName, fileFlags, fileMode);
    1117                 :             : #endif
    1118                 :             : 
    1119         [ +  + ]:      107728 :         if (fd >= 0)
    1120                 :             :         {
    1121                 :             : #ifdef PG_O_DIRECT_USE_F_NOCACHE
    1122         [ +  - ]:       61320 :                 if (fileFlags & PG_O_DIRECT)
    1123                 :             :                 {
    1124         [ #  # ]:           0 :                         if (fcntl(fd, F_NOCACHE, 1) < 0)
    1125                 :             :                         {
    1126                 :           0 :                                 int                     save_errno = errno;
    1127                 :             : 
    1128                 :           0 :                                 close(fd);
    1129                 :           0 :                                 errno = save_errno;
    1130                 :           0 :                                 return -1;
    1131                 :           0 :                         }
    1132                 :           0 :                 }
    1133                 :             : #endif
    1134                 :             : 
    1135                 :       61320 :                 return fd;                              /* success! */
    1136                 :             :         }
    1137                 :             : 
    1138   [ +  -  -  + ]:       46408 :         if (errno == EMFILE || errno == ENFILE)
    1139                 :             :         {
    1140                 :           0 :                 int                     save_errno = errno;
    1141                 :             : 
    1142   [ #  #  #  # ]:           0 :                 ereport(LOG,
    1143                 :             :                                 (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
    1144                 :             :                                  errmsg("out of file descriptors: %m; release and retry")));
    1145                 :           0 :                 errno = 0;
    1146         [ #  # ]:           0 :                 if (ReleaseLruFile())
    1147                 :           0 :                         goto tryAgain;
    1148                 :           0 :                 errno = save_errno;
    1149      [ #  #  # ]:           0 :         }
    1150                 :             : 
    1151                 :       46408 :         return -1;                                      /* failure */
    1152                 :      107728 : }
    1153                 :             : 
    1154                 :             : /*
    1155                 :             :  * AcquireExternalFD - attempt to reserve an external file descriptor
    1156                 :             :  *
    1157                 :             :  * This should be used by callers that need to hold a file descriptor open
    1158                 :             :  * over more than a short interval, but cannot use any of the other facilities
    1159                 :             :  * provided by this module.
    1160                 :             :  *
    1161                 :             :  * The difference between this and the underlying ReserveExternalFD function
    1162                 :             :  * is that this will report failure (by setting errno and returning false)
    1163                 :             :  * if "too many" external FDs are already reserved.  This should be used in
    1164                 :             :  * any code where the total number of FDs to be reserved is not both
    1165                 :             :  * predictable and small.
    1166                 :             :  */
    1167                 :             : bool
    1168                 :        1128 : AcquireExternalFD(void)
    1169                 :             : {
    1170                 :             :         /*
    1171                 :             :          * We don't want more than max_safe_fds / 3 FDs to be consumed for
    1172                 :             :          * "external" FDs.
    1173                 :             :          */
    1174         [ +  - ]:        1128 :         if (numExternalFDs < max_safe_fds / 3)
    1175                 :             :         {
    1176                 :        1128 :                 ReserveExternalFD();
    1177                 :        1128 :                 return true;
    1178                 :             :         }
    1179                 :           0 :         errno = EMFILE;
    1180                 :           0 :         return false;
    1181                 :        1128 : }
    1182                 :             : 
    1183                 :             : /*
    1184                 :             :  * ReserveExternalFD - report external consumption of a file descriptor
    1185                 :             :  *
    1186                 :             :  * This should be used by callers that need to hold a file descriptor open
    1187                 :             :  * over more than a short interval, but cannot use any of the other facilities
    1188                 :             :  * provided by this module.  This just tracks the use of the FD and closes
    1189                 :             :  * VFDs if needed to ensure we keep NUM_RESERVED_FDS FDs available.
    1190                 :             :  *
    1191                 :             :  * Call this directly only in code where failure to reserve the FD would be
    1192                 :             :  * fatal; for example, the WAL-writing code does so, since the alternative is
    1193                 :             :  * session failure.  Also, it's very unwise to do so in code that could
    1194                 :             :  * consume more than one FD per process.
    1195                 :             :  *
    1196                 :             :  * Note: as long as everybody plays nice so that NUM_RESERVED_FDS FDs remain
    1197                 :             :  * available, it doesn't matter too much whether this is called before or
    1198                 :             :  * after actually opening the FD; but doing so beforehand reduces the risk of
    1199                 :             :  * an EMFILE failure if not everybody played nice.  In any case, it's solely
    1200                 :             :  * the caller's responsibility to keep the external-FD count in sync with reality.
    1201                 :             :  */
    1202                 :             : void
    1203                 :        5224 : ReserveExternalFD(void)
    1204                 :             : {
    1205                 :             :         /*
    1206                 :             :          * Release VFDs if needed to stay safe.  Because we do this before
    1207                 :             :          * incrementing numExternalFDs, the final state will be as desired, i.e.,
    1208                 :             :          * nfile + numAllocatedDescs + numExternalFDs <= max_safe_fds.
    1209                 :             :          */
    1210                 :        5224 :         ReleaseLruFiles();
    1211                 :             : 
    1212                 :        5224 :         numExternalFDs++;
    1213                 :        5224 : }
    1214                 :             : 
    1215                 :             : /*
    1216                 :             :  * ReleaseExternalFD - report release of an external file descriptor
    1217                 :             :  *
    1218                 :             :  * This is guaranteed not to change errno, so it can be used in failure paths.
    1219                 :             :  */
    1220                 :             : void
    1221                 :        5120 : ReleaseExternalFD(void)
    1222                 :             : {
    1223         [ +  - ]:        5120 :         Assert(numExternalFDs > 0);
    1224                 :        5120 :         numExternalFDs--;
    1225                 :        5120 : }
    1226                 :             : 
    1227                 :             : 
    1228                 :             : #if defined(FDDEBUG)
    1229                 :             : 
    1230                 :             : static void
    1231                 :             : _dump_lru(void)
    1232                 :             : {
    1233                 :             :         int                     mru = VfdCache[0].lruLessRecently;
    1234                 :             :         Vfd                *vfdP = &VfdCache[mru];
    1235                 :             :         char            buf[2048];
    1236                 :             : 
    1237                 :             :         snprintf(buf, sizeof(buf), "LRU: MOST %d ", mru);
    1238                 :             :         while (mru != 0)
    1239                 :             :         {
    1240                 :             :                 mru = vfdP->lruLessRecently;
    1241                 :             :                 vfdP = &VfdCache[mru];
    1242                 :             :                 snprintf(buf + strlen(buf), sizeof(buf) - strlen(buf), "%d ", mru);
    1243                 :             :         }
    1244                 :             :         snprintf(buf + strlen(buf), sizeof(buf) - strlen(buf), "LEAST");
    1245                 :             :         elog(LOG, "%s", buf);
    1246                 :             : }
    1247                 :             : #endif                                                  /* FDDEBUG */
    1248                 :             : 
    1249                 :             : static void
    1250                 :       60338 : Delete(File file)
    1251                 :             : {
    1252                 :       60338 :         Vfd                *vfdP;
    1253                 :             : 
    1254         [ +  - ]:       60338 :         Assert(file != 0);
    1255                 :             : 
    1256                 :             :         DO_DB(elog(LOG, "Delete %d (%s)",
    1257                 :             :                            file, VfdCache[file].fileName));
    1258                 :             :         DO_DB(_dump_lru());
    1259                 :             : 
    1260                 :       60338 :         vfdP = &VfdCache[file];
    1261                 :             : 
    1262                 :       60338 :         VfdCache[vfdP->lruLessRecently].lruMoreRecently = vfdP->lruMoreRecently;
    1263                 :       60338 :         VfdCache[vfdP->lruMoreRecently].lruLessRecently = vfdP->lruLessRecently;
    1264                 :             : 
    1265                 :             :         DO_DB(_dump_lru());
    1266                 :       60338 : }
    1267                 :             : 
    1268                 :             : static void
    1269                 :        1812 : LruDelete(File file)
    1270                 :             : {
    1271                 :        1812 :         Vfd                *vfdP;
    1272                 :             : 
    1273         [ +  - ]:        1812 :         Assert(file != 0);
    1274                 :             : 
    1275                 :             :         DO_DB(elog(LOG, "LruDelete %d (%s)",
    1276                 :             :                            file, VfdCache[file].fileName));
    1277                 :             : 
    1278                 :        1812 :         vfdP = &VfdCache[file];
    1279                 :             : 
    1280                 :        1812 :         pgaio_closing_fd(vfdP->fd);
    1281                 :             : 
    1282                 :             :         /*
    1283                 :             :          * Close the file.  We aren't expecting this to fail; if it does, better
    1284                 :             :          * to leak the FD than to mess up our internal state.
    1285                 :             :          */
    1286         [ +  - ]:        1812 :         if (close(vfdP->fd) != 0)
    1287   [ #  #  #  #  :           0 :                 elog(vfdP->fdstate & FD_TEMP_FILE_LIMIT ? LOG : data_sync_elevel(LOG),
          #  #  #  #  #  
                #  #  # ]
    1288                 :             :                          "could not close file \"%s\": %m", vfdP->fileName);
    1289                 :        1812 :         vfdP->fd = VFD_CLOSED;
    1290                 :        1812 :         --nfile;
    1291                 :             : 
    1292                 :             :         /* delete the vfd record from the LRU ring */
    1293                 :        1812 :         Delete(file);
    1294                 :        1812 : }
    1295                 :             : 
    1296                 :             : static void
    1297                 :       72232 : Insert(File file)
    1298                 :             : {
    1299                 :       72232 :         Vfd                *vfdP;
    1300                 :             : 
    1301         [ +  - ]:       72232 :         Assert(file != 0);
    1302                 :             : 
    1303                 :             :         DO_DB(elog(LOG, "Insert %d (%s)",
    1304                 :             :                            file, VfdCache[file].fileName));
    1305                 :             :         DO_DB(_dump_lru());
    1306                 :             : 
    1307                 :       72232 :         vfdP = &VfdCache[file];
    1308                 :             : 
    1309                 :       72232 :         vfdP->lruMoreRecently = 0;
    1310                 :       72232 :         vfdP->lruLessRecently = VfdCache[0].lruLessRecently;
    1311                 :       72232 :         VfdCache[0].lruLessRecently = file;
    1312                 :       72232 :         VfdCache[vfdP->lruLessRecently].lruMoreRecently = file;
    1313                 :             : 
    1314                 :             :         DO_DB(_dump_lru());
    1315                 :       72232 : }
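
The Delete() and Insert() routines above maintain a doubly linked ring threaded through an array, with slot 0 acting as a sentinel whose lruLessRecently link names the most recently used entry and whose lruMoreRecently link names the least recently used one. The following is a minimal standalone sketch of that idea, not the fd.c implementation: the names RingNode, ring_insert, ring_delete, ring_touch, ring_mru and ring_lru, and the fixed RING_SIZE, are all hypothetical and chosen only for illustration.

```c
#include <assert.h>

#define RING_SIZE 8             /* hypothetical fixed size for the sketch */

typedef struct
{
    int more_recently;          /* index of the neighbour used more recently */
    int less_recently;          /* index of the neighbour used less recently */
} RingNode;

/* Zero-initialised, so the sentinel in slot 0 starts out linked to itself. */
static RingNode ring[RING_SIZE];

/* Link slot i in as the most recently used entry (next to the sentinel). */
static void
ring_insert(int i)
{
    assert(i != 0);
    ring[i].more_recently = 0;
    ring[i].less_recently = ring[0].less_recently;
    ring[0].less_recently = i;
    ring[ring[i].less_recently].more_recently = i;
}

/* Unlink slot i by joining its two neighbours; slot i itself is not cleared. */
static void
ring_delete(int i)
{
    assert(i != 0);
    ring[ring[i].less_recently].more_recently = ring[i].more_recently;
    ring[ring[i].more_recently].less_recently = ring[i].less_recently;
}

/* Most and least recently used slots; both are 0 when the ring is empty. */
static int ring_mru(void) { return ring[0].less_recently; }
static int ring_lru(void) { return ring[0].more_recently; }

/* Mark slot i as just used: move it to the head unless it is already there. */
static void
ring_touch(int i)
{
    if (ring_mru() != i)
    {
        ring_delete(i);
        ring_insert(i);
    }
}
```

Because the sentinel is always present, neither routine needs a special case for an empty ring or for the first and last elements, which is presumably why the real code can relink entries with two or four assignments.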
    1316                 :             : 
    1317                 :             : /* returns 0 on success, -1 on re-open failure (with errno set) */
    1318                 :             : static int
    1319                 :          17 : LruInsert(File file)
    1320                 :             : {
    1321                 :          17 :         Vfd                *vfdP;
    1322                 :             : 
    1323         [ +  - ]:          17 :         Assert(file != 0);
    1324                 :             : 
    1325                 :             :         DO_DB(elog(LOG, "LruInsert %d (%s)",
    1326                 :             :                            file, VfdCache[file].fileName));
    1327                 :             : 
    1328                 :          17 :         vfdP = &VfdCache[file];
    1329                 :             : 
    1330         [ -  + ]:          17 :         if (FileIsNotOpen(file))
    1331                 :             :         {
    1332                 :             :                 /* Close excess kernel FDs. */
    1333                 :          17 :                 ReleaseLruFiles();
    1334                 :             : 
    1335                 :             :                 /*
    1336                 :             :                  * The open could still fail for lack of file descriptors, e.g., because
    1337                 :             :                  * the overall system file table is full.  So, be prepared to release
    1338                 :             :                  * another FD if necessary...
    1339                 :             :                  */
    1340                 :          34 :                 vfdP->fd = BasicOpenFilePerm(vfdP->fileName, vfdP->fileFlags,
    1341                 :          17 :                                                                          vfdP->fileMode);
    1342         [ +  - ]:          17 :                 if (vfdP->fd < 0)
    1343                 :             :                 {
    1344                 :             :                         DO_DB(elog(LOG, "re-open failed: %m"));
    1345                 :           0 :                         return -1;
    1346                 :             :                 }
    1347                 :             :                 else
    1348                 :             :                 {
    1349                 :          17 :                         ++nfile;
    1350                 :             :                 }
    1351                 :          17 :         }
    1352                 :             : 
    1353                 :             :         /*
    1354                 :             :          * Put it at the head of the LRU ring.
    1355                 :             :          */
    1356                 :             : 
    1357                 :          17 :         Insert(file);
    1358                 :             : 
    1359                 :          17 :         return 0;
    1360                 :          17 : }
    1361                 :             : 
    1362                 :             : /*
    1363                 :             :  * Release one kernel FD by closing the least-recently-used VFD.
    1364                 :             :  */
    1365                 :             : static bool
    1366                 :        1790 : ReleaseLruFile(void)
    1367                 :             : {
    1368                 :             :         DO_DB(elog(LOG, "ReleaseLruFile. Opened %d", nfile));
    1369                 :             : 
    1370         [ +  - ]:        1790 :         if (nfile > 0)
    1371                 :             :         {
    1372                 :             :                 /*
    1373                 :             :                  * There are opened files and so there should be at least one used vfd
    1374                 :             :                  * in the ring.
    1375                 :             :                  */
    1376         [ +  - ]:        1790 :                 Assert(VfdCache[0].lruMoreRecently != 0);
    1377                 :        1790 :                 LruDelete(VfdCache[0].lruMoreRecently);
    1378                 :        1790 :                 return true;                    /* freed a file */
    1379                 :             :         }
    1380                 :           0 :         return false;                           /* no files available to free */
    1381                 :        1790 : }
    1382                 :             : 
    1383                 :             : /*
    1384                 :             :  * Release kernel FDs as needed to get under the max_safe_fds limit.
    1385                 :             :  * After calling this, it's OK to try to open another file.
    1386                 :             :  */
    1387                 :             : static void
    1388                 :      118827 : ReleaseLruFiles(void)
    1389                 :             : {
    1390         [ +  + ]:      120617 :         while (nfile + numAllocatedDescs + numExternalFDs >= max_safe_fds)
    1391                 :             :         {
    1392         [ -  + ]:        1790 :                 if (!ReleaseLruFile())
    1393                 :           0 :                         break;
    1394                 :             :         }
    1395                 :      118827 : }
    1396                 :             : 
    1397                 :             : static File
    1398                 :      101845 : AllocateVfd(void)
    1399                 :             : {
    1400                 :      101845 :         Index           i;
    1401                 :      101845 :         File            file;
    1402                 :             : 
    1403                 :             :         DO_DB(elog(LOG, "AllocateVfd. Size %zu", SizeVfdCache));
    1404                 :             : 
    1405         [ +  - ]:      101845 :         Assert(SizeVfdCache > 0);    /* InitFileAccess not called? */
    1406                 :             : 
    1407         [ +  + ]:      101845 :         if (VfdCache[0].nextFree == 0)
    1408                 :             :         {
    1409                 :             :                 /*
    1410                 :             :                  * The free list is empty so it is time to increase the size of the
    1411                 :             :                  * array.  We choose to double it each time this happens. However,
    1412                 :             :                  * there's not much point in starting *real* small.
    1413                 :             :                  */
    1414                 :        1021 :                 Size            newCacheSize = SizeVfdCache * 2;
    1415                 :        1021 :                 Vfd                *newVfdCache;
    1416                 :             : 
    1417         [ +  + ]:        1021 :                 if (newCacheSize < 32)
    1418                 :         801 :                         newCacheSize = 32;
    1419                 :             : 
    1420                 :             :                 /*
    1421                 :             :                  * Be careful not to clobber VfdCache ptr if realloc fails.
    1422                 :             :                  */
    1423                 :        1021 :                 newVfdCache = (Vfd *) realloc(VfdCache, sizeof(Vfd) * newCacheSize);
    1424         [ +  - ]:        1021 :                 if (newVfdCache == NULL)
    1425   [ #  #  #  # ]:           0 :                         ereport(ERROR,
    1426                 :             :                                         (errcode(ERRCODE_OUT_OF_MEMORY),
    1427                 :             :                                          errmsg("out of memory")));
    1428                 :        1021 :                 VfdCache = newVfdCache;
    1429                 :             : 
    1430                 :             :                 /*
    1431                 :             :                  * Initialize the new entries and link them into the free list.
    1432                 :             :                  */
    1433         [ +  + ]:       39292 :                 for (i = SizeVfdCache; i < newCacheSize; i++)
    1434                 :             :                 {
    1435   [ +  -  +  -  :      306168 :                         MemSet(&(VfdCache[i]), 0, sizeof(Vfd));
          +  -  -  +  +  
                      + ]
    1436                 :       38271 :                         VfdCache[i].nextFree = i + 1;
    1437                 :       38271 :                         VfdCache[i].fd = VFD_CLOSED;
    1438                 :       38271 :                 }
    1439                 :        1021 :                 VfdCache[newCacheSize - 1].nextFree = 0;
    1440                 :        1021 :                 VfdCache[0].nextFree = SizeVfdCache;
    1441                 :             : 
    1442                 :             :                 /*
    1443                 :             :                  * Record the new size
    1444                 :             :                  */
    1445                 :        1021 :                 SizeVfdCache = newCacheSize;
    1446                 :        1021 :         }
    1447                 :             : 
    1448                 :      101845 :         file = VfdCache[0].nextFree;
    1449                 :             : 
    1450                 :      101845 :         VfdCache[0].nextFree = VfdCache[file].nextFree;
    1451                 :             : 
    1452                 :      203690 :         return file;
    1453                 :      101845 : }
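
AllocateVfd() above keeps unused slots on a singly linked free list whose head lives in slot 0, and doubles the array (with a floor of 32 entries) when the list runs dry. The sketch below reproduces that pattern in isolation under stated assumptions: Slot, pool_init, pool_alloc and pool_free are hypothetical names, and unlike the real code the allocator returns 0 on out-of-memory instead of raising an error through ereport().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    int next_free;              /* next slot on the free list, 0 = end of list */
} Slot;

static Slot   *slots = NULL;    /* slot 0 is the list header, never handed out */
static size_t  nslots = 0;

/* Create the header slot; returns 0 on success, -1 if out of memory. */
static int
pool_init(void)
{
    slots = calloc(1, sizeof(Slot));
    if (slots == NULL)
        return -1;
    nslots = 1;
    return 0;
}

/* Return the index of a free slot (always >= 1), or 0 if out of memory. */
static int
pool_alloc(void)
{
    int idx;

    if (slots[0].next_free == 0)
    {
        /* Free list is empty: double the array, but never go below 32. */
        size_t newsize = nslots * 2;
        size_t i;
        Slot  *grown;

        if (newsize < 32)
            newsize = 32;

        /* Keep the old pointer intact if realloc fails. */
        grown = realloc(slots, sizeof(Slot) * newsize);
        if (grown == NULL)
            return 0;
        slots = grown;

        /* Chain the new slots together and hang them off the header. */
        for (i = nslots; i < newsize; i++)
        {
            memset(&slots[i], 0, sizeof(Slot));
            slots[i].next_free = (int) (i + 1);
        }
        slots[newsize - 1].next_free = 0;
        slots[0].next_free = (int) nslots;
        nslots = newsize;
    }

    /* Pop the first free slot. */
    idx = slots[0].next_free;
    slots[0].next_free = slots[idx].next_free;
    return idx;
}

/* Push a slot back onto the front of the free list. */
static void
pool_free(int idx)
{
    slots[idx].next_free = slots[0].next_free;
    slots[0].next_free = idx;
}
```

Since freed slots are pushed on the front of the list, the most recently freed index is the next one handed out, which matches the order produced by FreeVfd() followed by AllocateVfd().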
    1454                 :             : 
    1455                 :             : static void
    1456                 :       89747 : FreeVfd(File file)
    1457                 :             : {
    1458                 :       89747 :         Vfd                *vfdP = &VfdCache[file];
    1459                 :             : 
    1460                 :             :         DO_DB(elog(LOG, "FreeVfd: %d (%s)",
    1461                 :             :                            file, vfdP->fileName ? vfdP->fileName : ""));
    1462                 :             : 
    1463         [ +  + ]:       89747 :         if (vfdP->fileName != NULL)
    1464                 :             :         {
    1465                 :       43381 :                 free(vfdP->fileName);
    1466                 :       43381 :                 vfdP->fileName = NULL;
    1467                 :       43381 :         }
    1468                 :       89747 :         vfdP->fdstate = 0x0;
    1469                 :             : 
    1470                 :       89747 :         vfdP->nextFree = VfdCache[0].nextFree;
    1471                 :       89747 :         VfdCache[0].nextFree = file;
    1472                 :       89747 : }
    1473                 :             : 
    1474                 :             : /* returns 0 on success, -1 on re-open failure (with errno set) */
    1475                 :             : static int
    1476                 :      123579 : FileAccess(File file)
    1477                 :             : {
    1478                 :      123579 :         int                     returnValue;
    1479                 :             : 
    1480                 :             :         DO_DB(elog(LOG, "FileAccess %d (%s)",
    1481                 :             :                            file, VfdCache[file].fileName));
    1482                 :             : 
    1483                 :             :         /*
    1484                 :             :          * Is the file open?  If not, open it and put it at the head of the LRU
    1485                 :             :          * ring (possibly closing the least recently used file to get an FD).
    1486                 :             :          */
    1487                 :             : 
    1488         [ +  + ]:      123579 :         if (FileIsNotOpen(file))
    1489                 :             :         {
    1490                 :          17 :                 returnValue = LruInsert(file);
    1491         [ -  + ]:          17 :                 if (returnValue != 0)
    1492                 :           0 :                         return returnValue;
    1493                 :          17 :         }
    1494         [ +  + ]:      123562 :         else if (VfdCache[0].lruLessRecently != file)
    1495                 :             :         {
    1496                 :             :                 /*
    1497                 :             :                  * We now know that the file is open and that it is not the last one
    1498                 :             :                  * accessed, so we need to move it to the head of the LRU ring.
    1499                 :             :                  */
    1500                 :             : 
    1501                 :       16736 :                 Delete(file);
    1502                 :       16736 :                 Insert(file);
    1503                 :       16736 :         }
    1504                 :             : 
    1505                 :      123579 :         return 0;
    1506                 :      123579 : }
    1507                 :             : 
    1508                 :             : /*
    1509                 :             :  * Called whenever a temporary file is deleted to report its size.
    1510                 :             :  */
    1511                 :             : static void
    1512                 :         933 : ReportTemporaryFileUsage(const char *path, pgoff_t size)
    1513                 :             : {
    1514                 :         933 :         pgstat_report_tempfile(size);
    1515                 :             : 
    1516         [ +  + ]:         933 :         if (log_temp_files >= 0)
    1517                 :             :         {
    1518         [ +  + ]:          12 :                 if ((size / 1024) >= log_temp_files)
    1519   [ -  +  +  - ]:           4 :                         ereport(LOG,
    1520                 :             :                                         (errmsg("temporary file: path \"%s\", size %llu",
    1521                 :             :                                                         path, (unsigned long long) size)));
    1522                 :          12 :         }
    1523                 :         933 : }
    1524                 :             : 
    1525                 :             : /*
    1526                 :             :  * Called to register a temporary file for automatic close.
    1527                 :             :  * ResourceOwnerEnlarge(CurrentResourceOwner) must have been called
    1528                 :             :  * before the file was opened.
    1529                 :             :  */
    1530                 :             : static void
    1531                 :        1366 : RegisterTemporaryFile(File file)
    1532                 :             : {
    1533                 :        1366 :         ResourceOwnerRememberFile(CurrentResourceOwner, file);
    1534                 :        1366 :         VfdCache[file].resowner = CurrentResourceOwner;
    1535                 :             : 
    1536                 :             :         /* Backup mechanism for closing at end of xact. */
    1537                 :        1366 :         VfdCache[file].fdstate |= FD_CLOSE_AT_EOXACT;
    1538                 :        1366 :         have_xact_temporary_files = true;
    1539                 :        1366 : }
    1540                 :             : 
    1541                 :             : /*
    1542                 :             :  *      Called when we get a shared invalidation message on some relation.
    1543                 :             :  */
    1544                 :             : #ifdef NOT_USED
    1545                 :             : void
    1546                 :             : FileInvalidate(File file)
    1547                 :             : {
    1548                 :             :         Assert(FileIsValid(file));
    1549                 :             :         if (!FileIsNotOpen(file))
    1550                 :             :                 LruDelete(file);
    1551                 :             : }
    1552                 :             : #endif
    1553                 :             : 
    1554                 :             : /*
    1555                 :             :  * Open a file with PathNameOpenFilePerm() and pass default file mode for the
    1556                 :             :  * fileMode parameter.
    1557                 :             :  */
    1558                 :             : File
    1559                 :      101845 : PathNameOpenFile(const char *fileName, int fileFlags)
    1560                 :             : {
    1561                 :      101845 :         return PathNameOpenFilePerm(fileName, fileFlags, pg_file_create_mode);
    1562                 :             : }
    1563                 :             : 
    1564                 :             : /*
    1565                 :             :  * open a file in an arbitrary directory
    1566                 :             :  *
    1567                 :             :  * NB: if the passed pathname is relative (which it usually is),
    1568                 :             :  * it will be interpreted relative to the process' working directory
    1569                 :             :  * (which should always be $PGDATA when this code is running).
    1570                 :             :  */
    1571                 :             : File
    1572                 :      101845 : PathNameOpenFilePerm(const char *fileName, int fileFlags, mode_t fileMode)
    1573                 :             : {
    1574                 :      101845 :         char       *fnamecopy;
    1575                 :      101845 :         File            file;
    1576                 :      101845 :         Vfd                *vfdP;
    1577                 :             : 
    1578                 :             :         DO_DB(elog(LOG, "PathNameOpenFilePerm: %s %x %o",
    1579                 :             :                            fileName, fileFlags, fileMode));
    1580                 :             : 
    1581                 :             :         /*
    1582                 :             :          * We need a malloc'd copy of the file name; fail cleanly if no room.
    1583                 :             :          */
    1584                 :      101845 :         fnamecopy = strdup(fileName);
    1585         [ +  - ]:      101845 :         if (fnamecopy == NULL)
    1586   [ #  #  #  # ]:           0 :                 ereport(ERROR,
    1587                 :             :                                 (errcode(ERRCODE_OUT_OF_MEMORY),
    1588                 :             :                                  errmsg("out of memory")));
    1589                 :             : 
    1590                 :      101845 :         file = AllocateVfd();
    1591                 :      101845 :         vfdP = &VfdCache[file];
    1592                 :             : 
    1593                 :             :         /* Close excess kernel FDs. */
    1594                 :      101845 :         ReleaseLruFiles();
    1595                 :             : 
    1596                 :             :         /*
    1597                 :             :          * Descriptors managed by VFDs are implicitly marked O_CLOEXEC.  The
    1598                 :             :          * client shouldn't be expected to know which kernel descriptors are
    1599                 :             :          * currently open, so it wouldn't make sense for them to be inherited by
    1600                 :             :          * executed subprograms.
    1601                 :             :          */
    1602                 :      101845 :         fileFlags |= O_CLOEXEC;
    1603                 :             : 
    1604                 :      101845 :         vfdP->fd = BasicOpenFilePerm(fileName, fileFlags, fileMode);
    1605                 :             : 
    1606         [ +  + ]:      101845 :         if (vfdP->fd < 0)
    1607                 :             :         {
    1608                 :       46366 :                 int                     save_errno = errno;
    1609                 :             : 
    1610                 :       46366 :                 FreeVfd(file);
    1611                 :       46366 :                 free(fnamecopy);
    1612                 :       46366 :                 errno = save_errno;
    1613                 :       46366 :                 return -1;
    1614                 :       46366 :         }
    1615                 :       55479 :         ++nfile;
    1616                 :             :         DO_DB(elog(LOG, "PathNameOpenFile: success %d",
    1617                 :             :                            vfdP->fd));
    1618                 :             : 
    1619                 :       55479 :         vfdP->fileName = fnamecopy;
    1620                 :             :         /* Saved flags are adjusted to be OK for re-opening file */
    1621                 :       55479 :         vfdP->fileFlags = fileFlags & ~(O_CREAT | O_TRUNC | O_EXCL);
    1622                 :       55479 :         vfdP->fileMode = fileMode;
    1623                 :       55479 :         vfdP->fileSize = 0;
    1624                 :       55479 :         vfdP->fdstate = 0x0;
    1625                 :       55479 :         vfdP->resowner = NULL;
    1626                 :             : 
    1627                 :       55479 :         Insert(file);
    1628                 :             : 
    1629                 :       55479 :         return file;
    1630                 :      101845 : }
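
Two details of PathNameOpenFilePerm() above are easy to miss: errno is saved across the cleanup calls so the caller sees the error from the failed open rather than one from free(), and the flags stored for later re-opening have O_CREAT, O_TRUNC and O_EXCL stripped so that a transparent re-open cannot recreate or truncate the file. The following is a rough sketch of both points using plain POSIX open(2); Handle, handle_open and reopen_flags are hypothetical names, and the real function additionally manages the VFD slot and the LRU ring.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct
{
    int   fd;                   /* kernel descriptor */
    char *name;                 /* malloc'd copy of the path, for re-opening */
    int   saved_flags;          /* flags that are safe to reuse on re-open */
} Handle;

/* Drop the flags that must only take effect on the very first open. */
static int
reopen_flags(int flags)
{
    return flags & ~(O_CREAT | O_TRUNC | O_EXCL);
}

/* Returns 0 on success, -1 on failure with errno describing the failure. */
static int
handle_open(Handle *h, const char *path, int flags, mode_t mode)
{
    char *copy = strdup(path);

    if (copy == NULL)
        return -1;              /* strdup has set errno */

    flags |= O_CLOEXEC;         /* never leak the descriptor to subprograms */

    h->fd = open(path, flags, mode);
    if (h->fd < 0)
    {
        int save_errno = errno; /* cleanup below might overwrite errno */

        free(copy);
        errno = save_errno;
        return -1;
    }

    h->name = copy;
    h->saved_flags = reopen_flags(flags);
    return 0;
}
```

A caller would check the return value and inspect errno immediately, for example to distinguish ENOENT from EMFILE.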
    1631                 :             : 
    1632                 :             : /*
    1633                 :             :  * Create directory 'directory'.  If necessary, create 'basedir', which must
    1634                 :             :  * be the directory above it.  This is designed for creating the top-level
    1635                 :             :  * temporary directory on demand before creating a directory underneath it.
    1636                 :             :  * Do nothing if the directory already exists.
    1637                 :             :  *
    1638                 :             :  * Directories created within the top-level temporary directory should begin
    1639                 :             :  * with PG_TEMP_FILE_PREFIX, so that they can be identified as temporary and
    1640                 :             :  * deleted at startup by RemovePgTempFiles().  Further subdirectories below
    1641                 :             :  * that do not need any particular prefix.
    1642                 :             :  */
    1643                 :             : void
    1644                 :          64 : PathNameCreateTemporaryDir(const char *basedir, const char *directory)
    1645                 :             : {
    1646         [ +  + ]:          64 :         if (MakePGDirectory(directory) < 0)
    1647                 :             :         {
    1648         [ +  + ]:           9 :                 if (errno == EEXIST)
    1649                 :           8 :                         return;
    1650                 :             : 
    1651                 :             :                 /*
    1652                 :             :                  * Failed.  Try to create basedir first in case it's missing. Tolerate
    1653                 :             :                  * EEXIST to close a race against another process following the same
    1654                 :             :                  * algorithm.
    1655                 :             :                  */
    1656   [ -  +  #  # ]:           1 :                 if (MakePGDirectory(basedir) < 0 && errno != EEXIST)
    1657   [ #  #  #  # ]:           0 :                         ereport(ERROR,
    1658                 :             :                                         (errcode_for_file_access(),
    1659                 :             :                                          errmsg("cannot create temporary directory \"%s\": %m",
    1660                 :             :                                                         basedir)));
    1661                 :             : 
    1662                 :             :                 /* Try again. */
    1663   [ -  +  #  # ]:           1 :                 if (MakePGDirectory(directory) < 0 && errno != EEXIST)
    1664   [ #  #  #  # ]:           0 :                         ereport(ERROR,
    1665                 :             :                                         (errcode_for_file_access(),
    1666                 :             :                                          errmsg("cannot create temporary subdirectory \"%s\": %m",
    1667                 :             :                                                         directory)));
    1668                 :           1 :         }
    1669                 :          64 : }
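
PathNameCreateTemporaryDir() above tries the leaf directory first, creates the parent only if that fails, and treats EEXIST as success at every step so that two processes running the same sequence cannot trip over each other. A minimal sketch of that ordering with plain mkdir(2) follows. It rests on assumptions: ensure_dir is a hypothetical name, the 0700 mode is a placeholder for whatever MakePGDirectory() applies, and errors are returned to the caller instead of being reported with ereport().

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Create 'directory', creating 'basedir' (its parent) first if needed.
 * Returns 0 if the directory exists afterwards, -1 with errno set otherwise.
 */
static int
ensure_dir(const char *basedir, const char *directory)
{
    /* Common case: the parent exists, or the directory is already there. */
    if (mkdir(directory, 0700) == 0 || errno == EEXIST)
        return 0;

    /* Perhaps the parent is missing; another process may create it first. */
    if (mkdir(basedir, 0700) < 0 && errno != EEXIST)
        return -1;

    /* Try again; losing the race to a concurrent creator is still success. */
    if (mkdir(directory, 0700) < 0 && errno != EEXIST)
        return -1;

    return 0;
}
```

Calling ensure_dir() twice with the same arguments is harmless: the second call takes the EEXIST path on the first mkdir and returns 0.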
    1670                 :             : 
    1671                 :             : /*
    1672                 :             :  * Delete a directory and everything in it, if it exists.
    1673                 :             :  */
    1674                 :             : void
    1675                 :          70 : PathNameDeleteTemporaryDir(const char *dirname)
    1676                 :             : {
    1677                 :          70 :         struct stat statbuf;
    1678                 :             : 
    1679                 :             :         /* Silently ignore missing directory. */
    1680   [ +  +  -  + ]:          70 :         if (stat(dirname, &statbuf) != 0 && errno == ENOENT)
    1681                 :          14 :                 return;
    1682                 :             : 
    1683                 :             :         /*
    1684                 :             :          * Currently, walkdir doesn't offer a way for our passed in function to
    1685                 :             :          * maintain state.  Perhaps it should, so that we could tell the caller
    1686                 :             :          * whether this operation succeeded or failed.  Since this operation is
    1687                 :             :          * used in a cleanup path, we wouldn't actually behave differently: we'll
    1688                 :             :          * just log failures.
    1689                 :             :          */
    1690                 :          56 :         walkdir(dirname, unlink_if_exists_fname, false, LOG);
    1691         [ -  + ]:          70 : }
    1692                 :             : 
    1693                 :             : /*
    1694                 :             :  * Open a temporary file that will disappear when we close it.
    1695                 :             :  *
    1696                 :             :  * This routine takes care of generating an appropriate tempfile name.
    1697                 :             :  * There's no need to pass in fileFlags or fileMode either, since only
    1698                 :             :  * one setting makes any sense for a temp file.
    1699                 :             :  *
    1700                 :             :  * Unless interXact is true, the file is remembered by CurrentResourceOwner
    1701                 :             :  * to ensure it's closed and deleted when it's no longer needed, typically at
    1702                 :             :  * the end of the transaction. In most cases, you don't want temporary files to
    1703                 :             :  * outlive the transaction that created them, so this should be false -- but
    1704                 :             :  * if you need "somewhat" temporary storage, this might be useful. In either
    1705                 :             :  * case, the file is removed when the File is explicitly closed.
    1706                 :             :  */
    1707                 :             : File
    1708                 :         485 : OpenTemporaryFile(bool interXact)
    1709                 :             : {
    1710                 :         485 :         File            file = 0;
    1711                 :             : 
    1712         [ +  - ]:         485 :         Assert(temporary_files_allowed);        /* check temp file access is up */
    1713                 :             : 
    1714                 :             :         /*
    1715                 :             :          * Make sure the current resource owner has space for this File before we
    1716                 :             :          * open it, if we'll be registering it below.
    1717                 :             :          */
    1718         [ -  + ]:         485 :         if (!interXact)
    1719                 :         485 :                 ResourceOwnerEnlarge(CurrentResourceOwner);
    1720                 :             : 
    1721                 :             :         /*
    1722                 :             :          * If some temp tablespace(s) have been given to us, try to use the next
    1723                 :             :          * one.  If a given tablespace can't be found, we silently fall back to
    1724                 :             :          * the database's default tablespace.
    1725                 :             :          *
    1726                 :             :          * BUT: if the temp file is slated to outlive the current transaction,
    1727                 :             :          * force it into the database's default tablespace, so that it will not
    1728                 :             :          * pose a threat to possible tablespace drop attempts.
    1729                 :             :          */
    1730   [ -  +  #  # ]:         485 :         if (numTempTableSpaces > 0 && !interXact)
    1731                 :             :         {
    1732                 :           0 :                 Oid                     tblspcOid = GetNextTempTableSpace();
    1733                 :             : 
    1734         [ #  # ]:           0 :                 if (OidIsValid(tblspcOid))
    1735                 :           0 :                         file = OpenTemporaryFileInTablespace(tblspcOid, false);
    1736                 :           0 :         }
    1737                 :             : 
    1738                 :             :         /*
    1739                 :             :          * If not, or if tablespace is bad, create in database's default
    1740                 :             :          * tablespace.  MyDatabaseTableSpace should normally be set before we get
    1741                 :             :          * here, but just in case it isn't, fall back to pg_default tablespace.
    1742                 :             :          */
    1743         [ -  + ]:         485 :         if (file <= 0)
    1744         [ +  - ]:         485 :                 file = OpenTemporaryFileInTablespace(MyDatabaseTableSpace ?
    1745                 :         485 :                                                                                          MyDatabaseTableSpace :
    1746                 :             :                                                                                          DEFAULTTABLESPACE_OID,
    1747                 :             :                                                                                          true);
    1748                 :             : 
    1749                 :             :         /* Mark it for deletion at close and for temp_file_limit accounting */
    1750                 :         485 :         VfdCache[file].fdstate |= FD_DELETE_AT_CLOSE | FD_TEMP_FILE_LIMIT;
    1751                 :             : 
    1752                 :             :         /* Register it with the current resource owner */
    1753         [ -  + ]:         485 :         if (!interXact)
    1754                 :         485 :                 RegisterTemporaryFile(file);
    1755                 :             : 
    1756                 :         970 :         return file;
    1757                 :         485 : }
    1758                 :             : 
    1759                 :             : /*
    1760                 :             :  * Return the path of the temp directory in a given tablespace.
    1761                 :             :  */
    1762                 :             : void
    1763                 :        2445 : TempTablespacePath(char *path, Oid tablespace)
    1764                 :             : {
    1765                 :             :         /*
    1766                 :             :          * Identify the tempfile directory for this tablespace.
    1767                 :             :          *
    1768                 :             :          * If someone tries to specify pg_global, use pg_default instead.
    1769                 :             :          */
    1770         [ +  - ]:        2445 :         if (tablespace == InvalidOid ||
    1771   [ -  +  #  # ]:        2445 :                 tablespace == DEFAULTTABLESPACE_OID ||
    1772                 :           0 :                 tablespace == GLOBALTABLESPACE_OID)
    1773                 :        2445 :                 snprintf(path, MAXPGPATH, "base/%s", PG_TEMP_FILES_DIR);
    1774                 :             :         else
    1775                 :             :         {
    1776                 :             :                 /* All other tablespaces are accessed via symlinks */
    1777                 :           0 :                 snprintf(path, MAXPGPATH, "%s/%u/%s/%s",
    1778                 :           0 :                                  PG_TBLSPC_DIR, tablespace, TABLESPACE_VERSION_DIRECTORY,
    1779                 :             :                                  PG_TEMP_FILES_DIR);
    1780                 :             :         }
    1781                 :        2445 : }
    1782                 :             : 
    1783                 :             : /*
    1784                 :             :  * Open a temporary file in a specific tablespace.
    1785                 :             :  * Subroutine for OpenTemporaryFile, which see for details.
    1786                 :             :  */
    1787                 :             : static File
    1788                 :         485 : OpenTemporaryFileInTablespace(Oid tblspcOid, bool rejectError)
    1789                 :             : {
    1790                 :         485 :         char            tempdirpath[MAXPGPATH];
    1791                 :         485 :         char            tempfilepath[MAXPGPATH];
    1792                 :         485 :         File            file;
    1793                 :             : 
    1794                 :         485 :         TempTablespacePath(tempdirpath, tblspcOid);
    1795                 :             : 
    1796                 :             :         /*
    1797                 :             :          * Generate a tempfile name that should be unique within the current
    1798                 :             :          * database instance.
    1799                 :             :          */
    1800                 :         970 :         snprintf(tempfilepath, sizeof(tempfilepath), "%s/%s%d.%ld",
    1801                 :         485 :                          tempdirpath, PG_TEMP_FILE_PREFIX, MyProcPid, tempFileCounter++);
    1802                 :             : 
    1803                 :             :         /*
    1804                 :             :          * Open the file.  Note: we don't use O_EXCL, in case there is an orphaned
    1805                 :             :          * temp file that can be reused.
    1806                 :             :          */
    1807                 :         485 :         file = PathNameOpenFile(tempfilepath,
    1808                 :             :                                                         O_RDWR | O_CREAT | O_TRUNC | PG_BINARY);
    1809         [ +  + ]:         485 :         if (file <= 0)
    1810                 :             :         {
    1811                 :             :                 /*
    1812                 :             :                  * We might need to create the tablespace's tempfile directory, if no
    1813                 :             :                  * one has yet done so.
    1814                 :             :                  *
    1815                 :             :                  * Don't check for an error from MakePGDirectory; it could fail if
    1816                 :             :                  * someone else just did the same thing.  If it doesn't work then
    1817                 :             :                  * we'll bomb out on the second create attempt, instead.
    1818                 :             :                  */
    1819                 :           1 :                 (void) MakePGDirectory(tempdirpath);
    1820                 :             : 
    1821                 :           1 :                 file = PathNameOpenFile(tempfilepath,
    1822                 :             :                                                                 O_RDWR | O_CREAT | O_TRUNC | PG_BINARY);
    1823   [ -  +  #  # ]:           1 :                 if (file <= 0 && rejectError)
    1824   [ #  #  #  # ]:           0 :                         elog(ERROR, "could not create temporary file \"%s\": %m",
    1825                 :             :                                  tempfilepath);
    1826                 :           1 :         }
    1827                 :             : 
    1828                 :         970 :         return file;
    1829                 :         485 : }
    1830                 :             : 
    1831                 :             : 
    1832                 :             : /*
    1833                 :             :  * Create a new file.  The directory containing it must already exist.  Files
    1834                 :             :  * created this way are subject to temp_file_limit and are automatically
    1835                 :             :  * closed at end of transaction, but are not automatically deleted on close
    1836                 :             :  * because they are intended to be shared between cooperating backends.
    1837                 :             :  *
    1838                 :             :  * If the file is inside the top-level temporary directory, its name should
    1839                 :             :  * begin with PG_TEMP_FILE_PREFIX so that it can be identified as temporary
    1840                 :             :  * and deleted at startup by RemovePgTempFiles().  Alternatively, it can be
    1841                 :             :  * inside a directory created with PathNameCreateTemporaryDir(), in which case
    1842                 :             :  * the prefix isn't needed.
    1843                 :             :  */
    1844                 :             : File
    1845                 :         512 : PathNameCreateTemporaryFile(const char *path, bool error_on_failure)
    1846                 :             : {
    1847                 :         512 :         File            file;
    1848                 :             : 
    1849         [ +  - ]:         512 :         Assert(temporary_files_allowed);        /* check temp file access is up */
    1850                 :             : 
    1851                 :         512 :         ResourceOwnerEnlarge(CurrentResourceOwner);
    1852                 :             : 
    1853                 :             :         /*
    1854                 :             :          * Open the file.  Note: we don't use O_EXCL, in case there is an orphaned
    1855                 :             :          * temp file that can be reused.
    1856                 :             :          */
    1857                 :         512 :         file = PathNameOpenFile(path, O_RDWR | O_CREAT | O_TRUNC | PG_BINARY);
    1858         [ +  + ]:         512 :         if (file <= 0)
    1859                 :             :         {
    1860         [ +  - ]:          64 :                 if (error_on_failure)
    1861   [ #  #  #  # ]:           0 :                         ereport(ERROR,
    1862                 :             :                                         (errcode_for_file_access(),
    1863                 :             :                                          errmsg("could not create temporary file \"%s\": %m",
    1864                 :             :                                                         path)));
    1865                 :             :                 else
    1866                 :          64 :                         return file;
    1867                 :           0 :         }
    1868                 :             : 
    1869                 :             :         /* Mark it for temp_file_limit accounting. */
    1870                 :         448 :         VfdCache[file].fdstate |= FD_TEMP_FILE_LIMIT;
    1871                 :             : 
    1872                 :             :         /* Register it for automatic close. */
    1873                 :         448 :         RegisterTemporaryFile(file);
    1874                 :             : 
    1875                 :         448 :         return file;
    1876                 :         512 : }
    1877                 :             : 
    1878                 :             : /*
    1879                 :             :  * Open a file that was created with PathNameCreateTemporaryFile, possibly in
    1880                 :             :  * another backend.  Files opened this way don't count against the
    1881                 :             :  * temp_file_limit of the caller, are automatically closed at the end of the
    1882                 :             :  * transaction but are not deleted on close.
    1883                 :             :  */
    1884                 :             : File
    1885                 :         866 : PathNameOpenTemporaryFile(const char *path, int mode)
    1886                 :             : {
    1887                 :         866 :         File            file;
    1888                 :             : 
    1889         [ +  - ]:         866 :         Assert(temporary_files_allowed);        /* check temp file access is up */
    1890                 :             : 
    1891                 :         866 :         ResourceOwnerEnlarge(CurrentResourceOwner);
    1892                 :             : 
    1893                 :         866 :         file = PathNameOpenFile(path, mode | PG_BINARY);
    1894                 :             : 
    1895                 :             :         /* If no such file, then we don't raise an error. */
    1896   [ +  +  +  - ]:         866 :         if (file <= 0 && errno != ENOENT)
    1897   [ #  #  #  # ]:           0 :                 ereport(ERROR,
    1898                 :             :                                 (errcode_for_file_access(),
    1899                 :             :                                  errmsg("could not open temporary file \"%s\": %m",
    1900                 :             :                                                 path)));
    1901                 :             : 
    1902         [ +  + ]:         866 :         if (file > 0)
    1903                 :             :         {
    1904                 :             :                 /* Register it for automatic close. */
    1905                 :         433 :                 RegisterTemporaryFile(file);
    1906                 :         433 :         }
    1907                 :             : 
    1908                 :        1732 :         return file;
    1909                 :         866 : }
    1910                 :             : 
    1911                 :             : /*
    1912                 :             :  * Delete a file by pathname.  Return true if the file existed, false if
    1913                 :             :  * it didn't.
    1914                 :             :  */
    1915                 :             : bool
    1916                 :         896 : PathNameDeleteTemporaryFile(const char *path, bool error_on_failure)
    1917                 :             : {
    1918                 :         896 :         struct stat filestats;
    1919                 :         896 :         int                     stat_errno;
    1920                 :             : 
    1921                 :             :         /* Get the final size for pgstat reporting. */
    1922         [ +  + ]:         896 :         if (stat(path, &filestats) != 0)
    1923                 :         448 :                 stat_errno = errno;
    1924                 :             :         else
    1925                 :         448 :                 stat_errno = 0;
    1926                 :             : 
    1927                 :             :         /*
    1928                 :             :          * Unlike FileClose's automatic file deletion code, we tolerate
    1929                 :             :          * non-existence to support BufFileDeleteFileSet which doesn't know how
    1930                 :             :          * many segments it has to delete until it runs out.
    1931                 :             :          */
    1932         [ +  + ]:         896 :         if (stat_errno == ENOENT)
    1933                 :         448 :                 return false;
    1934                 :             : 
    1935         [ +  - ]:         448 :         if (unlink(path) < 0)
    1936                 :             :         {
    1937         [ #  # ]:           0 :                 if (errno != ENOENT)
    1938   [ #  #  #  #  :           0 :                         ereport(error_on_failure ? ERROR : LOG,
          #  #  #  #  #  
                      # ]
    1939                 :             :                                         (errcode_for_file_access(),
    1940                 :             :                                          errmsg("could not unlink temporary file \"%s\": %m",
    1941                 :             :                                                         path)));
    1942                 :           0 :                 return false;
    1943                 :             :         }
    1944                 :             : 
    1945         [ -  + ]:         448 :         if (stat_errno == 0)
    1946                 :         448 :                 ReportTemporaryFileUsage(path, filestats.st_size);
    1947                 :             :         else
    1948                 :             :         {
    1949                 :           0 :                 errno = stat_errno;
    1950   [ #  #  #  # ]:           0 :                 ereport(LOG,
    1951                 :             :                                 (errcode_for_file_access(),
    1952                 :             :                                  errmsg("could not stat file \"%s\": %m", path)));
    1953                 :             :         }
    1954                 :             : 
    1955                 :         448 :         return true;
    1956                 :         896 : }
    1957                 :             : 
    1958                 :             : /*
    1959                 :             :  * close a file when done with it
    1960                 :             :  */
    1961                 :             : void
    1962                 :       43381 : FileClose(File file)
    1963                 :             : {
    1964                 :       43381 :         Vfd                *vfdP;
    1965                 :             : 
    1966         [ +  - ]:       43381 :         Assert(FileIsValid(file));
    1967                 :             : 
    1968                 :             :         DO_DB(elog(LOG, "FileClose: %d (%s)",
    1969                 :             :                            file, VfdCache[file].fileName));
    1970                 :             : 
    1971                 :       43381 :         vfdP = &VfdCache[file];
    1972                 :             : 
    1973         [ +  + ]:       43381 :         if (!FileIsNotOpen(file))
    1974                 :             :         {
    1975                 :       41790 :                 pgaio_closing_fd(vfdP->fd);
    1976                 :             : 
    1977                 :             :                 /* close the file */
    1978         [ +  - ]:       41790 :                 if (close(vfdP->fd) != 0)
    1979                 :             :                 {
    1980                 :             :                         /*
    1981                 :             :                          * We may need to panic on failure to close non-temporary files;
    1982                 :             :                          * see LruDelete.
    1983                 :             :                          */
    1984   [ #  #  #  #  :           0 :                         elog(vfdP->fdstate & FD_TEMP_FILE_LIMIT ? LOG : data_sync_elevel(LOG),
          #  #  #  #  #  
                #  #  # ]
    1985                 :             :                                  "could not close file \"%s\": %m", vfdP->fileName);
    1986                 :           0 :                 }
    1987                 :             : 
    1988                 :       41790 :                 --nfile;
    1989                 :       41790 :                 vfdP->fd = VFD_CLOSED;
    1990                 :             : 
    1991                 :             :                 /* remove the file from the lru ring */
    1992                 :       41790 :                 Delete(file);
    1993                 :       41790 :         }
    1994                 :             : 
    1995         [ +  + ]:       43381 :         if (vfdP->fdstate & FD_TEMP_FILE_LIMIT)
    1996                 :             :         {
    1997                 :             :                 /* Subtract its size from current usage (do first in case of error) */
    1998                 :         933 :                 temporary_files_size -= vfdP->fileSize;
    1999                 :         933 :                 vfdP->fileSize = 0;
    2000                 :         933 :         }
    2001                 :             : 
    2002                 :             :         /*
    2003                 :             :          * Delete the file if it was temporary, and make a log entry if wanted
    2004                 :             :          */
    2005         [ +  + ]:       43381 :         if (vfdP->fdstate & FD_DELETE_AT_CLOSE)
    2006                 :             :         {
    2007                 :         485 :                 struct stat filestats;
    2008                 :         485 :                 int                     stat_errno;
    2009                 :             : 
    2010                 :             :                 /*
    2011                 :             :                  * If we get an error, as could happen within the ereport/elog calls,
    2012                 :             :                  * we'll come right back here during transaction abort.  Reset the
    2013                 :             :                  * flag to ensure that we can't get into an infinite loop.  This code
    2014                 :             :                  * is arranged to ensure that the worst-case consequence is failing to
    2015                 :             :                  * emit log message(s), not failing to attempt the unlink.
    2016                 :             :                  */
    2017                 :         485 :                 vfdP->fdstate &= ~FD_DELETE_AT_CLOSE;
    2018                 :             : 
    2019                 :             : 
    2020                 :             :                 /* first try the stat() */
    2021         [ -  + ]:         485 :                 if (stat(vfdP->fileName, &filestats))
    2022                 :           0 :                         stat_errno = errno;
    2023                 :             :                 else
    2024                 :         485 :                         stat_errno = 0;
    2025                 :             : 
    2026                 :             :                 /* in any case do the unlink */
    2027         [ +  - ]:         485 :                 if (unlink(vfdP->fileName))
    2028   [ #  #  #  # ]:           0 :                         ereport(LOG,
    2029                 :             :                                         (errcode_for_file_access(),
    2030                 :             :                                          errmsg("could not delete file \"%s\": %m", vfdP->fileName)));
    2031                 :             : 
    2032                 :             :                 /* and last report the stat results */
    2033         [ -  + ]:         485 :                 if (stat_errno == 0)
    2034                 :         485 :                         ReportTemporaryFileUsage(vfdP->fileName, filestats.st_size);
    2035                 :             :                 else
    2036                 :             :                 {
    2037                 :           0 :                         errno = stat_errno;
    2038   [ #  #  #  # ]:           0 :                         ereport(LOG,
    2039                 :             :                                         (errcode_for_file_access(),
    2040                 :             :                                          errmsg("could not stat file \"%s\": %m", vfdP->fileName)));
    2041                 :             :                 }
    2042                 :         485 :         }
    2043                 :             : 
    2044                 :             :         /* Unregister it from the resource owner */
    2045         [ +  + ]:       43381 :         if (vfdP->resowner)
    2046                 :        1366 :                 ResourceOwnerForgetFile(vfdP->resowner, file);
    2047                 :             : 
    2048                 :             :         /*
    2049                 :             :          * Return the Vfd slot to the free list
    2050                 :             :          */
    2051                 :       43381 :         FreeVfd(file);
    2052                 :       43381 : }
    2053                 :             : 
    2054                 :             : /*
    2055                 :             :  * FilePrefetch - initiate asynchronous read of a given range of the file.
    2056                 :             :  *
    2057                 :             :  * Returns 0 on success, otherwise an errno error code (like posix_fadvise()).
    2058                 :             :  *
    2059                 :             :  * posix_fadvise() is the simplest standardized interface that accomplishes
    2060                 :             :  * this.
    2061                 :             :  */
    2062                 :             : int
    2063                 :           0 : FilePrefetch(File file, pgoff_t offset, pgoff_t amount, uint32 wait_event_info)
    2064                 :             : {
    2065         [ #  # ]:           0 :         Assert(FileIsValid(file));
    2066                 :             : 
    2067                 :             :         DO_DB(elog(LOG, "FilePrefetch: %d (%s) " INT64_FORMAT " " INT64_FORMAT,
    2068                 :             :                            file, VfdCache[file].fileName,
    2069                 :             :                            (int64) offset, (int64) amount));
    2070                 :             : 
    2071                 :             : #if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_WILLNEED)
    2072                 :             :         {
    2073                 :             :                 int                     returnCode;
    2074                 :             : 
    2075                 :             :                 returnCode = FileAccess(file);
    2076                 :             :                 if (returnCode < 0)
    2077                 :             :                         return returnCode;
    2078                 :             : 
    2079                 :             : retry:
    2080                 :             :                 pgstat_report_wait_start(wait_event_info);
    2081                 :             :                 returnCode = posix_fadvise(VfdCache[file].fd, offset, amount,
    2082                 :             :                                                                    POSIX_FADV_WILLNEED);
    2083                 :             :                 pgstat_report_wait_end();
    2084                 :             : 
    2085                 :             :                 if (returnCode == EINTR)
    2086                 :             :                         goto retry;
    2087                 :             : 
    2088                 :             :                 return returnCode;
    2089                 :             :         }
    2090                 :             : #elif defined(__darwin__)
    2091                 :             :         {
    2092                 :           0 :                 struct radvisory
    2093                 :             :                 {
    2094                 :             :                         off_t           ra_offset;      /* offset into the file */
    2095                 :             :                         int                     ra_count;       /* size of the read     */
    2096                 :             :                 }                       ra;
    2097                 :           0 :                 int                     returnCode;
    2098                 :             : 
    2099                 :           0 :                 returnCode = FileAccess(file);
    2100         [ #  # ]:           0 :                 if (returnCode < 0)
    2101                 :           0 :                         return returnCode;
    2102                 :             : 
    2103                 :           0 :                 ra.ra_offset = offset;
    2104                 :           0 :                 ra.ra_count = amount;
    2105                 :           0 :                 pgstat_report_wait_start(wait_event_info);
    2106                 :           0 :                 returnCode = fcntl(VfdCache[file].fd, F_RDADVISE, &ra);
    2107                 :           0 :                 pgstat_report_wait_end();
    2108         [ #  # ]:           0 :                 if (returnCode != -1)
    2109                 :           0 :                         return 0;
    2110                 :             :                 else
    2111                 :           0 :                         return errno;
    2112                 :           0 :         }
    2113                 :             : #else
    2114                 :             :         return 0;
    2115                 :             : #endif
    2116                 :           0 : }
    2117                 :             : 
    2118                 :             : void
    2119                 :           0 : FileWriteback(File file, pgoff_t offset, pgoff_t nbytes, uint32 wait_event_info)
    2120                 :             : {
    2121                 :           0 :         int                     returnCode;
    2122                 :             : 
    2123         [ #  # ]:           0 :         Assert(FileIsValid(file));
    2124                 :             : 
    2125                 :             :         DO_DB(elog(LOG, "FileWriteback: %d (%s) " INT64_FORMAT " " INT64_FORMAT,
    2126                 :             :                            file, VfdCache[file].fileName,
    2127                 :             :                            (int64) offset, (int64) nbytes));
    2128                 :             : 
    2129         [ #  # ]:           0 :         if (nbytes <= 0)
    2130                 :           0 :                 return;
    2131                 :             : 
    2132         [ #  # ]:           0 :         if (VfdCache[file].fileFlags & PG_O_DIRECT)
    2133                 :           0 :                 return;
    2134                 :             : 
    2135                 :           0 :         returnCode = FileAccess(file);
    2136         [ #  # ]:           0 :         if (returnCode < 0)
    2137                 :           0 :                 return;
    2138                 :             : 
    2139                 :           0 :         pgstat_report_wait_start(wait_event_info);
    2140                 :           0 :         pg_flush_data(VfdCache[file].fd, offset, nbytes);
    2141                 :           0 :         pgstat_report_wait_end();
    2142         [ #  # ]:           0 : }
    2143                 :             : 
    2144                 :             : ssize_t
    2145                 :       34735 : FileReadV(File file, const struct iovec *iov, int iovcnt, pgoff_t offset,
    2146                 :             :                   uint32 wait_event_info)
    2147                 :             : {
    2148                 :       34735 :         ssize_t         returnCode;
    2149                 :       34735 :         Vfd                *vfdP;
    2150                 :             : 
    2151         [ +  - ]:       34735 :         Assert(FileIsValid(file));
    2152                 :             : 
    2153                 :             :         DO_DB(elog(LOG, "FileReadV: %d (%s) " INT64_FORMAT " %d",
    2154                 :             :                            file, VfdCache[file].fileName,
    2155                 :             :                            (int64) offset,
    2156                 :             :                            iovcnt));
    2157                 :             : 
    2158                 :       34735 :         returnCode = FileAccess(file);
    2159         [ +  - ]:       34735 :         if (returnCode < 0)
    2160                 :           0 :                 return returnCode;
    2161                 :             : 
    2162                 :       34735 :         vfdP = &VfdCache[file];
    2163                 :             : 
    2164                 :             : retry:
    2165                 :       34735 :         pgstat_report_wait_start(wait_event_info);
    2166                 :       34735 :         returnCode = pg_preadv(vfdP->fd, iov, iovcnt, offset);
    2167                 :       34735 :         pgstat_report_wait_end();
    2168                 :             : 
    2169         [ +  - ]:       34735 :         if (returnCode < 0)
    2170                 :             :         {
    2171                 :             :                 /*
    2172                 :             :                  * Windows may run out of kernel buffers and return an "Insufficient
    2173                 :             :                  * system resources" error.  Wait a bit and retry in that case.
    2174                 :             :                  *
    2175                 :             :                  * It is rumored that EINTR is also possible on some Unix filesystems,
    2176                 :             :                  * in which case immediate retry is indicated.
    2177                 :             :                  */
    2178                 :             : #ifdef WIN32
    2179                 :             :                 DWORD           error = GetLastError();
    2180                 :             : 
    2181                 :             :                 switch (error)
    2182                 :             :                 {
    2183                 :             :                         case ERROR_NO_SYSTEM_RESOURCES:
    2184                 :             :                                 pg_usleep(1000L);
    2185                 :             :                                 errno = EINTR;
    2186                 :             :                                 break;
    2187                 :             :                         default:
    2188                 :             :                                 _dosmaperr(error);
    2189                 :             :                                 break;
    2190                 :             :                 }
    2191                 :             : #endif
    2192                 :             :                 /* OK to retry if interrupted */
    2193         [ #  # ]:           0 :                 if (errno == EINTR)
    2194                 :           0 :                         goto retry;
    2195                 :           0 :         }
    2196                 :             : 
    2197                 :       34735 :         return returnCode;
    2198                 :       34735 : }
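
The retry-on-EINTR pattern used by FileReadV() can be sketched in isolation. The following is a minimal illustration, not the fd.c implementation: pread_retry() is a hypothetical helper name, it uses plain pread() rather than pg_preadv(), and it omits the Windows error mapping and the wait-event reporting.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of fd.c): read at a given offset and
 * retry immediately if the call was interrupted by a signal.
 */
ssize_t
pread_retry(int fd, void *buf, size_t len, off_t offset)
{
    ssize_t     n;

    do
        n = pread(fd, buf, len, offset);
    while (n < 0 && errno == EINTR);

    /* n is the byte count, or -1 with errno set to a non-EINTR error */
    return n;
}
```

As in FileReadV(), a short read is returned as-is; only an interrupted call is retried.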
    2199                 :             : 
    2200                 :             : int
    2201                 :        7052 : FileStartReadV(PgAioHandle *ioh, File file,
    2202                 :             :                            int iovcnt, pgoff_t offset,
    2203                 :             :                            uint32 wait_event_info)
    2204                 :             : {
    2205                 :        7052 :         int                     returnCode;
    2206                 :        7052 :         Vfd                *vfdP;
    2207                 :             : 
    2208         [ +  - ]:        7052 :         Assert(FileIsValid(file));
    2209                 :             : 
    2210                 :             :         DO_DB(elog(LOG, "FileStartReadV: %d (%s) " INT64_FORMAT " %d",
    2211                 :             :                            file, VfdCache[file].fileName,
    2212                 :             :                            (int64) offset,
    2213                 :             :                            iovcnt));
    2214                 :             : 
    2215                 :        7052 :         returnCode = FileAccess(file);
    2216         [ +  - ]:        7052 :         if (returnCode < 0)
    2217                 :           0 :                 return returnCode;
    2218                 :             : 
    2219                 :        7052 :         vfdP = &VfdCache[file];
    2220                 :             : 
    2221                 :        7052 :         pgaio_io_start_readv(ioh, vfdP->fd, iovcnt, offset);
    2222                 :             : 
    2223                 :        7052 :         return 0;
    2224                 :        7052 : }
    2225                 :             : 
    2226                 :             : ssize_t
    2227                 :       53924 : FileWriteV(File file, const struct iovec *iov, int iovcnt, pgoff_t offset,
    2228                 :             :                    uint32 wait_event_info)
    2229                 :             : {
    2230                 :       53924 :         ssize_t         returnCode;
    2231                 :       53924 :         Vfd                *vfdP;
    2232                 :             : 
    2233         [ +  - ]:       53924 :         Assert(FileIsValid(file));
    2234                 :             : 
    2235                 :             :         DO_DB(elog(LOG, "FileWriteV: %d (%s) " INT64_FORMAT " %d",
    2236                 :             :                            file, VfdCache[file].fileName,
    2237                 :             :                            (int64) offset,
    2238                 :             :                            iovcnt));
    2239                 :             : 
    2240                 :       53924 :         returnCode = FileAccess(file);
    2241         [ +  - ]:       53924 :         if (returnCode < 0)
    2242                 :           0 :                 return returnCode;
    2243                 :             : 
    2244                 :       53924 :         vfdP = &VfdCache[file];
    2245                 :             : 
    2246                 :             :         /*
    2247                 :             :          * If enforcing temp_file_limit and it's a temp file, check to see if the
    2248                 :             :          * write would overrun temp_file_limit, and throw error if so.  Note: it's
    2249                 :             :          * really a modularity violation to throw error here; we should set errno
    2250                 :             :          * and return -1.  However, there's no way to report a suitable error
    2251                 :             :          * message if we do that.  All current callers would just throw error
    2252                 :             :          * immediately anyway, so this is safe at present.
    2253                 :             :          */
    2254   [ -  +  #  # ]:       53924 :         if (temp_file_limit >= 0 && (vfdP->fdstate & FD_TEMP_FILE_LIMIT))
    2255                 :             :         {
    2256                 :           0 :                 pgoff_t         past_write = offset;
    2257                 :             : 
    2258         [ #  # ]:           0 :                 for (int i = 0; i < iovcnt; ++i)
    2259                 :           0 :                         past_write += iov[i].iov_len;
    2260                 :             : 
    2261         [ #  # ]:           0 :                 if (past_write > vfdP->fileSize)
    2262                 :             :                 {
    2263                 :           0 :                         uint64          newTotal = temporary_files_size;
    2264                 :             : 
    2265                 :           0 :                         newTotal += past_write - vfdP->fileSize;
    2266         [ #  # ]:           0 :                         if (newTotal > (uint64) temp_file_limit * (uint64) 1024)
    2267   [ #  #  #  # ]:           0 :                                 ereport(ERROR,
    2268                 :             :                                                 (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
    2269                 :             :                                                  errmsg("temporary file size exceeds \"temp_file_limit\" (%dkB)",
    2270                 :             :                                                                 temp_file_limit)));
    2271                 :           0 :                 }
    2272                 :           0 :         }
    2273                 :             : 
    2274                 :             : retry:
    2275                 :       53924 :         pgstat_report_wait_start(wait_event_info);
    2276                 :       53924 :         returnCode = pg_pwritev(vfdP->fd, iov, iovcnt, offset);
    2277                 :       53924 :         pgstat_report_wait_end();
    2278                 :             : 
    2279         [ -  + ]:       53924 :         if (returnCode >= 0)
    2280                 :             :         {
    2281                 :             :                 /*
    2282                 :             :                  * Some callers expect short writes to set errno, and traditionally we
    2283                 :             :                  * have assumed that they imply disk space shortage.  We don't want to
    2284                 :             :                  * waste CPU cycles adding up the total size here, so we'll just set
    2285                 :             :                  * it for all successful writes in case such a caller determines that
    2286                 :             :                  * the write was short and ereports "%m".
    2287                 :             :                  */
    2288                 :       53924 :                 errno = ENOSPC;
    2289                 :             : 
    2290                 :             :                 /*
    2291                 :             :                  * Maintain fileSize and temporary_files_size if it's a temp file.
    2292                 :             :                  */
    2293         [ +  + ]:       53924 :                 if (vfdP->fdstate & FD_TEMP_FILE_LIMIT)
    2294                 :             :                 {
    2295                 :       35453 :                         pgoff_t         past_write = offset + returnCode;
    2296                 :             : 
    2297         [ +  + ]:       35453 :                         if (past_write > vfdP->fileSize)
    2298                 :             :                         {
    2299                 :       30092 :                                 temporary_files_size += past_write - vfdP->fileSize;
    2300                 :       30092 :                                 vfdP->fileSize = past_write;
    2301                 :       30092 :                         }
    2302                 :       35453 :                 }
    2303                 :       53924 :         }
    2304                 :             :         else
    2305                 :             :         {
    2306                 :             :                 /*
    2307                 :             :                  * See comments in FileReadV()
    2308                 :             :                  */
    2309                 :             : #ifdef WIN32
    2310                 :             :                 DWORD           error = GetLastError();
    2311                 :             : 
    2312                 :             :                 switch (error)
    2313                 :             :                 {
    2314                 :             :                         case ERROR_NO_SYSTEM_RESOURCES:
    2315                 :             :                                 pg_usleep(1000L);
    2316                 :             :                                 errno = EINTR;
    2317                 :             :                                 break;
    2318                 :             :                         default:
    2319                 :             :                                 _dosmaperr(error);
    2320                 :             :                                 break;
    2321                 :             :                 }
    2322                 :             : #endif
    2323                 :             :                 /* OK to retry if interrupted */
    2324         [ #  # ]:           0 :                 if (errno == EINTR)
    2325                 :           0 :                         goto retry;
    2326                 :             :         }
    2327                 :             : 
    2328                 :       53924 :         return returnCode;
    2329                 :       53924 : }
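
The temp_file_limit check at the top of FileWriteV() is plain arithmetic on the end position of the write. The sketch below restates it as a standalone predicate under stated assumptions: write_would_exceed_limit() and its parameters are hypothetical names, the limit is taken in kilobytes with a negative value meaning "no limit", and the caller supplies the current file size and the running total of temporary-file bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper (not part of fd.c): would a vectored write at
 * "offset" push the total temp-file usage past limit_kb kilobytes?
 */
bool
write_would_exceed_limit(const struct iovec *iov, int iovcnt,
                         int64_t offset, int64_t file_size,
                         uint64_t total_temp_bytes, int limit_kb)
{
    int64_t     past_write = offset;

    if (limit_kb < 0)
        return false;           /* negative limit: nothing is enforced */

    /* end position of the write = offset + sum of all iovec lengths */
    for (int i = 0; i < iovcnt; ++i)
        past_write += (int64_t) iov[i].iov_len;

    if (past_write <= file_size)
        return false;           /* overwrite only, the file does not grow */

    /* only the growth beyond the current size counts against the limit */
    return total_temp_bytes + (uint64_t) (past_write - file_size)
        > (uint64_t) limit_kb * UINT64_C(1024);
}
```

The comparison is strict, so a write that lands exactly on the limit is still allowed, matching the `newTotal > temp_file_limit * 1024` test in the listing.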
    2330                 :             : 
    2331                 :             : int
    2332                 :           8 : FileSync(File file, uint32 wait_event_info)
    2333                 :             : {
    2334                 :           8 :         int                     returnCode;
    2335                 :             : 
    2336         [ +  - ]:           8 :         Assert(FileIsValid(file));
    2337                 :             : 
    2338                 :             :         DO_DB(elog(LOG, "FileSync: %d (%s)",
    2339                 :             :                            file, VfdCache[file].fileName));
    2340                 :             : 
    2341                 :           8 :         returnCode = FileAccess(file);
    2342         [ +  - ]:           8 :         if (returnCode < 0)
    2343                 :           0 :                 return returnCode;
    2344                 :             : 
    2345                 :           8 :         pgstat_report_wait_start(wait_event_info);
    2346                 :           8 :         returnCode = pg_fsync(VfdCache[file].fd);
    2347                 :           8 :         pgstat_report_wait_end();
    2348                 :             : 
    2349                 :           8 :         return returnCode;
    2350                 :           8 : }
    2351                 :             : 
    2352                 :             : /*
    2353                 :             :  * Zero a region of the file.
    2354                 :             :  *
    2355                 :             :  * Returns 0 on success, -1 otherwise. In the latter case errno is set to the
    2356                 :             :  * appropriate error.
    2357                 :             :  */
    2358                 :             : int
    2359                 :       27454 : FileZero(File file, pgoff_t offset, pgoff_t amount, uint32 wait_event_info)
    2360                 :             : {
    2361                 :       27454 :         int                     returnCode;
    2362                 :       27454 :         ssize_t         written;
    2363                 :             : 
    2364         [ +  - ]:       27454 :         Assert(FileIsValid(file));
    2365                 :             : 
    2366                 :             :         DO_DB(elog(LOG, "FileZero: %d (%s) " INT64_FORMAT " " INT64_FORMAT,
    2367                 :             :                            file, VfdCache[file].fileName,
    2368                 :             :                            (int64) offset, (int64) amount));
    2369                 :             : 
    2370                 :       27454 :         returnCode = FileAccess(file);
    2371         [ +  - ]:       27454 :         if (returnCode < 0)
    2372                 :           0 :                 return returnCode;
    2373                 :             : 
    2374                 :       27454 :         pgstat_report_wait_start(wait_event_info);
    2375                 :       27454 :         written = pg_pwrite_zeros(VfdCache[file].fd, amount, offset);
    2376                 :       27454 :         pgstat_report_wait_end();
    2377                 :             : 
    2378         [ +  - ]:       27454 :         if (written < 0)
    2379                 :           0 :                 return -1;
    2380         [ -  + ]:       27454 :         else if (written != amount)
    2381                 :             :         {
    2382                 :             :                 /* if errno is unset, assume problem is no disk space */
    2383         [ #  # ]:           0 :                 if (errno == 0)
    2384                 :           0 :                         errno = ENOSPC;
    2385                 :           0 :                 return -1;
    2386                 :             :         }
    2387                 :             : 
    2388                 :       27454 :         return 0;
    2389                 :       27454 : }
    2390                 :             : 
    2391                 :             : /*
    2392                 :             :  * Try to reserve file space with posix_fallocate(). If posix_fallocate() is
    2393                 :             :  * not implemented on the operating system or fails with EINVAL / EOPNOTSUPP,
    2394                 :             :  * use FileZero() instead.
    2395                 :             :  *
    2396                 :             :  * Note that at least glibc implements posix_fallocate() in userspace if it
    2397                 :             :  * is not implemented by the filesystem. That's not the case for all
    2398                 :             :  * environments though.
    2399                 :             :  *
    2400                 :             :  * Returns 0 on success, -1 otherwise. In the latter case errno is set to the
    2401                 :             :  * appropriate error.
    2402                 :             :  */
    2403                 :             : int
    2404                 :         131 : FileFallocate(File file, pgoff_t offset, pgoff_t amount, uint32 wait_event_info)
    2405                 :             : {
    2406                 :             : #ifdef HAVE_POSIX_FALLOCATE
    2407                 :             :         int                     returnCode;
    2408                 :             : 
    2409                 :             :         Assert(FileIsValid(file));
    2410                 :             : 
    2411                 :             :         DO_DB(elog(LOG, "FileFallocate: %d (%s) " INT64_FORMAT " " INT64_FORMAT,
    2412                 :             :                            file, VfdCache[file].fileName,
    2413                 :             :                            (int64) offset, (int64) amount));
    2414                 :             : 
    2415                 :             :         returnCode = FileAccess(file);
    2416                 :             :         if (returnCode < 0)
    2417                 :             :                 return -1;
    2418                 :             : 
    2419                 :             : retry:
    2420                 :             :         pgstat_report_wait_start(wait_event_info);
    2421                 :             :         returnCode = posix_fallocate(VfdCache[file].fd, offset, amount);
    2422                 :             :         pgstat_report_wait_end();
    2423                 :             : 
    2424                 :             :         if (returnCode == 0)
    2425                 :             :                 return 0;
    2426                 :             :         else if (returnCode == EINTR)
    2427                 :             :                 goto retry;
    2428                 :             : 
    2429                 :             :         /* for compatibility with %m printing etc */
    2430                 :             :         errno = returnCode;
    2431                 :             : 
    2432                 :             :         /*
    2433                 :             :          * Return in cases of a "real" failure.  If fallocate is not supported,
    2434                 :             :          * fall through to the FileZero()-backed implementation.
    2435                 :             :          */
    2436                 :             :         if (returnCode != EINVAL && returnCode != EOPNOTSUPP)
    2437                 :             :                 return -1;
    2438                 :             : #endif
    2439                 :             : 
    2440                 :         131 :         return FileZero(file, offset, amount, wait_event_info);
    2441                 :             : }
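
The fallback logic of FileFallocate() can be sketched outside the VFD layer. This is a rough illustration under stated assumptions, not the fd.c code: reserve_space() is a hypothetical name, it works on a raw descriptor, and it zero-fills with a simple pwrite() loop in place of FileZero() and pg_pwrite_zeros(). Note that posix_fallocate() reports failure through its return value rather than through errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of fd.c): reserve "amount" bytes at
 * "offset", falling back to writing zeros if fallocate is unsupported.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
reserve_space(int fd, off_t offset, off_t amount)
{
    char        zeros[8192];
    int         rc;

    /* posix_fallocate() returns the error number directly */
    do
        rc = posix_fallocate(fd, offset, amount);
    while (rc == EINTR);

    if (rc == 0)
        return 0;

    errno = rc;                 /* so callers can report it the usual way */
    if (rc != EINVAL && rc != EOPNOTSUPP)
        return -1;              /* a "real" failure, do not fall back */

    /* Fallback: fill the range with zeros in fixed-size chunks. */
    memset(zeros, 0, sizeof(zeros));
    while (amount > 0)
    {
        size_t      chunk = amount < (off_t) sizeof(zeros)
            ? (size_t) amount : sizeof(zeros);
        ssize_t     n = pwrite(fd, zeros, chunk, offset);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted, try the same chunk again */
            return -1;
        }
        if (n == 0)
        {
            errno = ENOSPC;     /* no progress: assume out of disk space */
            return -1;
        }
        offset += n;
        amount -= n;
    }
    return 0;
}
```

Either path leaves the file at least offset + amount bytes long, so a caller can treat the two outcomes identically.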
    2442                 :             : 
    2443                 :             : pgoff_t
    2444                 :      618184 : FileSize(File file)
    2445                 :             : {
    2446         [ +  - ]:      618184 :         Assert(FileIsValid(file));
    2447                 :             : 
    2448                 :             :         DO_DB(elog(LOG, "FileSize %d (%s)",
    2449                 :             :                            file, VfdCache[file].fileName));
    2450                 :             : 
    2451         [ +  + ]:      618184 :         if (FileIsNotOpen(file))
    2452                 :             :         {
    2453         [ +  - ]:          17 :                 if (FileAccess(file) < 0)
    2454                 :           0 :                         return (pgoff_t) -1;
    2455                 :          17 :         }
    2456                 :             : 
    2457                 :      618184 :         return lseek(VfdCache[file].fd, 0, SEEK_END);
    2458                 :      618184 : }
    2459                 :             : 
    2460                 :             : int
    2461                 :         102 : FileTruncate(File file, pgoff_t offset, uint32 wait_event_info)
    2462                 :             : {
    2463                 :         102 :         int                     returnCode;
    2464                 :             : 
    2465         [ +  - ]:         102 :         Assert(FileIsValid(file));
    2466                 :             : 
    2467                 :             :         DO_DB(elog(LOG, "FileTruncate %d (%s)",
    2468                 :             :                            file, VfdCache[file].fileName));
    2469                 :             : 
    2470                 :         102 :         returnCode = FileAccess(file);
    2471         [ +  - ]:         102 :         if (returnCode < 0)
    2472                 :           0 :                 return returnCode;
    2473                 :             : 
    2474                 :         102 :         pgstat_report_wait_start(wait_event_info);
    2475                 :         102 :         returnCode = pg_ftruncate(VfdCache[file].fd, offset);
    2476                 :         102 :         pgstat_report_wait_end();
    2477                 :             : 
    2478   [ +  -  +  - ]:         102 :         if (returnCode == 0 && VfdCache[file].fileSize > offset)
    2479                 :             :         {
    2480                 :             :                 /* adjust our state for truncation of a temp file */
    2481         [ #  # ]:           0 :                 Assert(VfdCache[file].fdstate & FD_TEMP_FILE_LIMIT);
    2482                 :           0 :                 temporary_files_size -= VfdCache[file].fileSize - offset;
    2483                 :           0 :                 VfdCache[file].fileSize = offset;
    2484                 :           0 :         }
    2485                 :             : 
    2486                 :         102 :         return returnCode;
    2487                 :         102 : }
    2488                 :             : 
    2489                 :             : /*
    2490                 :             :  * Return the pathname associated with an open file.
    2491                 :             :  *
    2492                 :             :  * The returned string points to an internal buffer, which is valid until
    2493                 :             :  * the file is closed.
    2494                 :             :  */
    2495                 :             : char *
    2496                 :           0 : FilePathName(File file)
    2497                 :             : {
    2498         [ #  # ]:           0 :         Assert(FileIsValid(file));
    2499                 :             : 
    2500                 :           0 :         return VfdCache[file].fileName;
    2501                 :             : }
    2502                 :             : 
    2503                 :             : /*
    2504                 :             :  * Return the raw file descriptor of an opened file.
    2505                 :             :  *
    2506                 :             :  * The returned file descriptor will be valid until the file is closed, but
    2507                 :             :  * there are a lot of things that can make that happen.  So the caller should
    2508                 :             :  * be careful not to do much of anything else before it finishes using the
    2509                 :             :  * returned file descriptor.
    2510                 :             :  */
    2511                 :             : int
    2512                 :         287 : FileGetRawDesc(File file)
    2513                 :             : {
    2514                 :         287 :         int                     returnCode;
    2515                 :             : 
    2516                 :         287 :         returnCode = FileAccess(file);
    2517         [ +  - ]:         287 :         if (returnCode < 0)
    2518                 :           0 :                 return returnCode;
    2519                 :             : 
    2520         [ +  - ]:         287 :         Assert(FileIsValid(file));
    2521                 :         287 :         return VfdCache[file].fd;
    2522                 :         287 : }
    2523                 :             : 
    2524                 :             : /*
    2525                 :             :  * FileGetRawFlags - returns the file flags passed to open(2)
    2526                 :             :  */
    2527                 :             : int
    2528                 :           0 : FileGetRawFlags(File file)
    2529                 :             : {
    2530         [ #  # ]:           0 :         Assert(FileIsValid(file));
    2531                 :           0 :         return VfdCache[file].fileFlags;
    2532                 :             : }
    2533                 :             : 
    2534                 :             : /*
    2535                 :             :  * FileGetRawMode - returns the mode bitmask passed to open(2)
    2536                 :             :  */
    2537                 :             : mode_t
    2538                 :           0 : FileGetRawMode(File file)
    2539                 :             : {
    2540         [ #  # ]:           0 :         Assert(FileIsValid(file));
    2541                 :           0 :         return VfdCache[file].fileMode;
    2542                 :             : }
    2543                 :             : 
    2544                 :             : /*
    2545                 :             :  * Make room for another allocatedDescs[] array entry if needed and possible.
    2546                 :             :  * Returns true if an array element is available.
    2547                 :             :  */
    2548                 :             : static bool
    2549                 :       11741 : reserveAllocatedDesc(void)
    2550                 :             : {
    2551                 :       11741 :         AllocateDesc *newDescs;
    2552                 :       11741 :         int                     newMax;
    2553                 :             : 
    2554                 :             :         /* Quick out if array already has a free slot. */
    2555         [ +  + ]:       11741 :         if (numAllocatedDescs < maxAllocatedDescs)
    2556                 :       11735 :                 return true;
    2557                 :             : 
    2558                 :             :         /*
    2559                 :             :          * If the array hasn't yet been created in the current process, initialize
    2560                 :             :          * it with FD_MINFREE / 3 elements.  In many scenarios this is as many as
    2561                 :             :          * we will ever need, anyway.  We don't want to look at max_safe_fds
    2562                 :             :          * immediately because set_max_safe_fds() may not have run yet.
    2563                 :             :          */
    2564         [ -  + ]:           6 :         if (allocatedDescs == NULL)
    2565                 :             :         {
    2566                 :           6 :                 newMax = FD_MINFREE / 3;
    2567                 :           6 :                 newDescs = (AllocateDesc *) malloc(newMax * sizeof(AllocateDesc));
    2568                 :             :                 /* Out of memory already?  Treat as fatal error. */
    2569         [ +  - ]:           6 :                 if (newDescs == NULL)
    2570   [ #  #  #  # ]:           0 :                         ereport(ERROR,
    2571                 :             :                                         (errcode(ERRCODE_OUT_OF_MEMORY),
    2572                 :             :                                          errmsg("out of memory")));
    2573                 :           6 :                 allocatedDescs = newDescs;
    2574                 :           6 :                 maxAllocatedDescs = newMax;
    2575                 :           6 :                 return true;
    2576                 :             :         }
    2577                 :             : 
    2578                 :             :         /*
    2579                 :             :          * Consider enlarging the array beyond the initial allocation used above.
    2580                 :             :          * By the time this happens, max_safe_fds should be known accurately.
    2581                 :             :          *
    2582                 :             :          * We mustn't let allocated descriptors hog all the available FDs, and in
    2583                 :             :          * practice we'd better leave a reasonable number of FDs for VFD use.  So
    2584                 :             :          * set the maximum to max_safe_fds / 3.  (This should certainly be at
    2585                 :             :          * least as large as the initial size, FD_MINFREE / 3, so we aren't
    2586                 :             :          * tightening the restriction here.)  Recall that "external" FDs are
    2587                 :             :          * allowed to consume another third of max_safe_fds.
    2588                 :             :          */
    2589                 :           0 :         newMax = max_safe_fds / 3;
    2590         [ #  # ]:           0 :         if (newMax > maxAllocatedDescs)
    2591                 :             :         {
    2592                 :           0 :                 newDescs = (AllocateDesc *) realloc(allocatedDescs,
    2593                 :           0 :                                                                                         newMax * sizeof(AllocateDesc));
    2594                 :             :                 /* Treat out-of-memory as a non-fatal error. */
    2595         [ #  # ]:           0 :                 if (newDescs == NULL)
    2596                 :           0 :                         return false;
    2597                 :           0 :                 allocatedDescs = newDescs;
    2598                 :           0 :                 maxAllocatedDescs = newMax;
    2599                 :           0 :                 return true;
    2600                 :             :         }
    2601                 :             : 
    2602                 :             :         /* Can't enlarge allocatedDescs[] any more. */
    2603                 :           0 :         return false;
    2604                 :       11741 : }
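
The two-stage sizing policy in reserveAllocatedDesc() (a small fixed first allocation, then at most one enlargement to a cap that only becomes known later) can be illustrated outside the backend. The following is a minimal sketch, not fd.c's code: `Slot`, `reserve_slot`, `INITIAL_SLOTS`, and the `hard_cap` parameter are hypothetical names, and unlike fd.c it reports a failed first allocation by returning false rather than raising an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for fd.c's file-level bookkeeping. */
typedef struct Slot
{
    int         payload;
} Slot;

static Slot *slots = NULL;      /* lazily created array */
static int  num_slots = 0;      /* entries currently in use */
static int  max_slots = 0;      /* current capacity */

enum { INITIAL_SLOTS = 16 };    /* illustrative value only */

/*
 * Make sure at least one free slot exists, growing to at most hard_cap
 * entries.  Returns true if a slot is available, false otherwise.
 */
static bool
reserve_slot(int hard_cap)
{
    Slot       *grown;

    assert(hard_cap > 0);

    /* Quick out if there is already room. */
    if (num_slots < max_slots)
        return true;

    /* First use: start small, the cap may not be trustworthy yet. */
    if (slots == NULL)
    {
        grown = malloc(INITIAL_SLOTS * sizeof(Slot));
        if (grown == NULL)
            return false;       /* sketch only: fd.c raises an error here */
        slots = grown;
        max_slots = INITIAL_SLOTS;
        return true;
    }

    /* Later: enlarge only if the cap exceeds the current capacity. */
    if (hard_cap > max_slots)
    {
        grown = realloc(slots, (size_t) hard_cap * sizeof(Slot));
        if (grown == NULL)
            return false;       /* the old array is still valid */
        slots = grown;
        max_slots = hard_cap;
        return true;
    }

    return false;               /* already at the cap */
}
```

Assigning the realloc() result to a temporary first, as the listing above also does, keeps the original array reachable when the enlargement fails.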
    2605                 :             : 
    2606                 :             : /*
    2607                 :             :  * Routines that want to use stdio (i.e., FILE*) should use AllocateFile
    2608                 :             :  * rather than plain fopen().  This lets fd.c deal with freeing FDs if
    2609                 :             :  * necessary to open the file.  When done, call FreeFile rather than fclose.
    2610                 :             :  *
    2611                 :             :  * Note that files that will be open for any significant length of time
    2612                 :             :  * should NOT be handled this way, since they cannot share kernel file
    2613                 :             :  * descriptors with other files; there is grave risk of running out of FDs
    2614                 :             :  * if anyone locks down too many FDs.  Most callers of this routine are
    2615                 :             :  * simply reading a config file that they will read and close immediately.
    2616                 :             :  *
    2617                 :             :  * fd.c will automatically close all files opened with AllocateFile at
    2618                 :             :  * transaction commit or abort; this prevents FD leakage if a routine
    2619                 :             :  * that calls AllocateFile is terminated prematurely by ereport(ERROR).
    2620                 :             :  *
    2621                 :             :  * Ideally this should be the *only* direct call of fopen() in the backend.
    2622                 :             :  */
    2623                 :             : FILE *
    2624                 :        4275 : AllocateFile(const char *name, const char *mode)
    2625                 :             : {
    2626                 :        4275 :         FILE       *file;
    2627                 :             : 
    2628                 :             :         DO_DB(elog(LOG, "AllocateFile: Allocated %d (%s)",
    2629                 :             :                            numAllocatedDescs, name));
    2630                 :             : 
    2631                 :             :         /* Can we allocate another non-virtual FD? */
    2632         [ +  - ]:        4275 :         if (!reserveAllocatedDesc())
    2633   [ #  #  #  # ]:           0 :                 ereport(ERROR,
    2634                 :             :                                 (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
    2635                 :             :                                  errmsg("exceeded maxAllocatedDescs (%d) while trying to open file \"%s\"",
    2636                 :             :                                                 maxAllocatedDescs, name)));
    2637                 :             : 
    2638                 :             :         /* Close excess kernel FDs. */
    2639                 :        4275 :         ReleaseLruFiles();
    2640                 :             : 
    2641                 :             : TryAgain:
    2642         [ +  + ]:        4275 :         if ((file = fopen(name, mode)) != NULL)
    2643                 :             :         {
    2644                 :        4037 :                 AllocateDesc *desc = &allocatedDescs[numAllocatedDescs];
    2645                 :             : 
    2646                 :        4037 :                 desc->kind = AllocateDescFile;
    2647                 :        4037 :                 desc->desc.file = file;
    2648                 :        4037 :                 desc->create_subid = GetCurrentSubTransactionId();
    2649                 :        4037 :                 numAllocatedDescs++;
    2650                 :        4037 :                 return desc->desc.file;
    2651                 :        4037 :         }
    2652                 :             : 
    2653   [ +  -  -  + ]:         238 :         if (errno == EMFILE || errno == ENFILE)
    2654                 :             :         {
    2655                 :           0 :                 int                     save_errno = errno;
    2656                 :             : 
    2657   [ #  #  #  # ]:           0 :                 ereport(LOG,
    2658                 :             :                                 (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
    2659                 :             :                                  errmsg("out of file descriptors: %m; release and retry")));
    2660                 :           0 :                 errno = 0;
    2661         [ #  # ]:           0 :                 if (ReleaseLruFile())
    2662                 :           0 :                         goto TryAgain;
    2663                 :           0 :                 errno = save_errno;
    2664      [ #  #  # ]:           0 :         }
    2665                 :             : 
    2666                 :         238 :         return NULL;
    2667                 :        4275 : }
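
The TryAgain loop in AllocateFile() follows a general pattern: on EMFILE or ENFILE, give back one cached descriptor and retry, and restore the original errno if nothing is left to release. The sketch below simulates that pattern with counters instead of real descriptors so it can run anywhere. `fake_open`, `release_one_cached`, `open_with_retry`, and the three counters are hypothetical; they stand in for fopen(), ReleaseLruFile(), and the kernel limit and are not part of fd.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simulated process state (assumptions for illustration only). */
static int  fd_limit = 3;       /* pretend per-process descriptor limit */
static int  fds_in_use = 0;     /* descriptors currently open */
static int  cached_fds = 0;     /* open descriptors that may be closed */

/* Pretend open(): fails with EMFILE once the limit is reached. */
static int
fake_open(void)
{
    if (fds_in_use >= fd_limit)
    {
        errno = EMFILE;
        return -1;
    }
    return fds_in_use++;
}

/* Close one cached descriptor, if any.  Returns true if one was closed. */
static bool
release_one_cached(void)
{
    if (cached_fds == 0)
        return false;
    assert(fds_in_use > 0);
    cached_fds--;
    fds_in_use--;
    return true;
}

/* Open with release-and-retry; returns -1 with errno set on failure. */
static int
open_with_retry(void)
{
    for (;;)
    {
        int         fd = fake_open();
        int         save_errno;

        if (fd >= 0)
            return fd;

        /* Only descriptor exhaustion is worth retrying. */
        if (errno != EMFILE && errno != ENFILE)
            return -1;

        save_errno = errno;
        if (!release_one_cached())
        {
            errno = save_errno; /* report the original failure */
            return -1;
        }
    }
}
```

The loop terminates because each retry consumes one cached descriptor; once none remain, the original error is returned to the caller.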
    2668                 :             : 
    2669                 :             : /*
    2670                 :             :  * Open a file with OpenTransientFilePerm() and pass default file mode for
    2671                 :             :  * the fileMode parameter.
    2672                 :             :  */
    2673                 :             : int
    2674                 :        5429 : OpenTransientFile(const char *fileName, int fileFlags)
    2675                 :             : {
    2676                 :        5429 :         return OpenTransientFilePerm(fileName, fileFlags, pg_file_create_mode);
    2677                 :             : }
    2678                 :             : 
    2679                 :             : /*
    2680                 :             :  * Like AllocateFile, but returns an unbuffered fd like open(2).
    2681                 :             :  */
    2682                 :             : int
    2683                 :        5431 : OpenTransientFilePerm(const char *fileName, int fileFlags, mode_t fileMode)
    2684                 :             : {
    2685                 :        5431 :         int                     fd;
    2686                 :             : 
    2687                 :             :         DO_DB(elog(LOG, "OpenTransientFile: Allocated %d (%s)",
    2688                 :             :                            numAllocatedDescs, fileName));
    2689                 :             : 
    2690                 :             :         /* Can we allocate another non-virtual FD? */
    2691         [ +  - ]:        5431 :         if (!reserveAllocatedDesc())
    2692   [ #  #  #  # ]:           0 :                 ereport(ERROR,
    2693                 :             :                                 (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
    2694                 :             :                                  errmsg("exceeded maxAllocatedDescs (%d) while trying to open file \"%s\"",
    2695                 :             :                                                 maxAllocatedDescs, fileName)));
    2696                 :             : 
    2697                 :             :         /* Close excess kernel FDs. */
    2698                 :        5431 :         ReleaseLruFiles();
    2699                 :             : 
    2700                 :        5431 :         fd = BasicOpenFilePerm(fileName, fileFlags, fileMode);
    2701                 :             : 
    2702         [ +  + ]:        5431 :         if (fd >= 0)
    2703                 :             :         {
    2704                 :        5403 :                 AllocateDesc *desc = &allocatedDescs[numAllocatedDescs];
    2705                 :             : 
    2706                 :        5403 :                 desc->kind = AllocateDescRawFD;
    2707                 :        5403 :                 desc->desc.fd = fd;
    2708                 :        5403 :                 desc->create_subid = GetCurrentSubTransactionId();
    2709                 :        5403 :                 numAllocatedDescs++;
    2710                 :             : 
    2711                 :        5403 :                 return fd;
    2712                 :        5403 :         }
    2713                 :             : 
    2714                 :          28 :         return -1;                                      /* failure */
    2715                 :        5431 : }
    2716                 :             : 
    2717                 :             : /*
    2718                 :             :  * Routines that want to initiate a pipe stream should use OpenPipeStream
    2719                 :             :  * rather than plain popen().  This lets fd.c deal with freeing FDs if
    2720                 :             :  * necessary.  When done, call ClosePipeStream rather than pclose.
    2721                 :             :  *
    2722                 :             :  * This function also ensures that the popen'd program is run with default
    2723                 :             :  * SIGPIPE processing, rather than the SIG_IGN setting the backend normally
    2724                 :             :  * uses.  This ensures desirable response to, e.g., closing a read pipe early.
    2725                 :             :  */
    2726                 :             : FILE *
    2727                 :           1 : OpenPipeStream(const char *command, const char *mode)
    2728                 :             : {
    2729                 :           1 :         FILE       *file;
    2730                 :           1 :         int                     save_errno;
    2731                 :             : 
    2732                 :             :         DO_DB(elog(LOG, "OpenPipeStream: Allocated %d (%s)",
    2733                 :             :                            numAllocatedDescs, command));
    2734                 :             : 
    2735                 :             :         /* Can we allocate another non-virtual FD? */
    2736         [ +  - ]:           1 :         if (!reserveAllocatedDesc())
    2737   [ #  #  #  # ]:           0 :                 ereport(ERROR,
    2738                 :             :                                 (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
    2739                 :             :                                  errmsg("exceeded maxAllocatedDescs (%d) while trying to execute command \"%s\"",
    2740                 :             :                                                 maxAllocatedDescs, command)));
    2741                 :             : 
    2742                 :             :         /* Close excess kernel FDs. */
    2743                 :           1 :         ReleaseLruFiles();
    2744                 :             : 
    2745                 :             : TryAgain:
    2746                 :           1 :         fflush(NULL);
    2747                 :           1 :         pqsignal(SIGPIPE, SIG_DFL);
    2748                 :           1 :         errno = 0;
    2749                 :           1 :         file = popen(command, mode);
    2750                 :           1 :         save_errno = errno;
    2751                 :           1 :         pqsignal(SIGPIPE, SIG_IGN);
    2752                 :           1 :         errno = save_errno;
    2753         [ +  - ]:           1 :         if (file != NULL)
    2754                 :             :         {
    2755                 :           1 :                 AllocateDesc *desc = &allocatedDescs[numAllocatedDescs];
    2756                 :             : 
    2757                 :           1 :                 desc->kind = AllocateDescPipe;
    2758                 :           1 :                 desc->desc.file = file;
    2759                 :           1 :                 desc->create_subid = GetCurrentSubTransactionId();
    2760                 :           1 :                 numAllocatedDescs++;
    2761                 :           1 :                 return desc->desc.file;
    2762                 :           1 :         }
    2763                 :             : 
    2764   [ #  #  #  # ]:           0 :         if (errno == EMFILE || errno == ENFILE)
    2765                 :             :         {
    2766   [ #  #  #  # ]:           0 :                 ereport(LOG,
    2767                 :             :                                 (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
    2768                 :             :                                  errmsg("out of file descriptors: %m; release and retry")));
    2769         [ #  # ]:           0 :                 if (ReleaseLruFile())
    2770                 :           0 :                         goto TryAgain;
    2771                 :           0 :                 errno = save_errno;
    2772                 :           0 :         }
    2773                 :             : 
    2774                 :           0 :         return NULL;
    2775                 :           1 : }
    2776                 :             : 
    2777                 :             : /*
    2778                 :             :  * Free an AllocateDesc of any type.
    2779                 :             :  *
    2780                 :             :  * The argument *must* point into the allocatedDescs[] array.
    2781                 :             :  */
    2782                 :             : static int
    2783                 :       11469 : FreeDesc(AllocateDesc *desc)
    2784                 :             : {
    2785                 :       11469 :         int                     result;
    2786                 :             : 
    2787                 :             :         /* Close the underlying object */
    2788   [ +  +  +  +  :       11469 :         switch (desc->kind)
                      - ]
    2789                 :             :         {
    2790                 :             :                 case AllocateDescFile:
    2791                 :        4037 :                         result = fclose(desc->desc.file);
    2792                 :        4037 :                         break;
    2793                 :             :                 case AllocateDescPipe:
    2794                 :           1 :                         result = pclose(desc->desc.file);
    2795                 :           1 :                         break;
    2796                 :             :                 case AllocateDescDir:
    2797                 :        2028 :                         result = closedir(desc->desc.dir);
    2798                 :        2028 :                         break;
    2799                 :             :                 case AllocateDescRawFD:
    2800                 :        5403 :                         pgaio_closing_fd(desc->desc.fd);
    2801                 :        5403 :                         result = close(desc->desc.fd);
    2802                 :        5403 :                         break;
    2803                 :             :                 default:
    2804   [ #  #  #  # ]:           0 :                         elog(ERROR, "AllocateDesc kind not recognized");
    2805                 :           0 :                         result = 0;                     /* keep compiler quiet */
    2806                 :           0 :                         break;
    2807                 :             :         }
    2808                 :             : 
    2809                 :             :         /* Compact storage in the allocatedDescs array */
    2810                 :       11469 :         numAllocatedDescs--;
    2811                 :       11469 :         *desc = allocatedDescs[numAllocatedDescs];
    2812                 :             : 
    2813                 :       22938 :         return result;
    2814                 :       11469 : }
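
The compaction step at the end of FreeDesc() removes an entry in constant time by copying the last live entry over it. A minimal standalone sketch of that technique follows; `Entry`, `entries`, `add_entry`, and `remove_entry` are hypothetical names chosen for the example.

```c
#include <assert.h>

typedef struct Entry
{
    int         id;
} Entry;

enum { MAX_ENTRIES = 8 };

static Entry entries[MAX_ENTRIES];
static int  num_entries = 0;

/* Append an entry at the end of the live region. */
static void
add_entry(int id)
{
    assert(num_entries < MAX_ENTRIES);
    entries[num_entries++].id = id;
}

/*
 * Remove *victim by overwriting it with the last live entry.
 * victim must point into the live part of entries[].
 */
static void
remove_entry(Entry *victim)
{
    assert(victim >= entries && victim < entries + num_entries);
    num_entries--;
    *victim = entries[num_entries];     /* harmless self-copy if last */
}
```

The trade-off is that the array does not keep insertion order: after a removal, the entry that used to be last sits in the vacated position, so callers should locate entries by content rather than by a remembered index.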
    2815                 :             : 
    2816                 :             : /*
    2817                 :             :  * Close a file returned by AllocateFile.
    2818                 :             :  *
    2819                 :             :  * Note we do not check fclose's return value --- it is up to the caller
    2820                 :             :  * to handle close errors.
    2821                 :             :  */
    2822                 :             : int
    2823                 :        4033 : FreeFile(FILE *file)
    2824                 :             : {
    2825                 :        4033 :         int                     i;
    2826                 :             : 
    2827                 :             :         DO_DB(elog(LOG, "FreeFile: Allocated %d", numAllocatedDescs));
    2828                 :             : 
    2829                 :             :         /* Remove file from list of allocated files, if it's present */
    2830         [ +  - ]:        4033 :         for (i = numAllocatedDescs; --i >= 0;)
    2831                 :             :         {
    2832                 :        4033 :                 AllocateDesc *desc = &allocatedDescs[i];
    2833                 :             : 
    2834   [ +  -  -  + ]:        4033 :                 if (desc->kind == AllocateDescFile && desc->desc.file == file)
    2835                 :        4033 :                         return FreeDesc(desc);
    2836         [ +  - ]:        4033 :         }
    2837                 :             : 
    2838                 :             :         /* Only get here if someone passes us a file not in allocatedDescs */
    2839   [ #  #  #  # ]:           0 :         elog(WARNING, "file passed to FreeFile was not obtained from AllocateFile");
    2840                 :             : 
    2841                 :           0 :         return fclose(file);
    2842                 :        4033 : }
    2843                 :             : 
    2844                 :             : /*
    2845                 :             :  * Close a file returned by OpenTransientFile.
    2846                 :             :  *
    2847                 :             :  * Note we do not check close's return value --- it is up to the caller
    2848                 :             :  * to handle close errors.
    2849                 :             :  */
    2850                 :             : int
    2851                 :        5403 : CloseTransientFile(int fd)
    2852                 :             : {
    2853                 :        5403 :         int                     i;
    2854                 :             : 
    2855                 :             :         DO_DB(elog(LOG, "CloseTransientFile: Allocated %d", numAllocatedDescs));
    2856                 :             : 
    2857                 :             :         /* Remove fd from list of allocated files, if it's present */
    2858         [ +  - ]:        5403 :         for (i = numAllocatedDescs; --i >= 0;)
    2859                 :             :         {
    2860                 :        5403 :                 AllocateDesc *desc = &allocatedDescs[i];
    2861                 :             : 
    2862   [ +  -  -  + ]:        5403 :                 if (desc->kind == AllocateDescRawFD && desc->desc.fd == fd)
    2863                 :        5403 :                         return FreeDesc(desc);
    2864         [ +  - ]:        5403 :         }
    2865                 :             : 
    2866                 :             :         /* Only get here if someone passes us a file not in allocatedDescs */
    2867   [ #  #  #  # ]:           0 :         elog(WARNING, "fd passed to CloseTransientFile was not obtained from OpenTransientFile");
    2868                 :             : 
    2869                 :           0 :         pgaio_closing_fd(fd);
    2870                 :             : 
    2871                 :           0 :         return close(fd);
    2872                 :        5403 : }
    2873                 :             : 
    2874                 :             : /*
    2875                 :             :  * Routines that want to use <dirent.h> (i.e., DIR*) should use AllocateDir
    2876                 :             :  * rather than plain opendir().  This lets fd.c deal with freeing FDs if
    2877                 :             :  * necessary to open the directory, and with closing it after an elog.
    2878                 :             :  * When done, call FreeDir rather than closedir.
    2879                 :             :  *
    2880                 :             :  * Returns NULL, with errno set, on failure.  Note that failure detection
    2881                 :             :  * is commonly left to the following call of ReadDir or ReadDirExtended;
    2882                 :             :  * see the comments for ReadDir.
    2883                 :             :  *
    2884                 :             :  * Ideally this should be the *only* direct call of opendir() in the backend.
    2885                 :             :  */
    2886                 :             : DIR *
    2887                 :        2034 : AllocateDir(const char *dirname)
    2888                 :             : {
    2889                 :        2034 :         DIR                *dir;
    2890                 :             : 
    2891                 :             :         DO_DB(elog(LOG, "AllocateDir: Allocated %d (%s)",
    2892                 :             :                            numAllocatedDescs, dirname));
    2893                 :             : 
    2894                 :             :         /* Can we allocate another non-virtual FD? */
    2895         [ +  - ]:        2034 :         if (!reserveAllocatedDesc())
    2896   [ #  #  #  # ]:           0 :                 ereport(ERROR,
    2897                 :             :                                 (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
    2898                 :             :                                  errmsg("exceeded maxAllocatedDescs (%d) while trying to open directory \"%s\"",
    2899                 :             :                                                 maxAllocatedDescs, dirname)));
    2900                 :             : 
    2901                 :             :         /* Close excess kernel FDs. */
    2902                 :        2034 :         ReleaseLruFiles();
    2903                 :             : 
    2904                 :             : TryAgain:
    2905         [ +  + ]:        2034 :         if ((dir = opendir(dirname)) != NULL)
    2906                 :             :         {
    2907                 :        2028 :                 AllocateDesc *desc = &allocatedDescs[numAllocatedDescs];
    2908                 :             : 
    2909                 :        2028 :                 desc->kind = AllocateDescDir;
    2910                 :        2028 :                 desc->desc.dir = dir;
    2911                 :        2028 :                 desc->create_subid = GetCurrentSubTransactionId();
    2912                 :        2028 :                 numAllocatedDescs++;
    2913                 :        2028 :                 return desc->desc.dir;
    2914                 :        2028 :         }
    2915                 :             : 
    2916   [ +  -  -  + ]:           6 :         if (errno == EMFILE || errno == ENFILE)
    2917                 :             :         {
    2918                 :           0 :                 int                     save_errno = errno;
    2919                 :             : 
    2920   [ #  #  #  # ]:           0 :                 ereport(LOG,
    2921                 :             :                                 (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
    2922                 :             :                                  errmsg("out of file descriptors: %m; release and retry")));
    2923                 :           0 :                 errno = 0;
    2924         [ #  # ]:           0 :                 if (ReleaseLruFile())
    2925                 :           0 :                         goto TryAgain;
    2926                 :           0 :                 errno = save_errno;
    2927      [ #  #  # ]:           0 :         }
    2928                 :             : 
    2929                 :           6 :         return NULL;
    2930                 :        2034 : }
    2931                 :             : 
    2932                 :             : /*
    2933                 :             :  * Read a directory opened with AllocateDir, ereport'ing any error.
    2934                 :             :  *
    2935                 :             :  * This is easier to use than raw readdir() since it takes care of some
    2936                 :             :  * otherwise rather tedious and error-prone manipulation of errno.  Also,
    2937                 :             :  * if you are happy with a generic error message for AllocateDir failure,
    2938                 :             :  * you can just do
    2939                 :             :  *
    2940                 :             :  *              dir = AllocateDir(path);
    2941                 :             :  *              while ((dirent = ReadDir(dir, path)) != NULL)
    2942                 :             :  *                      process dirent;
    2943                 :             :  *              FreeDir(dir);
    2944                 :             :  *
    2945                 :             :  * since a NULL dir parameter is taken as indicating AllocateDir failed.
    2946                 :             :  * (Make sure errno isn't changed between AllocateDir and ReadDir if you
    2947                 :             :  * use this shortcut.)
    2948                 :             :  *
    2949                 :             :  * The pathname passed to AllocateDir must be passed to this routine too,
    2950                 :             :  * but it is only used for error reporting.
    2951                 :             :  */
    2952                 :             : struct dirent *
    2953                 :       40996 : ReadDir(DIR *dir, const char *dirname)
    2954                 :             : {
    2955                 :       40996 :         return ReadDirExtended(dir, dirname, ERROR);
    2956                 :             : }
    2957                 :             : 
    2958                 :             : /*
    2959                 :             :  * Alternate version of ReadDir that allows caller to specify the elevel
    2960                 :             :  * for any error report (whether it's reporting an initial failure of
    2961                 :             :  * AllocateDir or a subsequent directory read failure).
    2962                 :             :  *
    2963                 :             :  * If elevel < ERROR, returns NULL after any error.  With the normal coding
    2964                 :             :  * pattern, this will result in falling out of the loop immediately as
    2965                 :             :  * though the directory contained no (more) entries.
    2966                 :             :  */
    2967                 :             : struct dirent *
    2968                 :      137538 : ReadDirExtended(DIR *dir, const char *dirname, int elevel)
    2969                 :             : {
    2970                 :      137538 :         struct dirent *dent;
    2971                 :             : 
    2972                 :             :         /* Give a generic message for AllocateDir failure, if caller didn't */
    2973         [ +  + ]:      137538 :         if (dir == NULL)
    2974                 :             :         {
    2975   [ -  +  #  #  :           2 :                 ereport(elevel,
          +  +  #  #  #  
                      # ]
    2976                 :             :                                 (errcode_for_file_access(),
    2977                 :             :                                  errmsg("could not open directory \"%s\": %m",
    2978                 :             :                                                 dirname)));
    2979                 :           0 :                 return NULL;
    2980                 :             :         }
    2981                 :             : 
    2982                 :      137536 :         errno = 0;
    2983         [ +  + ]:      137536 :         if ((dent = readdir(dir)) != NULL)
    2984                 :      137181 :                 return dent;
    2985                 :             : 
    2986         [ +  - ]:         355 :         if (errno)
    2987   [ #  #  #  #  #  #  #  #  #  # ]:           0 :                 ereport(elevel,
    2988                 :             :                                 (errcode_for_file_access(),
    2989                 :             :                                  errmsg("could not read directory \"%s\": %m",
    2990                 :             :                                                 dirname)));
    2991                 :         355 :         return NULL;
    2992                 :      137536 : }
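
The header comment of ReadDirExtended refers to a "normal coding pattern" in which the caller loops until NULL is returned. Because readdir() returns NULL both at end of directory and on failure, errno has to be cleared before each call to tell the two apart. The following is a minimal standalone sketch of that idea using plain POSIX calls rather than AllocateDir/ReadDirExtended; the helper name count_dir_entries is hypothetical and is not part of fd.c.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of fd.c): count the entries of a directory,
 * skipping "." and "..".  Returns -1 if the directory cannot be opened or
 * if readdir() fails partway through.
 */
static int
count_dir_entries(const char *path)
{
    DIR        *dir = opendir(path);
    struct dirent *de;
    int         count = 0;
    int         failed = 0;

    if (dir == NULL)
        return -1;

    for (;;)
    {
        /* readdir() returns NULL for both end-of-directory and error */
        errno = 0;
        de = readdir(dir);
        if (de == NULL)
        {
            if (errno != 0)
            {
                fprintf(stderr, "could not read directory \"%s\": %s\n",
                        path, strerror(errno));
                failed = 1;
            }
            break;
        }

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        count++;
    }

    closedir(dir);
    return failed ? -1 : count;
}
```

A caller would treat a negative result as "could not scan" and any other value as the entry count. In the backend itself, the error report is left to ReadDirExtended at the caller's chosen elevel.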
    2993                 :             : 
    2994                 :             : /*
    2995                 :             :  * Close a directory opened with AllocateDir.
    2996                 :             :  *
    2997                 :             :  * Returns closedir's return value (with errno set if it's not 0).
    2998                 :             :  * Note we do not check the return value --- it is up to the caller
    2999                 :             :  * to handle close errors if wanted.
    3000                 :             :  *
    3001                 :             :  * Does nothing if dir == NULL; we assume that directory open failure was
    3002                 :             :  * already reported if desired.
    3003                 :             :  */
    3004                 :             : int
    3005                 :        2028 : FreeDir(DIR *dir)
    3006                 :             : {
    3007                 :        2028 :         int                     i;
    3008                 :             : 
    3009                 :             :         /* Nothing to do if AllocateDir failed */
    3010         [ +  - ]:        2028 :         if (dir == NULL)
    3011                 :           0 :                 return 0;
    3012                 :             : 
    3013                 :             :         DO_DB(elog(LOG, "FreeDir: Allocated %d", numAllocatedDescs));
    3014                 :             : 
    3015                 :             :         /* Remove dir from list of allocated dirs, if it's present */
    3016         [ +  - ]:        2028 :         for (i = numAllocatedDescs; --i >= 0;)
    3017                 :             :         {
    3018                 :        2028 :                 AllocateDesc *desc = &allocatedDescs[i];
    3019                 :             : 
    3020   [ +  -  -  + ]:        2028 :                 if (desc->kind == AllocateDescDir && desc->desc.dir == dir)
    3021                 :        2028 :                         return FreeDesc(desc);
    3022         [ +  - ]:        2028 :         }
    3023                 :             : 
    3024                 :             :         /* Only get here if someone passes us a dir not in allocatedDescs */
    3025   [ #  #  #  # ]:           0 :         elog(WARNING, "dir passed to FreeDir was not obtained from AllocateDir");
    3026                 :             : 
    3027                 :           0 :         return closedir(dir);
    3028                 :        2028 : }
    3029                 :             : 
    3030                 :             : 
    3031                 :             : /*
    3032                 :             :  * Close a pipe stream returned by OpenPipeStream.
    3033                 :             :  */
    3034                 :             : int
    3035                 :           1 : ClosePipeStream(FILE *file)
    3036                 :             : {
    3037                 :           1 :         int                     i;
    3038                 :             : 
    3039                 :             :         DO_DB(elog(LOG, "ClosePipeStream: Allocated %d", numAllocatedDescs));
    3040                 :             : 
    3041                 :             :         /* Remove file from list of allocated files, if it's present */
    3042         [ +  - ]:           1 :         for (i = numAllocatedDescs; --i >= 0;)
    3043                 :             :         {
    3044                 :           1 :                 AllocateDesc *desc = &allocatedDescs[i];
    3045                 :             : 
    3046   [ +  -  -  + ]:           1 :                 if (desc->kind == AllocateDescPipe && desc->desc.file == file)
    3047                 :           1 :                         return FreeDesc(desc);
    3048         [ +  - ]:           1 :         }
    3049                 :             : 
    3050                 :             :         /* Only get here if someone passes us a file not in allocatedDescs */
    3051   [ #  #  #  # ]:           0 :         elog(WARNING, "file passed to ClosePipeStream was not obtained from OpenPipeStream");
    3052                 :             : 
    3053                 :           0 :         return pclose(file);
    3054                 :           1 : }
    3055                 :             : 
    3056                 :             : /*
    3057                 :             :  * closeAllVfds
    3058                 :             :  *
    3059                 :             :  * Force all VFDs into the physically-closed state, so that the fewest
    3060                 :             :  * possible number of kernel file descriptors are in use.  There is no
    3061                 :             :  * change in the logical state of the VFDs.
    3062                 :             :  */
    3063                 :             : void
    3064                 :          10 : closeAllVfds(void)
    3065                 :             : {
    3066                 :          10 :         Index           i;
    3067                 :             : 
    3068         [ -  + ]:          10 :         if (SizeVfdCache > 0)
    3069                 :             :         {
    3070         [ +  - ]:          10 :                 Assert(FileIsNotOpen(0));       /* Make sure ring not corrupted */
    3071         [ +  + ]:         320 :                 for (i = 1; i < SizeVfdCache; i++)
    3072                 :             :                 {
    3073         [ +  + ]:         310 :                         if (!FileIsNotOpen(i))
    3074                 :          22 :                                 LruDelete(i);
    3075                 :         310 :                 }
    3076                 :          10 :         }
    3077                 :          10 : }
    3078                 :             : 
    3079                 :             : 
    3080                 :             : /*
    3081                 :             :  * SetTempTablespaces
    3082                 :             :  *
    3083                 :             :  * Define a list (actually an array) of OIDs of tablespaces to use for
    3084                 :             :  * temporary files.  This list will be used until end of transaction,
    3085                 :             :  * unless this function is called again before then.  It is the caller's
    3086                 :             :  * responsibility to ensure that the passed-in array has adequate lifespan
    3087                 :             :  * (typically it'd be allocated in TopTransactionContext).
    3088                 :             :  *
    3089                 :             :  * Some entries of the array may be InvalidOid, indicating that the current
    3090                 :             :  * database's default tablespace should be used.
    3091                 :             :  */
    3092                 :             : void
    3093                 :         674 : SetTempTablespaces(Oid *tableSpaces, int numSpaces)
    3094                 :             : {
    3095         [ +  - ]:         674 :         Assert(numSpaces >= 0);
    3096                 :         674 :         tempTableSpaces = tableSpaces;
    3097                 :         674 :         numTempTableSpaces = numSpaces;
    3098                 :             : 
    3099                 :             :         /*
    3100                 :             :          * Select a random starting point in the list.  This is to minimize
    3101                 :             :          * conflicts between backends that are most likely sharing the same list
    3102                 :             :          * of temp tablespaces.  Note that if we create multiple temp files in the
    3103                 :             :          * same transaction, we'll advance circularly through the list --- this
    3104                 :             :          * ensures that large temporary sort files are nicely spread across all
    3105                 :             :          * available tablespaces.
    3106                 :             :          */
    3107         [ -  + ]:         674 :         if (numSpaces > 1)
    3108                 :           0 :                 nextTempTableSpace = pg_prng_uint64_range(&pg_global_prng_state,
    3109                 :           0 :                                                                                                   0, numSpaces - 1);
    3110                 :             :         else
    3111                 :         674 :                 nextTempTableSpace = 0;
    3112                 :         674 : }
    3113                 :             : 
    3114                 :             : /*
    3115                 :             :  * TempTablespacesAreSet
    3116                 :             :  *
    3117                 :             :  * Returns true if SetTempTablespaces has been called in current transaction.
    3118                 :             :  * (This is just so that tablespaces.c doesn't need its own per-transaction
    3119                 :             :  * state.)
    3120                 :             :  */
    3121                 :             : bool
    3122                 :        1516 : TempTablespacesAreSet(void)
    3123                 :             : {
    3124                 :        1516 :         return (numTempTableSpaces >= 0);
    3125                 :             : }
    3126                 :             : 
    3127                 :             : /*
    3128                 :             :  * GetTempTablespaces
    3129                 :             :  *
    3130                 :             :  * Populate an array with the OIDs of the tablespaces that should be used for
    3131                 :             :  * temporary files.  (Some entries may be InvalidOid, indicating that the
    3132                 :             :  * current database's default tablespace should be used.)  At most numSpaces
    3133                 :             :  * entries will be filled.
    3134                 :             :  * Returns the number of OIDs that were copied into the output array.
    3135                 :             :  */
    3136                 :             : int
    3137                 :          62 : GetTempTablespaces(Oid *tableSpaces, int numSpaces)
    3138                 :             : {
    3139                 :          62 :         int                     i;
    3140                 :             : 
    3141         [ +  - ]:          62 :         Assert(TempTablespacesAreSet());
    3142   [ +  -  -  + ]:          62 :         for (i = 0; i < numTempTableSpaces && i < numSpaces; ++i)
    3143                 :           0 :                 tableSpaces[i] = tempTableSpaces[i];
    3144                 :             : 
    3145                 :         124 :         return i;
    3146                 :          62 : }
    3147                 :             : 
    3148                 :             : /*
    3149                 :             :  * GetNextTempTableSpace
    3150                 :             :  *
    3151                 :             :  * Select the next temp tablespace to use.  A result of InvalidOid means
    3152                 :             :  * to use the current database's default tablespace.
    3153                 :             :  */
    3154                 :             : Oid
    3155                 :         699 : GetNextTempTableSpace(void)
    3156                 :             : {
    3157         [ -  + ]:         699 :         if (numTempTableSpaces > 0)
    3158                 :             :         {
    3159                 :             :                 /* Advance nextTempTableSpace counter with wraparound */
    3160         [ #  # ]:           0 :                 if (++nextTempTableSpace >= numTempTableSpaces)
    3161                 :           0 :                         nextTempTableSpace = 0;
    3162                 :           0 :                 return tempTableSpaces[nextTempTableSpace];
    3163                 :             :         }
    3164                 :         699 :         return InvalidOid;
    3165                 :         699 : }
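
The three routines above cooperate as a small round-robin cursor: a count of -1 means "not set", the starting index is chosen once, and each request pre-increments the index with wraparound. The sketch below restates that logic in standalone C. The type TempSpaceCursor and the cursor_* functions are hypothetical names, plain unsigned values stand in for Oid, 0 stands in for InvalidOid, and the starting index is passed in by the caller instead of being drawn from pg_global_prng_state.

```c
#include <stddef.h>

/* Hypothetical stand-in for the tempTableSpaces static state in fd.c */
typedef struct
{
    const unsigned *spaces;     /* caller-owned array; unused if count <= 0 */
    int         count;          /* -1 means "not set" */
    int         next;           /* index of the most recently returned entry */
} TempSpaceCursor;

/* Install a list and pick the starting point (caller supplies the "random" value) */
static void
cursor_set(TempSpaceCursor *c, const unsigned *spaces, int count, unsigned start)
{
    c->spaces = spaces;
    c->count = count;
    c->next = (count > 1) ? (int) (start % (unsigned) count) : 0;
}

/* True once cursor_set has been called with a non-negative count */
static int
cursor_is_set(const TempSpaceCursor *c)
{
    return c->count >= 0;
}

/* Advance with wraparound and return the selected entry, or 0 if the list is empty */
static unsigned
cursor_next(TempSpaceCursor *c)
{
    if (c->count > 0)
    {
        if (++c->next >= c->count)
            c->next = 0;
        return c->spaces[c->next];
    }
    return 0;                   /* stands in for InvalidOid */
}
```

Because the index is incremented before use, the first entry handed out is the one after the starting point. With three entries and a start of 1, successive calls return the entries at indexes 2, 0, 1, and so on.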
    3166                 :             : 
    3167                 :             : 
    3168                 :             : /*
    3169                 :             :  * AtEOSubXact_Files
    3170                 :             :  *
    3171                 :             :  * Take care of subtransaction commit/abort.  At abort, we close AllocateDescs
    3172                 :             :  * that the subtransaction may have opened.  At commit, we reassign them to
    3173                 :             :  * the parent subtransaction.  (Temporary files are tracked by ResourceOwners
    3174                 :             :  * instead.)
    3175                 :             :  */
    3176                 :             : void
    3177                 :        1665 : AtEOSubXact_Files(bool isCommit, SubTransactionId mySubid,
    3178                 :             :                                   SubTransactionId parentSubid)
    3179                 :             : {
    3180                 :        1665 :         Index           i;
    3181                 :             : 
    3182         [ -  + ]:        1665 :         for (i = 0; i < numAllocatedDescs; i++)
    3183                 :             :         {
    3184         [ #  # ]:           0 :                 if (allocatedDescs[i].create_subid == mySubid)
    3185                 :             :                 {
    3186         [ #  # ]:           0 :                         if (isCommit)
    3187                 :           0 :                                 allocatedDescs[i].create_subid = parentSubid;
    3188                 :             :                         else
    3189                 :             :                         {
    3190                 :             :                                 /* have to recheck the item after FreeDesc (ugly) */
    3191                 :           0 :                                 FreeDesc(&allocatedDescs[i--]);
    3192                 :             :                         }
    3193                 :           0 :                 }
    3194                 :           0 :         }
    3195                 :        1665 : }
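
The "have to recheck the item after FreeDesc" comment in AtEOSubXact_Files makes sense if freeing an entry compacts the array by moving the last element into the freed slot. FreeDesc is not shown in this part of the listing, so that behavior is an assumption here. Under that assumption, the slot just freed now holds a different entry and must be examined again. The sketch below shows the same idea in standalone C with hypothetical names (Item, remove_at, end_of_scope). It uses a loop that simply does not advance after a removal, which is equivalent to the i-- trick in the original.

```c
#include <stddef.h>

/* Hypothetical stand-in for an AllocateDesc entry */
typedef struct
{
    int         value;
    unsigned    owner;          /* plays the role of create_subid */
} Item;

/* Remove items[i] by moving the last element into its slot (order not preserved) */
static void
remove_at(Item *items, size_t *n, size_t i)
{
    (*n)--;
    items[i] = items[*n];
}

/*
 * On commit, hand entries owned by my_id over to parent_id.
 * On abort, remove them, rechecking each slot that was refilled.
 */
static void
end_of_scope(Item *items, size_t *n, unsigned my_id, unsigned parent_id,
             int is_commit)
{
    size_t      i = 0;

    while (i < *n)
    {
        if (items[i].owner != my_id)
        {
            i++;
            continue;
        }

        if (is_commit)
        {
            items[i].owner = parent_id;
            i++;
        }
        else
        {
            /* slot i now holds a different item, so do not advance */
            remove_at(items, n, i);
        }
    }
}
```

Skipping the recheck would miss any matching entry that happened to be moved into the freed slot, which is why the original loop steps its index back after each FreeDesc call.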
    3196                 :             : 
    3197                 :             : /*
    3198                 :             :  * AtEOXact_Files
    3199                 :             :  *
    3200                 :             :  * This routine is called during transaction commit or abort.  All still-open
    3201                 :             :  * per-transaction temporary file VFDs are closed, which also causes the
    3202                 :             :  * underlying files to be deleted (although they should've been closed already
    3203                 :             :  * by the ResourceOwner cleanup). Furthermore, all "allocated" stdio files are
    3204                 :             :  * closed. We also forget any transaction-local temp tablespace list.
    3205                 :             :  *
    3206                 :             :  * The isCommit flag is used only to decide whether to emit warnings about
    3207                 :             :  * unclosed files.
    3208                 :             :  */
    3209                 :             : void
    3210                 :       57914 : AtEOXact_Files(bool isCommit)
    3211                 :             : {
    3212                 :       57914 :         CleanupTempFiles(isCommit, false);
    3213                 :       57914 :         tempTableSpaces = NULL;
    3214                 :       57914 :         numTempTableSpaces = -1;
    3215                 :       57914 : }
    3216                 :             : 
    3217                 :             : /*
    3218                 :             :  * BeforeShmemExit_Files
    3219                 :             :  *
    3220                 :             :  * before_shmem_exit hook to clean up temp files during backend shutdown.
    3221                 :             :  * Here, we want to clean up *all* temp files including interXact ones.
    3222                 :             :  */
    3223                 :             : static void
    3224                 :         806 : BeforeShmemExit_Files(int code, Datum arg)
    3225                 :             : {
    3226                 :         806 :         CleanupTempFiles(false, true);
    3227                 :             : 
    3228                 :             :         /* prevent further temp files from being created */
    3229                 :             : #ifdef USE_ASSERT_CHECKING
    3230                 :         806 :         temporary_files_allowed = false;
    3231                 :             : #endif
    3232                 :         806 : }
    3233                 :             : 
    3234                 :             : /*
    3235                 :             :  * Close temporary files and delete their underlying files.
    3236                 :             :  *
    3237                 :             :  * isCommit: if true, this is normal transaction commit, and we don't
    3238                 :             :  * expect any remaining files; warn if there are some.
    3239                 :             :  *
    3240                 :             :  * isProcExit: if true, this is being called as the backend process is
    3241                 :             :  * exiting. If that's the case, we should remove all temporary files; if
    3242                 :             :  * that's not the case, we are being called for transaction commit/abort
    3243                 :             :  * and should only remove transaction-local temp files.  In either case,
    3244                 :             :  * also clean up "allocated" stdio files, dirs and fds.
    3245                 :             :  */
    3246                 :             : static void
    3247                 :       58720 : CleanupTempFiles(bool isCommit, bool isProcExit)
    3248                 :             : {
    3249                 :       58720 :         Index           i;
    3250                 :             : 
    3251                 :             :         /*
    3252                 :             :          * Careful here: at proc_exit we need extra cleanup, not just
    3253                 :             :          * xact_temporary files.
    3254                 :             :          */
    3255   [ +  +  +  + ]:       58720 :         if (isProcExit || have_xact_temporary_files)
    3256                 :             :         {
    3257         [ +  - ]:         952 :                 Assert(FileIsNotOpen(0));       /* Make sure ring not corrupted */
    3258         [ +  + ]:       46309 :                 for (i = 1; i < SizeVfdCache; i++)
    3259                 :             :                 {
    3260                 :       45357 :                         unsigned short fdstate = VfdCache[i].fdstate;
    3261                 :             : 
    3262   [ +  -  +  - ]:       45357 :                         if (((fdstate & FD_DELETE_AT_CLOSE) || (fdstate & FD_CLOSE_AT_EOXACT)) &&
    3263                 :       45357 :                                 VfdCache[i].fileName != NULL)
    3264                 :             :                         {
    3265                 :             :                                 /*
    3266                 :             :                                  * If we're in the process of exiting a backend process, close
    3267                 :             :                                  * all temporary files. Otherwise, only close temporary files
    3268                 :             :                                  * local to the current transaction. They should be closed by
    3269                 :             :                                  * the ResourceOwner mechanism already, so this is just a
    3270                 :             :                                  * debugging cross-check.
    3271                 :             :                                  */
    3272         [ #  # ]:           0 :                                 if (isProcExit)
    3273                 :           0 :                                         FileClose(i);
    3274         [ #  # ]:           0 :                                 else if (fdstate & FD_CLOSE_AT_EOXACT)
    3275                 :             :                                 {
    3276   [ #  #  #  # ]:           0 :                                         elog(WARNING,
    3277                 :             :                                                  "temporary file %s not closed at end-of-transaction",
    3278                 :             :                                                  VfdCache[i].fileName);
    3279                 :           0 :                                         FileClose(i);
    3280                 :           0 :                                 }
    3281                 :           0 :                         }
    3282                 :       45357 :                 }
    3283                 :             : 
    3284                 :         952 :                 have_xact_temporary_files = false;
    3285                 :         952 :         }
    3286                 :             : 
    3287                 :             :         /* Complain if any allocated files remain open at commit. */
    3288   [ +  +  +  - ]:       58720 :         if (isCommit && numAllocatedDescs > 0)
    3289   [ #  #  #  # ]:           0 :                 elog(WARNING, "%d temporary files and directories not closed at end-of-transaction",
    3290                 :             :                          numAllocatedDescs);
    3291                 :             : 
    3292                 :             :         /* Clean up "allocated" stdio files, dirs and fds. */
    3293         [ +  + ]:       58724 :         while (numAllocatedDescs > 0)
    3294                 :           4 :                 FreeDesc(&allocatedDescs[0]);
    3295                 :       58720 : }
    3296                 :             : 
    3297                 :             : 
    3298                 :             : /*
    3299                 :             :  * Remove temporary and temporary relation files left over from a prior
    3300                 :             :  * postmaster session
    3301                 :             :  *
    3302                 :             :  * This should be called during postmaster startup.  It will forcibly
    3303                 :             :  * remove any leftover files created by OpenTemporaryFile and any leftover
    3304                 :             :  * temporary relation files created by mdcreate.
    3305                 :             :  *
    3306                 :             :  * During a post-backend-crash restart cycle, this routine is called only when
    3307                 :             :  * the remove_temp_files_after_crash GUC is enabled.  Otherwise, repeated
    3308                 :             :  * crashes while queries are using temp files could waste storage that can
    3309                 :             :  * only be reclaimed by a service restart.  The argument against enabling it
    3310                 :             :  * is that someone might want to examine the temporary files for debugging.
    3311                 :             :  * Leaving the files in place does however mean that OpenTemporaryFile had
    3312                 :             :  * better allow for collision with an existing temp file name.
    3313                 :             :  *
    3314                 :             :  * NOTE: this function and its subroutines generally report syscall failures
    3315                 :             :  * with ereport(LOG) and keep going.  Removing temp files is not so critical
    3316                 :             :  * that we should fail to start the database when we can't do it.
    3317                 :             :  */
    3318                 :             : void
    3319                 :           2 : RemovePgTempFiles(void)
    3320                 :             : {
    3321                 :           2 :         char            temp_path[MAXPGPATH + sizeof(PG_TBLSPC_DIR) + sizeof(TABLESPACE_VERSION_DIRECTORY) + sizeof(PG_TEMP_FILES_DIR)];
    3322                 :           2 :         DIR                *spc_dir;
    3323                 :           2 :         struct dirent *spc_de;
    3324                 :             : 
    3325                 :             :         /*
    3326                 :             :          * First process temp files in pg_default ($PGDATA/base)
    3327                 :             :          */
    3328                 :           2 :         snprintf(temp_path, sizeof(temp_path), "base/%s", PG_TEMP_FILES_DIR);
    3329                 :           2 :         RemovePgTempFilesInDir(temp_path, true, false);
    3330                 :           2 :         RemovePgTempRelationFiles("base");
    3331                 :             : 
    3332                 :             :         /*
    3333                 :             :          * Cycle through temp directories for all non-default tablespaces.
    3334                 :             :          */
    3335                 :           2 :         spc_dir = AllocateDir(PG_TBLSPC_DIR);
    3336                 :             : 
    3337         [ +  + ]:           6 :         while ((spc_de = ReadDirExtended(spc_dir, PG_TBLSPC_DIR, LOG)) != NULL)
    3338                 :             :         {
    3339   [ +  +  +  - ]:           4 :                 if (strcmp(spc_de->d_name, ".") == 0 ||
    3340                 :           2 :                         strcmp(spc_de->d_name, "..") == 0)
    3341                 :           4 :                         continue;
    3342                 :             : 
    3343                 :           0 :                 snprintf(temp_path, sizeof(temp_path), "%s/%s/%s/%s",
    3344                 :           0 :                                  PG_TBLSPC_DIR, spc_de->d_name, TABLESPACE_VERSION_DIRECTORY,
    3345                 :             :                                  PG_TEMP_FILES_DIR);
    3346                 :           0 :                 RemovePgTempFilesInDir(temp_path, true, false);
    3347                 :             : 
    3348                 :           0 :                 snprintf(temp_path, sizeof(temp_path), "%s/%s/%s",
    3349                 :           0 :                                  PG_TBLSPC_DIR, spc_de->d_name, TABLESPACE_VERSION_DIRECTORY);
    3350                 :           0 :                 RemovePgTempRelationFiles(temp_path);
    3351                 :             :         }
    3352                 :             : 
    3353                 :           2 :         FreeDir(spc_dir);
    3354                 :             : 
    3355                 :             :         /*
    3356                 :             :          * In EXEC_BACKEND case there is a pgsql_tmp directory at the top level of
    3357                 :             :          * DataDir as well.  However, that is *not* cleaned here because doing so
    3358                 :             :          * would create a race condition.  It's done separately, earlier in
    3359                 :             :          * postmaster startup.
    3360                 :             :          */
    3361                 :           2 : }
    3362                 :             : 
    3363                 :             : /*
    3364                 :             :  * Process one pgsql_tmp directory for RemovePgTempFiles.
    3365                 :             :  *
    3366                 :             :  * If missing_ok is true, it's all right for the named directory to not exist.
    3367                 :             :  * Any other problem results in a LOG message.  (missing_ok should be true at
    3368                 :             :  * the top level, since pgsql_tmp directories are not created until needed.)
    3369                 :             :  *
    3370                 :             :  * At the top level, this should be called with unlink_all = false, so that
    3371                 :             :  * only files matching the temporary name prefix will be unlinked.  When
    3372                 :             :  * recursing it will be called with unlink_all = true to unlink everything
    3373                 :             :  * under a top-level temporary directory.
    3374                 :             :  *
    3375                 :             :  * (These two flags could be replaced by one, but it seems clearer to keep
    3376                 :             :  * them separate.)
    3377                 :             :  */
    3378                 :             : void
    3379                 :           2 : RemovePgTempFilesInDir(const char *tmpdirname, bool missing_ok, bool unlink_all)
    3380                 :             : {
    3381                 :           2 :         DIR                *temp_dir;
    3382                 :           2 :         struct dirent *temp_de;
    3383                 :           2 :         char            rm_path[MAXPGPATH * 2];
    3384                 :             : 
    3385                 :           2 :         temp_dir = AllocateDir(tmpdirname);
    3386                 :             : 
    3387   [ +  -  +  -  -  + ]:           2 :         if (temp_dir == NULL && errno == ENOENT && missing_ok)
    3388                 :           2 :                 return;
    3389                 :             : 
    3390         [ #  # ]:           0 :         while ((temp_de = ReadDirExtended(temp_dir, tmpdirname, LOG)) != NULL)
    3391                 :             :         {
    3392   [ #  #  #  # ]:           0 :                 if (strcmp(temp_de->d_name, ".") == 0 ||
    3393                 :           0 :                         strcmp(temp_de->d_name, "..") == 0)
    3394                 :           0 :                         continue;
    3395                 :             : 
    3396                 :           0 :                 snprintf(rm_path, sizeof(rm_path), "%s/%s",
    3397                 :           0 :                                  tmpdirname, temp_de->d_name);
    3398                 :             : 
    3399   [ #  #  #  # ]:           0 :                 if (unlink_all ||
    3400                 :           0 :                         strncmp(temp_de->d_name,
    3401                 :             :                                         PG_TEMP_FILE_PREFIX,
    3402                 :           0 :                                         strlen(PG_TEMP_FILE_PREFIX)) == 0)
    3403                 :             :                 {
    3404                 :           0 :                         PGFileType      type = get_dirent_type(rm_path, temp_de, false, LOG);
    3405                 :             : 
    3406         [ #  # ]:           0 :                         if (type == PGFILETYPE_ERROR)
    3407                 :           0 :                                 continue;
    3408         [ #  # ]:           0 :                         else if (type == PGFILETYPE_DIR)
    3409                 :             :                         {
    3410                 :             :                                 /* recursively remove contents, then directory itself */
    3411                 :           0 :                                 RemovePgTempFilesInDir(rm_path, false, true);
    3412                 :             : 
    3413         [ #  # ]:           0 :                                 if (rmdir(rm_path) < 0)
    3414   [ #  #  #  # ]:           0 :                                         ereport(LOG,
    3415                 :             :                                                         (errcode_for_file_access(),
    3416                 :             :                                                          errmsg("could not remove directory \"%s\": %m",
    3417                 :             :                                                                         rm_path)));
    3418                 :           0 :                         }
    3419                 :             :                         else
    3420                 :             :                         {
    3421         [ #  # ]:           0 :                                 if (unlink(rm_path) < 0)
    3422   [ #  #  #  # ]:           0 :                                         ereport(LOG,
    3423                 :             :                                                         (errcode_for_file_access(),
    3424                 :             :                                                          errmsg("could not remove file \"%s\": %m",
    3425                 :             :                                                                         rm_path)));
    3426                 :             :                         }
    3427         [ #  # ]:           0 :                 }
    3428                 :             :                 else
    3429   [ #  #  #  # ]:           0 :                         ereport(LOG,
    3430                 :             :                                         (errmsg("unexpected file found in temporary-files directory: \"%s\"",
    3431                 :             :                                                         rm_path)));
    3432                 :             :         }
    3433                 :             : 
    3434                 :           0 :         FreeDir(temp_dir);
    3435                 :           2 : }
    3436                 :             : 
    3437                 :             : /* Process one tablespace directory, look for per-DB subdirectories */
    3438                 :             : static void
    3439                 :           2 : RemovePgTempRelationFiles(const char *tsdirname)
    3440                 :             : {
    3441                 :           2 :         DIR                *ts_dir;
    3442                 :           2 :         struct dirent *de;
    3443                 :           2 :         char            dbspace_path[MAXPGPATH * 2];
    3444                 :             : 
    3445                 :           2 :         ts_dir = AllocateDir(tsdirname);
    3446                 :             : 
    3447         [ +  + ]:          12 :         while ((de = ReadDirExtended(ts_dir, tsdirname, LOG)) != NULL)
    3448                 :             :         {
    3449                 :             :                 /*
    3450                 :             :                  * We're only interested in the per-database directories, which have
    3451                 :             :                  * numeric names.  Note that this code will also (properly) ignore "."
    3452                 :             :                  * and "..".
    3453                 :             :                  */
    3454         [ +  + ]:          10 :                 if (strspn(de->d_name, "0123456789") != strlen(de->d_name))
    3455                 :           4 :                         continue;
    3456                 :             : 
    3457                 :          12 :                 snprintf(dbspace_path, sizeof(dbspace_path), "%s/%s",
    3458                 :           6 :                                  tsdirname, de->d_name);
    3459                 :           6 :                 RemovePgTempRelationFilesInDbspace(dbspace_path);
    3460                 :             :         }
    3461                 :             : 
    3462                 :           2 :         FreeDir(ts_dir);
    3463                 :           2 : }
    3464                 :             : 
    3465                 :             : /* Process one per-dbspace directory for RemovePgTempRelationFiles */
    3466                 :             : static void
    3467                 :           6 : RemovePgTempRelationFilesInDbspace(const char *dbspacedirname)
    3468                 :             : {
    3469                 :           6 :         DIR                *dbspace_dir;
    3470                 :           6 :         struct dirent *de;
    3471                 :           6 :         char            rm_path[MAXPGPATH * 2];
    3472                 :             : 
    3473                 :           6 :         dbspace_dir = AllocateDir(dbspacedirname);
    3474                 :             : 
    3475         [ +  + ]:        1818 :         while ((de = ReadDirExtended(dbspace_dir, dbspacedirname, LOG)) != NULL)
    3476                 :             :         {
    3477         [ -  + ]:        1812 :                 if (!looks_like_temp_rel_name(de->d_name))
    3478                 :        1812 :                         continue;
    3479                 :             : 
    3480                 :           0 :                 snprintf(rm_path, sizeof(rm_path), "%s/%s",
    3481                 :           0 :                                  dbspacedirname, de->d_name);
    3482                 :             : 
    3483         [ #  # ]:           0 :                 if (unlink(rm_path) < 0)
    3484   [ #  #  #  # ]:           0 :                         ereport(LOG,
    3485                 :             :                                         (errcode_for_file_access(),
    3486                 :             :                                          errmsg("could not remove file \"%s\": %m",
    3487                 :             :                                                         rm_path)));
    3488                 :             :         }
    3489                 :             : 
    3490                 :           6 :         FreeDir(dbspace_dir);
    3491                 :           6 : }
    3492                 :             : 
    3493                 :             : /* Matches t<digits>_<digits>, optionally followed by _<forkname> and/or .<segment> */
    3494                 :             : bool
    3495                 :        1812 : looks_like_temp_rel_name(const char *name)
    3496                 :             : {
    3497                 :        1812 :         int                     pos;
    3498                 :        1812 :         int                     savepos;
    3499                 :             : 
    3500                 :             :         /* Must start with "t". */
    3501         [ +  - ]:        1812 :         if (name[0] != 't')
    3502                 :        1812 :                 return false;
    3503                 :             : 
    3504                 :             :         /* Followed by a non-empty string of digits and then an underscore. */
    3505         [ #  # ]:           0 :         for (pos = 1; isdigit((unsigned char) name[pos]); ++pos)
    3506                 :             :                 ;
    3507   [ #  #  #  # ]:           0 :         if (pos == 1 || name[pos] != '_')
    3508                 :           0 :                 return false;
    3509                 :             : 
    3510                 :             :         /* Followed by another non-empty string of digits. */
    3511         [ #  # ]:           0 :         for (savepos = ++pos; isdigit((unsigned char) name[pos]); ++pos)
    3512                 :             :                 ;
    3513         [ #  # ]:           0 :         if (savepos == pos)
    3514                 :           0 :                 return false;
    3515                 :             : 
    3516                 :             :         /* We might have _forkname or .segment or both. */
    3517         [ #  # ]:           0 :         if (name[pos] == '_')
    3518                 :             :         {
    3519                 :           0 :                 int                     forkchar = forkname_chars(&name[pos + 1], NULL);
    3520                 :             : 
    3521         [ #  # ]:           0 :                 if (forkchar <= 0)
    3522                 :           0 :                         return false;
    3523                 :           0 :                 pos += forkchar + 1;
    3524         [ #  # ]:           0 :         }
    3525         [ #  # ]:           0 :         if (name[pos] == '.')
    3526                 :             :         {
    3527                 :           0 :                 int                     segchar;
    3528                 :             : 
    3529         [ #  # ]:           0 :                 for (segchar = 1; isdigit((unsigned char) name[pos + segchar]); ++segchar)
    3530                 :             :                         ;
    3531         [ #  # ]:           0 :                 if (segchar <= 1)
    3532                 :           0 :                         return false;
    3533                 :           0 :                 pos += segchar;
    3534         [ #  # ]:           0 :         }
    3535                 :             : 
    3536                 :             :         /* Now we should be at the end. */
    3537         [ #  # ]:           0 :         if (name[pos] != '\0')
    3538                 :           0 :                 return false;
    3539                 :           0 :         return true;
    3540                 :        1812 : }
    3541                 :             : 
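The name check above can be exercised outside the server. What follows is a minimal standalone sketch, not the PostgreSQL implementation: demo_forkname_chars() is a hypothetical stand-in for forkname_chars(), and its list of fork names is an assumption made for illustration. The demo_ prefix marks every name that does not appear in fd.c.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for forkname_chars(): returns the length of a
 * recognized fork name at the start of str, or 0 if there is none.
 * The fork name list is an assumption for this sketch.
 */
int
demo_forkname_chars(const char *str)
{
    static const char *const forks[] = {"fsm", "vm", "init"};

    for (size_t i = 0; i < sizeof(forks) / sizeof(forks[0]); i++)
    {
        size_t      len = strlen(forks[i]);

        /* The fork name must be followed by end-of-string or a segment dot. */
        if (strncmp(str, forks[i], len) == 0 &&
            (str[len] == '\0' || str[len] == '.'))
            return (int) len;
    }
    return 0;
}

/* Same scanning steps as looks_like_temp_rel_name(), using the stub above. */
bool
demo_looks_like_temp_rel_name(const char *name)
{
    int         pos;
    int         savepos;

    assert(name != NULL);

    /* Must start with "t". */
    if (name[0] != 't')
        return false;

    /* Followed by a non-empty string of digits and then an underscore. */
    for (pos = 1; isdigit((unsigned char) name[pos]); ++pos)
        ;
    if (pos == 1 || name[pos] != '_')
        return false;

    /* Followed by another non-empty string of digits. */
    for (savepos = ++pos; isdigit((unsigned char) name[pos]); ++pos)
        ;
    if (savepos == pos)
        return false;

    /* Optional _forkname. */
    if (name[pos] == '_')
    {
        int         forkchar = demo_forkname_chars(&name[pos + 1]);

        if (forkchar <= 0)
            return false;
        pos += forkchar + 1;
    }

    /* Optional .segment, which needs at least one digit after the dot. */
    if (name[pos] == '.')
    {
        int         segchar;

        for (segchar = 1; isdigit((unsigned char) name[pos + segchar]); ++segchar)
            ;
        if (segchar <= 1)
            return false;
        pos += segchar;
    }

    /* Now we should be at the end. */
    return name[pos] == '\0';
}
```

Under these assumptions, names such as "t3_16384", "t3_16384_fsm" and "t3_16384.1" are accepted, while "16384", "t_16384" and "t3_16384." are rejected.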
    3542                 :             : #ifdef HAVE_SYNCFS
    3543                 :             : static void
    3544                 :             : do_syncfs(const char *path)
    3545                 :             : {
    3546                 :             :         int                     fd;
    3547                 :             : 
    3548                 :             :         ereport_startup_progress("syncing data directory (syncfs), elapsed time: %ld.%02d s, current path: %s",
    3549                 :             :                                                          path);
    3550                 :             : 
    3551                 :             :         fd = OpenTransientFile(path, O_RDONLY);
    3552                 :             :         if (fd < 0)
    3553                 :             :         {
    3554                 :             :                 ereport(LOG,
    3555                 :             :                                 (errcode_for_file_access(),
    3556                 :             :                                  errmsg("could not open file \"%s\": %m", path)));
    3557                 :             :                 return;
    3558                 :             :         }
    3559                 :             :         if (syncfs(fd) < 0)
    3560                 :             :                 ereport(LOG,
    3561                 :             :                                 (errcode_for_file_access(),
    3562                 :             :                                  errmsg("could not synchronize file system for file \"%s\": %m", path)));
    3563                 :             :         CloseTransientFile(fd);
    3564                 :             : }
    3565                 :             : #endif
    3566                 :             : 
    3567                 :             : /*
    3568                 :             :  * Issue fsync recursively on PGDATA and all its contents, or issue syncfs for
    3569                 :             :  * all potential filesystems, depending on the recovery_init_sync_method setting.
    3570                 :             :  *
    3571                 :             :  * We fsync regular files and directories wherever they are, but we
    3572                 :             :  * follow symlinks only for pg_wal and immediately under pg_tblspc.
    3573                 :             :  * Other symlinks are presumed to point at files we're not responsible
    3574                 :             :  * for fsyncing, and might not have privileges to write at all.
    3575                 :             :  *
    3576                 :             :  * Errors are logged but not considered fatal; that's because this is used
    3577                 :             :  * only during database startup, to deal with the possibility that there are
    3578                 :             :  * issued-but-unsynced writes pending against the data directory.  We want to
    3579                 :             :  * ensure that such writes reach disk before anything that's done in the new
    3580                 :             :  * run.  However, aborting on error would result in failure to start for
    3581                 :             :  * harmless cases such as read-only files in the data directory, and that's
    3582                 :             :  * not good either.
    3583                 :             :  *
    3584                 :             :  * Note that if we previously crashed due to a PANIC on fsync(), we'll be
    3585                 :             :  * rewriting all changes again during recovery.
    3586                 :             :  *
    3587                 :             :  * Note we assume we're chdir'd into PGDATA to begin with.
    3588                 :             :  */
    3589                 :             : void
    3590                 :           0 : SyncDataDirectory(void)
    3591                 :             : {
    3592                 :           0 :         bool            xlog_is_symlink;
    3593                 :             : 
    3594                 :             :         /* We can skip this whole thing if fsync is disabled. */
    3595         [ #  # ]:           0 :         if (!enableFsync)
    3596                 :           0 :                 return;
    3597                 :             : 
    3598                 :             :         /*
    3599                 :             :          * If pg_wal is a symlink, we'll need to recurse into it separately,
    3600                 :             :          * because the first walkdir below will ignore it.
    3601                 :             :          */
    3602                 :           0 :         xlog_is_symlink = false;
    3603                 :             : 
    3604                 :             :         {
    3605                 :           0 :                 struct stat st;
    3606                 :             : 
    3607         [ #  # ]:           0 :                 if (lstat("pg_wal", &st) < 0)
    3608   [ #  #  #  # ]:           0 :                         ereport(LOG,
    3609                 :             :                                         (errcode_for_file_access(),
    3610                 :             :                                          errmsg("could not stat file \"%s\": %m",
    3611                 :             :                                                         "pg_wal")));
    3612         [ #  # ]:           0 :                 else if (S_ISLNK(st.st_mode))
    3613                 :           0 :                         xlog_is_symlink = true;
    3614                 :           0 :         }
    3615                 :             : 
    3616                 :             : #ifdef HAVE_SYNCFS
    3617                 :             :         if (recovery_init_sync_method == DATA_DIR_SYNC_METHOD_SYNCFS)
    3618                 :             :         {
    3619                 :             :                 DIR                *dir;
    3620                 :             :                 struct dirent *de;
    3621                 :             : 
    3622                 :             :                 /*
    3623                 :             :                  * On Linux, we don't have to open every single file one by one.  We
    3624                 :             :                  * can use syncfs() to sync whole filesystems.  We only expect
    3625                 :             :                  * filesystem boundaries to exist where we tolerate symlinks, namely
    3626                 :             :                  * pg_wal and the tablespaces, so we call syncfs() for each of those
    3627                 :             :                  * directories.
    3628                 :             :                  */
    3629                 :             : 
    3630                 :             :                 /* Prepare to report progress syncing the data directory via syncfs. */
    3631                 :             :                 begin_startup_progress_phase();
    3632                 :             : 
    3633                 :             :                 /* Sync the top level pgdata directory. */
    3634                 :             :                 do_syncfs(".");
    3635                 :             :                 /* If any tablespaces are configured, sync each of those. */
    3636                 :             :                 dir = AllocateDir(PG_TBLSPC_DIR);
    3637                 :             :                 while ((de = ReadDirExtended(dir, PG_TBLSPC_DIR, LOG)))
    3638                 :             :                 {
    3639                 :             :                         char            path[MAXPGPATH];
    3640                 :             : 
    3641                 :             :                         if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
    3642                 :             :                                 continue;
    3643                 :             : 
    3644                 :             :                         snprintf(path, MAXPGPATH, "%s/%s", PG_TBLSPC_DIR, de->d_name);
    3645                 :             :                         do_syncfs(path);
    3646                 :             :                 }
    3647                 :             :                 FreeDir(dir);
    3648                 :             :                 /* If pg_wal is a symlink, process that too. */
    3649                 :             :                 if (xlog_is_symlink)
    3650                 :             :                         do_syncfs("pg_wal");
    3651                 :             :                 return;
    3652                 :             :         }
    3653                 :             : #endif                                                  /* HAVE_SYNCFS */
    3654                 :             : 
    3655                 :             : #ifdef PG_FLUSH_DATA_WORKS
    3656                 :             :         /* Prepare to report progress of the pre-fsync phase. */
    3657                 :           0 :         begin_startup_progress_phase();
    3658                 :             : 
    3659                 :             :         /*
    3660                 :             :          * If possible, hint to the kernel that we're soon going to fsync the data
    3661                 :             :          * directory and its contents.  Errors in this step are even less
    3662                 :             :          * interesting than normal, so log them only at DEBUG1.
    3663                 :             :          */
    3664                 :           0 :         walkdir(".", pre_sync_fname, false, DEBUG1);
    3665         [ #  # ]:           0 :         if (xlog_is_symlink)
    3666                 :           0 :                 walkdir("pg_wal", pre_sync_fname, false, DEBUG1);
    3667                 :           0 :         walkdir(PG_TBLSPC_DIR, pre_sync_fname, true, DEBUG1);
    3668                 :             : #endif
    3669                 :             : 
    3670                 :             :         /* Prepare to report progress syncing the data directory via fsync. */
    3671                 :           0 :         begin_startup_progress_phase();
    3672                 :             : 
    3673                 :             :         /*
    3674                 :             :          * Now we do the fsync()s in the same order.
    3675                 :             :          *
    3676                 :             :          * The main call ignores symlinks, so in addition to specially processing
    3677                 :             :          * pg_wal if it's a symlink, pg_tblspc has to be visited separately with
    3678                 :             :          * process_symlinks = true.  Note that if there are any plain directories
    3679                 :             :          * in pg_tblspc, they'll get fsync'd twice.  That's not an expected case
    3680                 :             :          * so we don't worry about optimizing it.
    3681                 :             :          */
    3682                 :           0 :         walkdir(".", datadir_fsync_fname, false, LOG);
    3683         [ #  # ]:           0 :         if (xlog_is_symlink)
    3684                 :           0 :                 walkdir("pg_wal", datadir_fsync_fname, false, LOG);
    3685                 :           0 :         walkdir(PG_TBLSPC_DIR, datadir_fsync_fname, true, LOG);
    3686         [ #  # ]:           0 : }
    3687                 :             : 
    3688                 :             : /*
    3689                 :             :  * walkdir: recursively walk a directory, applying the action to each
    3690                 :             :  * regular file and directory (including the named directory itself).
    3691                 :             :  *
    3692                 :             :  * If process_symlinks is true, the action and recursion are also applied
    3693                 :             :  * to regular files and directories that are pointed to by symlinks in the
    3694                 :             :  * given directory; otherwise symlinks are ignored.  Symlinks are always
    3695                 :             :  * ignored in subdirectories, i.e., we intentionally don't pass down the
    3696                 :             :  * process_symlinks flag to recursive calls.
    3697                 :             :  *
    3698                 :             :  * Errors are reported at level elevel, which might be ERROR or less.
    3699                 :             :  *
    3700                 :             :  * See also walkdir in file_utils.c, which is a frontend version of this
    3701                 :             :  * logic.
    3702                 :             :  */
    3703                 :             : static void
    3704                 :          56 : walkdir(const char *path,
    3705                 :             :                 void (*action) (const char *fname, bool isdir, int elevel),
    3706                 :             :                 bool process_symlinks,
    3707                 :             :                 int elevel)
    3708                 :             : {
    3709                 :          56 :         DIR                *dir;
    3710                 :          56 :         struct dirent *de;
    3711                 :             : 
    3712                 :          56 :         dir = AllocateDir(path);
    3713                 :             : 
    3714         [ +  + ]:         616 :         while ((de = ReadDirExtended(dir, path, elevel)) != NULL)
    3715                 :             :         {
    3716                 :         560 :                 char            subpath[MAXPGPATH * 2];
    3717                 :             : 
    3718         [ +  - ]:         560 :                 CHECK_FOR_INTERRUPTS();
    3719                 :             : 
    3720   [ +  +  +  + ]:         560 :                 if (strcmp(de->d_name, ".") == 0 ||
    3721                 :         504 :                         strcmp(de->d_name, "..") == 0)
    3722                 :         112 :                         continue;
    3723                 :             : 
    3724                 :         448 :                 snprintf(subpath, sizeof(subpath), "%s/%s", path, de->d_name);
    3725                 :             : 
    3726      [ -  +  - ]:         448 :                 switch (get_dirent_type(subpath, de, process_symlinks, elevel))
    3727                 :             :                 {
    3728                 :             :                         case PGFILETYPE_REG:
    3729                 :         448 :                                 (*action) (subpath, false, elevel);
    3730                 :         448 :                                 break;
    3731                 :             :                         case PGFILETYPE_DIR:
    3732                 :           0 :                                 walkdir(subpath, action, false, elevel);
    3733                 :           0 :                                 break;
    3734                 :             :                         default:
    3735                 :             : 
    3736                 :             :                                 /*
    3737                 :             :                                  * Errors are already reported directly by get_dirent_type(),
    3738                 :             :                                  * and any remaining symlinks and unknown file types are
    3739                 :             :                                  * ignored.
    3740                 :             :                                  */
    3741                 :           0 :                                 break;
    3742                 :             :                 }
    3743      [ -  +  + ]:         560 :         }
    3744                 :             : 
    3745                 :          56 :         FreeDir(dir);                           /* we ignore any error here */
    3746                 :             : 
    3747                 :             :         /*
    3748                 :             :          * It's important to fsync the destination directory itself as individual
    3749                 :             :          * file fsyncs don't guarantee that the directory entry for the file is
    3750                 :             :          * synced.  However, skip this if AllocateDir failed; the action function
    3751                 :             :          * might not be robust against that.
    3752                 :             :          */
    3753         [ -  + ]:          56 :         if (dir)
    3754                 :          56 :                 (*action) (path, true, elevel);
    3755                 :          56 : }
    3756                 :             : 
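The traversal order used by walkdir (entries first, then the directory itself) can be sketched with plain POSIX calls. This is a simplified illustration under stated assumptions, not the backend code: it uses opendir/readdir/lstat instead of AllocateDir/ReadDirExtended/get_dirent_type, never follows symlinks, does no error reporting, and passes a caller-supplied pointer in place of elevel. All demo_ names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Callback type for this sketch: arg replaces the elevel parameter. */
typedef void (*demo_walk_action) (const char *fname, bool isdir, void *arg);

void
demo_walkdir(const char *path, demo_walk_action action, void *arg)
{
    DIR        *dir;
    struct dirent *de;

    assert(path != NULL && action != NULL);

    /* If the directory cannot be opened, skip it and its action entirely. */
    dir = opendir(path);
    if (dir == NULL)
        return;

    while ((de = readdir(dir)) != NULL)
    {
        char        subpath[4096];
        struct stat st;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        snprintf(subpath, sizeof(subpath), "%s/%s", path, de->d_name);

        /* lstat() does not follow symlinks, so they fall through below. */
        if (lstat(subpath, &st) < 0)
            continue;

        if (S_ISREG(st.st_mode))
            action(subpath, false, arg);
        else if (S_ISDIR(st.st_mode))
            demo_walkdir(subpath, action, arg);
        /* symlinks and other file types are ignored */
    }

    closedir(dir);

    /* Apply the action to the directory itself, after its contents. */
    action(path, true, arg);
}

/* Example action: count the files and directories visited. */
struct demo_counts
{
    int         files;
    int         dirs;
};

void
demo_count_action(const char *fname, bool isdir, void *arg)
{
    struct demo_counts *counts = arg;

    (void) fname;               /* unused in this example */
    if (isdir)
        counts->dirs++;
    else
        counts->files++;
}
```

Visiting the directory last is what lets the same walk serve both an fsync-style action, where the directory entry is synced after its files, and a removal-style action, where the directory must be empty before rmdir can succeed.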
    3757                 :             : 
    3758                 :             : /*
    3759                 :             :  * Hint to the OS that it should get ready to fsync() this file.
    3760                 :             :  *
    3761                 :             :  * Ignores errors trying to open unreadable files, and logs other errors at a
    3762                 :             :  * caller-specified level.
    3763                 :             :  */
    3764                 :             : #ifdef PG_FLUSH_DATA_WORKS
    3765                 :             : 
    3766                 :             : static void
    3767                 :           0 : pre_sync_fname(const char *fname, bool isdir, int elevel)
    3768                 :             : {
    3769                 :           0 :         int                     fd;
    3770                 :             : 
    3771                 :             :         /* Don't try to flush directories, it'll likely just fail */
    3772         [ #  # ]:           0 :         if (isdir)
    3773                 :           0 :                 return;
    3774                 :             : 
    3775   [ #  #  #  #  :           0 :         ereport_startup_progress("syncing data directory (pre-fsync), elapsed time: %ld.%02d s, current path: %s",
                   #  # ]
    3776                 :             :                                                          fname);
    3777                 :             : 
    3778                 :           0 :         fd = OpenTransientFile(fname, O_RDONLY | PG_BINARY);
    3779                 :             : 
    3780         [ #  # ]:           0 :         if (fd < 0)
    3781                 :             :         {
    3782         [ #  # ]:           0 :                 if (errno == EACCES)
    3783                 :           0 :                         return;
    3784   [ #  #  #  #  :           0 :                 ereport(elevel,
          #  #  #  #  #  
                      # ]
    3785                 :             :                                 (errcode_for_file_access(),
    3786                 :             :                                  errmsg("could not open file \"%s\": %m", fname)));
    3787                 :           0 :                 return;
    3788                 :             :         }
    3789                 :             : 
    3790                 :             :         /*
    3791                 :             :          * pg_flush_data() ignores errors, which is ok because this is only a
    3792                 :             :          * hint.
    3793                 :             :          */
    3794                 :           0 :         pg_flush_data(fd, 0, 0);
    3795                 :             : 
    3796         [ #  # ]:           0 :         if (CloseTransientFile(fd) != 0)
    3797   [ #  #  #  #  :           0 :                 ereport(elevel,
          #  #  #  #  #  
                      # ]
    3798                 :             :                                 (errcode_for_file_access(),
    3799                 :             :                                  errmsg("could not close file \"%s\": %m", fname)));
    3800         [ #  # ]:           0 : }
    3801                 :             : 
    3802                 :             : #endif                                                  /* PG_FLUSH_DATA_WORKS */
    3803                 :             : 
    3804                 :             : static void
    3805                 :           0 : datadir_fsync_fname(const char *fname, bool isdir, int elevel)
    3806                 :             : {
    3807   [ #  #  #  #  :           0 :         ereport_startup_progress("syncing data directory (fsync), elapsed time: %ld.%02d s, current path: %s",
                   #  # ]
    3808                 :             :                                                          fname);
    3809                 :             : 
    3810                 :             :         /*
    3811                 :             :          * We want to silently ignore errors about unreadable files.  Pass that
    3812                 :             :          * desire on to fsync_fname_ext().
    3813                 :             :          */
    3814                 :           0 :         fsync_fname_ext(fname, isdir, true, elevel);
    3815                 :           0 : }
    3816                 :             : 
    3817                 :             : static void
    3818                 :         504 : unlink_if_exists_fname(const char *fname, bool isdir, int elevel)
    3819                 :             : {
    3820         [ +  + ]:         504 :         if (isdir)
    3821                 :             :         {
    3822   [ -  +  #  # ]:          56 :                 if (rmdir(fname) != 0 && errno != ENOENT)
    3823   [ #  #  #  #  :           0 :                         ereport(elevel,
          #  #  #  #  #  
                      # ]
    3824                 :             :                                         (errcode_for_file_access(),
    3825                 :             :                                          errmsg("could not remove directory \"%s\": %m", fname)));
    3826                 :          56 :         }
    3827                 :             :         else
    3828                 :             :         {
    3829                 :             :                 /* Use PathNameDeleteTemporaryFile to report filesize */
    3830                 :         448 :                 PathNameDeleteTemporaryFile(fname, false);
    3831                 :             :         }
    3832                 :         504 : }
    3833                 :             : 
    3834                 :             : /*
    3835                 :             :  * fsync_fname_ext -- Try to fsync a file or directory
    3836                 :             :  *
    3837                 :             :  * If ignore_perm is true, ignore errors upon trying to open unreadable
    3838                 :             :  * files. Logs other errors at a caller-specified level.
    3839                 :             :  *
    3840                 :             :  * Returns 0 if the operation succeeded, -1 otherwise.
    3841                 :             :  */
    3842                 :             : int
    3843                 :         141 : fsync_fname_ext(const char *fname, bool isdir, bool ignore_perm, int elevel)
    3844                 :             : {
    3845                 :         141 :         int                     fd;
    3846                 :         141 :         int                     flags;
    3847                 :         141 :         int                     returncode;
    3848                 :             : 
    3849                 :             :         /*
    3850                 :             :          * Some OSs require directories to be opened read-only whereas other
    3851                 :             :          * systems don't allow us to fsync files opened read-only; so we need both
    3852                 :             :          * cases here.  Using O_RDWR will cause us to fail to fsync files that are
    3853                 :             :          * not writable by our userid, but we assume that's OK.
    3854                 :             :          */
    3855                 :         141 :         flags = PG_BINARY;
    3856         [ +  + ]:         141 :         if (!isdir)
    3857                 :          68 :                 flags |= O_RDWR;
    3858                 :             :         else
    3859                 :          73 :                 flags |= O_RDONLY;
    3860                 :             : 
    3861                 :         141 :         fd = OpenTransientFile(fname, flags);
    3862                 :             : 
    3863                 :             :         /*
    3864                 :             :          * Some OSs don't allow us to open directories at all (Windows returns
    3865                 :             :          * EACCES); just ignore the error in that case.  If desired, also silently
    3866                 :             :          * ignore errors about unreadable files.  Log others.
    3867                 :             :          */
    3868   [ -  +  #  #  :         141 :         if (fd < 0 && isdir && (errno == EISDIR || errno == EACCES))
             #  #  #  # ]
    3869                 :           0 :                 return 0;
    3870   [ -  +  #  #  :         141 :         else if (fd < 0 && ignore_perm && errno == EACCES)
                   #  # ]
    3871                 :           0 :                 return 0;
    3872         [ +  - ]:         141 :         else if (fd < 0)
    3873                 :             :         {
    3874   [ #  #  #  #  :           0 :                 ereport(elevel,
          #  #  #  #  #  
                      # ]
    3875                 :             :                                 (errcode_for_file_access(),
    3876                 :             :                                  errmsg("could not open file \"%s\": %m", fname)));
    3877                 :           0 :                 return -1;
    3878                 :             :         }
    3879                 :             : 
    3880                 :         141 :         returncode = pg_fsync(fd);
    3881                 :             : 
    3882                 :             :         /*
    3883                 :             :          * Some OSes don't allow us to fsync directories at all, so we can ignore
    3884                 :             :          * those errors. Anything else needs to be logged.
    3885                 :             :          */
    3886   [ -  +  #  #  :         141 :         if (returncode != 0 && !(isdir && (errno == EBADF || errno == EINVAL)))
             #  #  #  # ]
    3887                 :             :         {
    3888                 :           0 :                 int                     save_errno;
    3889                 :             : 
    3890                 :             :                 /* close file upon error, might not be in transaction context */
    3891                 :           0 :                 save_errno = errno;
    3892                 :           0 :                 (void) CloseTransientFile(fd);
    3893                 :           0 :                 errno = save_errno;
    3894                 :             : 
    3895   [ #  #  #  #  :           0 :                 ereport(elevel,
          #  #  #  #  #  
                      # ]
    3896                 :             :                                 (errcode_for_file_access(),
    3897                 :             :                                  errmsg("could not fsync file \"%s\": %m", fname)));
    3898                 :           0 :                 return -1;
    3899                 :           0 :         }
    3900                 :             : 
    3901         [ -  + ]:         141 :         if (CloseTransientFile(fd) != 0)
    3902                 :             :         {
    3903   [ #  #  #  #  :           0 :                 ereport(elevel,
          #  #  #  #  #  
                      # ]
    3904                 :             :                                 (errcode_for_file_access(),
    3905                 :             :                                  errmsg("could not close file \"%s\": %m", fname)));
    3906                 :           0 :                 return -1;
    3907                 :             :         }
    3908                 :             : 
    3909                 :         141 :         return 0;
    3910                 :         141 : }
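
The error handling in fsync_fname_ext() boils down to two small decision rules: one for a failed open and one for a failed fsync. The following is a minimal standalone sketch of those rules, not the server's code. The enum and both function names are hypothetical and exist only for illustration; the errno values tested are the ones named in the listing above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical result type: either swallow the failure or report it. */
enum sketch_action
{
	SKETCH_IGNORE,
	SKETCH_REPORT
};

/* Decide what to do when opening the file or directory failed. */
static enum sketch_action
sketch_classify_open_failure(int err, bool isdir, bool ignore_perm)
{
	assert(err != 0);			/* a failure always carries an errno */

	/* Some platforms cannot open directories at all. */
	if (isdir && (err == EISDIR || err == EACCES))
		return SKETCH_IGNORE;

	/* Caller asked to skip unreadable files silently. */
	if (ignore_perm && err == EACCES)
		return SKETCH_IGNORE;

	return SKETCH_REPORT;
}

/* Decide what to do when fsync itself failed. */
static enum sketch_action
sketch_classify_fsync_failure(int err, bool isdir)
{
	assert(err != 0);

	/* Some platforms cannot fsync directories at all. */
	if (isdir && (err == EBADF || err == EINVAL))
		return SKETCH_IGNORE;

	return SKETCH_REPORT;
}
```

Under this sketch, EACCES on a regular file is reported unless ignore_perm is set, and EINVAL from fsync is tolerated only for directories.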
    3911                 :             : 
    3912                 :             : /*
    3913                 :             :  * fsync_parent_path -- fsync the parent path of a file or directory
    3914                 :             :  *
    3915                 :             :  * This is aimed at making file operations persistent on disk in case of
    3916                 :             :  * an OS crash or power failure.
    3917                 :             :  */
    3918                 :             : static int
    3919                 :          34 : fsync_parent_path(const char *fname, int elevel)
    3920                 :             : {
    3921                 :          34 :         char            parentpath[MAXPGPATH];
    3922                 :             : 
    3923                 :          34 :         strlcpy(parentpath, fname, MAXPGPATH);
    3924                 :          34 :         get_parent_directory(parentpath);
    3925                 :             : 
    3926                 :             :         /*
    3927                 :             :          * get_parent_directory() returns an empty string if the input argument is
    3928                 :             :          * just a file name (see comments in path.c), so handle that as being the
    3929                 :             :          * current directory.
    3930                 :             :          */
    3931         [ +  - ]:          34 :         if (strlen(parentpath) == 0)
    3932                 :           0 :                 strlcpy(parentpath, ".", MAXPGPATH);
    3933                 :             : 
    3934         [ -  + ]:          34 :         if (fsync_fname_ext(parentpath, true, false, elevel) != 0)
    3935                 :           0 :                 return -1;
    3936                 :             : 
    3937                 :          34 :         return 0;
    3938                 :          34 : }
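
The parent-path computation in fsync_parent_path() can be illustrated in isolation. The sketch below is a simplified, hypothetical stand-in for the strlcpy() plus get_parent_directory() pair: it only splits at the last '/' and does not handle trailing slashes, ".." components, or Windows separators, which the real routine in path.c has to consider. The function name sketch_parent_dir is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write the directory part of fname into dst.
 * A bare file name yields ".", mirroring the fallback in fsync_parent_path().
 */
static void
sketch_parent_dir(const char *fname, char *dst, size_t dstlen)
{
	const char *slash;
	size_t		len;

	assert(fname != NULL && dst != NULL && dstlen >= 2);

	slash = strrchr(fname, '/');
	if (slash == NULL)
		len = 0;				/* bare file name: no directory part */
	else if (slash == fname)
		len = 1;				/* "/file": the parent is the root */
	else
		len = (size_t) (slash - fname);

	if (len >= dstlen)
		len = dstlen - 1;		/* truncate rather than overflow */
	memcpy(dst, fname, len);
	dst[len] = '\0';

	/* An empty result means "current directory". */
	if (dst[0] == '\0')
		snprintf(dst, dstlen, ".");
}
```

The resulting path is what would then be handed to the directory fsync, with isdir set to true.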
    3939                 :             : 
    3940                 :             : /*
    3941                 :             :  * Create a PostgreSQL data sub-directory
    3942                 :             :  *
    3943                 :             :  * The data directory itself, and most of its sub-directories, are created at
    3944                 :             :  * initdb time, but we do have some occasions when we create directories in
    3945                 :             :  * the backend (CREATE TABLESPACE, for example).  In those cases, we want to
    3946                 :             :  * make sure that those directories are created consistently.  Today, that means
    3947                 :             :  * making sure that the created directory has the correct permissions, which is
    3948                 :             :  * what pg_dir_create_mode tracks for us.
    3949                 :             :  *
    3950                 :             :  * Note that we also set the umask() based on what we understand the correct
    3951                 :             :  * permissions to be (see file_perm.c).
    3952                 :             :  *
    3953                 :             :  * For permissions other than the default, mkdir() can be used directly, but
    3954                 :             :  * be sure to consider carefully such cases -- a sub-directory with incorrect
    3955                 :             :  * permissions in a PostgreSQL data directory could cause backups and other
    3956                 :             :  * processes to fail.
    3957                 :             :  */
    3958                 :             : int
    3959                 :          79 : MakePGDirectory(const char *directoryName)
    3960                 :             : {
    3961                 :          79 :         return mkdir(directoryName, pg_dir_create_mode);
    3962                 :             : }
    3963                 :             : 
    3964                 :             : /*
    3965                 :             :  * Return the passed-in error level, or PANIC if data_sync_retry is off.
    3966                 :             :  *
    3967                 :             :  * Failure to fsync any data file is cause for immediate panic, unless
    3968                 :             :  * data_sync_retry is enabled.  Data may have been written to the operating
    3969                 :             :  * system and removed from our buffer pool already, and if we are running on
    3970                 :             :  * an operating system that forgets dirty data on write-back failure, there
    3971                 :             :  * may be only one copy of the data remaining: in the WAL.  A later attempt to
    3972                 :             :  * fsync again might falsely report success.  Therefore we must not allow any
    3973                 :             :  * further checkpoints to be attempted.  data_sync_retry can in theory be
    3974                 :             :  * enabled on systems known not to drop dirty buffered data on write-back
    3975                 :             :  * failure (with the likely outcome that checkpoints will continue to fail
    3976                 :             :  * until the underlying problem is fixed).
    3977                 :             :  *
    3978                 :             :  * Any code that reports a failure from fsync() or related functions should
    3979                 :             :  * filter the error level with this function.
    3980                 :             :  */
    3981                 :             : int
    3982                 :          39 : data_sync_elevel(int elevel)
    3983                 :             : {
    3984         [ -  + ]:          39 :         return data_sync_retry ? elevel : PANIC;
    3985                 :             : }
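
The filtering rule described in the comment above can be shown with a minimal standalone sketch. The level constants, the variable sketch_data_sync_retry, and the function name are hypothetical stand-ins chosen for illustration; the real error levels are defined in elog.h and their numeric values are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for elog levels, ordered by severity. */
enum
{
	SKETCH_WARNING = 1,
	SKETCH_ERROR = 2,
	SKETCH_PANIC = 3
};

/* Stand-in for the data_sync_retry setting; assumed off in this sketch. */
static bool sketch_data_sync_retry = false;

/*
 * Return the caller's level only when retrying is allowed; otherwise
 * promote every fsync failure to the panic level.
 */
static int
sketch_data_sync_elevel(int elevel)
{
	assert(elevel >= SKETCH_WARNING && elevel <= SKETCH_PANIC);
	return sketch_data_sync_retry ? elevel : SKETCH_PANIC;
}
```

A caller reporting an fsync failure would wrap its intended level, for example sketch_data_sync_elevel(SKETCH_ERROR), so that the promotion happens in one place.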
    3986                 :             : 
    3987                 :             : bool
    3988                 :           6 : check_debug_io_direct(char **newval, void **extra, GucSource source)
    3989                 :             : {
    3990                 :           6 :         bool            result = true;
    3991                 :           6 :         int                     flags;
    3992                 :             : 
    3993                 :             : #if PG_O_DIRECT == 0
    3994                 :             :         if (strcmp(*newval, "") != 0)
    3995                 :             :         {
    3996                 :             :                 GUC_check_errdetail("\"%s\" is not supported on this platform.",
    3997                 :             :                                                         "debug_io_direct");
    3998                 :             :                 result = false;
    3999                 :             :         }
    4000                 :             :         flags = 0;
    4001                 :             : #else
    4002                 :           6 :         List       *elemlist;
    4003                 :           6 :         ListCell   *l;
    4004                 :           6 :         char       *rawstring;
    4005                 :             : 
    4006                 :             :         /* Need a modifiable copy of string */
    4007                 :           6 :         rawstring = pstrdup(*newval);
    4008                 :             : 
    4009         [ -  + ]:           6 :         if (!SplitGUCList(rawstring, ',', &elemlist))
    4010                 :             :         {
    4011                 :           0 :                 GUC_check_errdetail("Invalid list syntax in parameter \"%s\".",
    4012                 :             :                                                         "debug_io_direct");
    4013                 :           0 :                 pfree(rawstring);
    4014                 :           0 :                 list_free(elemlist);
    4015                 :           0 :                 return false;
    4016                 :             :         }
    4017                 :             : 
    4018                 :           6 :         flags = 0;
    4019   [ -  +  #  #  :           6 :         foreach(l, elemlist)
                   -  + ]
    4020                 :             :         {
    4021                 :           0 :                 char       *item = (char *) lfirst(l);
    4022                 :             : 
    4023         [ #  # ]:           0 :                 if (pg_strcasecmp(item, "data") == 0)
    4024                 :           0 :                         flags |= IO_DIRECT_DATA;
    4025         [ #  # ]:           0 :                 else if (pg_strcasecmp(item, "wal") == 0)
    4026                 :           0 :                         flags |= IO_DIRECT_WAL;
    4027         [ #  # ]:           0 :                 else if (pg_strcasecmp(item, "wal_init") == 0)
    4028                 :           0 :                         flags |= IO_DIRECT_WAL_INIT;
    4029                 :             :                 else
    4030                 :             :                 {
    4031                 :           0 :                         GUC_check_errdetail("Invalid option \"%s\".", item);
    4032                 :           0 :                         result = false;
    4033                 :           0 :                         break;
    4034                 :             :                 }
    4035         [ #  # ]:           0 :         }
    4036                 :             : 
    4037                 :             :         /*
    4038                 :             :          * It's possible to configure block sizes smaller than our assumed I/O
    4039                 :             :          * alignment size, which could result in invalid I/O requests.
    4040                 :             :          */
    4041                 :             : #if XLOG_BLCKSZ < PG_IO_ALIGN_SIZE
    4042                 :             :         if (result && (flags & (IO_DIRECT_WAL | IO_DIRECT_WAL_INIT)))
    4043                 :             :         {
    4044                 :             :                 GUC_check_errdetail("\"%s\" is not supported for WAL because %s is too small.",
    4045                 :             :                                                         "debug_io_direct", "XLOG_BLCKSZ");
    4046                 :             :                 result = false;
    4047                 :             :         }
    4048                 :             : #endif
    4049                 :             : #if BLCKSZ < PG_IO_ALIGN_SIZE
    4050                 :             :         if (result && (flags & IO_DIRECT_DATA))
    4051                 :             :         {
    4052                 :             :                 GUC_check_errdetail("\"%s\" is not supported for data because %s is too small.",
    4053                 :             :                                                         "debug_io_direct", "BLCKSZ");
    4054                 :             :                 result = false;
    4055                 :             :         }
    4056                 :             : #endif
    4057                 :             : 
    4058                 :           6 :         pfree(rawstring);
    4059                 :           6 :         list_free(elemlist);
    4060                 :             : #endif
    4061                 :             : 
    4062         [ +  - ]:           6 :         if (!result)
    4063                 :           0 :                 return result;
    4064                 :             : 
    4065                 :             :         /* Save the flags in *extra, for use by assign_debug_io_direct */
    4066                 :           6 :         *extra = guc_malloc(LOG, sizeof(int));
    4067         [ +  - ]:           6 :         if (!*extra)
    4068                 :           0 :                 return false;
    4069                 :           6 :         *((int *) *extra) = flags;
    4070                 :             : 
    4071                 :           6 :         return result;
    4072                 :           6 : }
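
The list-to-flags step of check_debug_io_direct() can be sketched without the GUC machinery. This is an illustration under stated assumptions, not the server's parser: it uses strtok() instead of SplitGUCList(), so it does not handle quoting and treats spaces as separators, and the SKETCH_IO_DIRECT_* bit values and function names are hypothetical (the real IO_DIRECT_* constants are defined elsewhere in the source tree).

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bit values used only in this sketch. */
#define SKETCH_IO_DIRECT_DATA		0x01
#define SKETCH_IO_DIRECT_WAL		0x02
#define SKETCH_IO_DIRECT_WAL_INIT	0x04

/* Case-insensitive string equality using only standard C. */
static bool
sketch_ieq(const char *a, const char *b)
{
	while (*a != '\0' && *b != '\0')
	{
		if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
			return false;
		a++;
		b++;
	}
	return *a == *b;
}

/*
 * Parse a comma-separated option list into a flag word.  On success,
 * store the flags in *flags_out and return true; on an unknown option
 * or allocation failure, return false and leave *flags_out untouched.
 */
static bool
sketch_parse_io_direct(const char *value, int *flags_out)
{
	char	   *copy;
	char	   *item;
	int			flags = 0;
	bool		ok = true;

	assert(value != NULL && flags_out != NULL);

	/* strtok() modifies its input, so work on a private copy. */
	copy = malloc(strlen(value) + 1);
	if (copy == NULL)
		return false;
	strcpy(copy, value);

	for (item = strtok(copy, ", "); item != NULL; item = strtok(NULL, ", "))
	{
		if (sketch_ieq(item, "data"))
			flags |= SKETCH_IO_DIRECT_DATA;
		else if (sketch_ieq(item, "wal"))
			flags |= SKETCH_IO_DIRECT_WAL;
		else if (sketch_ieq(item, "wal_init"))
			flags |= SKETCH_IO_DIRECT_WAL_INIT;
		else
		{
			ok = false;			/* unknown option: reject the whole list */
			break;
		}
	}

	free(copy);
	if (ok)
		*flags_out = flags;
	return ok;
}
```

As in the listing, an empty string is accepted and yields no flags, and the first unrecognized item rejects the entire value.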
    4073                 :             : 
    4074                 :             : void
    4075                 :           6 : assign_debug_io_direct(const char *newval, void *extra)
    4076                 :             : {
    4077                 :           6 :         int                *flags = (int *) extra;
    4078                 :             : 
    4079                 :           6 :         io_direct_flags = *flags;
    4080                 :           6 : }
    4081                 :             : 
    4082                 :             : /* ResourceOwner callbacks */
    4083                 :             : 
    4084                 :             : static void
    4085                 :           0 : ResOwnerReleaseFile(Datum res)
    4086                 :             : {
    4087                 :           0 :         File            file = (File) DatumGetInt32(res);
    4088                 :           0 :         Vfd                *vfdP;
    4089                 :             : 
    4090         [ #  # ]:           0 :         Assert(FileIsValid(file));
    4091                 :             : 
    4092                 :           0 :         vfdP = &VfdCache[file];
    4093                 :           0 :         vfdP->resowner = NULL;
    4094                 :             : 
    4095                 :           0 :         FileClose(file);
    4096                 :           0 : }
    4097                 :             : 
    4098                 :             : static char *
    4099                 :           0 : ResOwnerPrintFile(Datum res)
    4100                 :             : {
    4101                 :           0 :         return psprintf("File %d", DatumGetInt32(res));
    4102                 :             : }
        

Generated by: LCOV version 2.3.2-1